ABSTRACT
This paper describes a large-scale evaluation of the
effectiveness of HITS in comparison with other link-based ranking
algorithms, when used in combination with a state-of-the-art text
retrieval algorithm exploiting anchor text (BM25F). We quantified
their effectiveness using three common performance measures: mean
reciprocal rank (MRR), mean average precision (MAP), and
normalized discounted cumulative gain (NDCG). The evaluation is
based on two large data sets: a breadth-first search crawl of 463
million web pages containing 17.6 billion hyperlinks and
referencing 2.9 billion distinct URLs; and a set of 28,043 queries
sampled from a query log, each query having on average 2,383
results, about 17 of which were labeled by judges. We found that
HITS outperforms PageRank but is about as effective as web page
in-degree. The same holds true when any of the link-based features
is combined with the text retrieval algorithm. Finally, we studied
the relationship between query specificity and the effectiveness
of selected features, and found that link-based features perform
better for general queries, whereas BM25F performs better for
specific queries.
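To make the compared features and measures concrete, here is a minimal Python sketch, not the authors' implementation. It computes in-degree, PageRank (power iteration, damping 0.85) and HITS authority scores on a small directed graph, plus per-query reciprocal rank, average precision and NDCG over a ranked list of relevance grades. Averaging the per-query values over all queries gives MRR, MAP and mean NDCG. Assumptions, all of them mine: grades are non-negative integers with 0 meaning not relevant; any grade above 0 counts as relevant for RR and AP; AP is normalized by the relevant results present in the list; NDCG uses gain 2^g - 1, a log2(i + 1) discount and a cutoff k = 10; unlabeled results would have to be dropped or treated as grade 0. In Kleinberg's formulation HITS is applied to a query-dependent neighborhood graph; building that graph is not shown, and the sketch simply runs on whatever graph is passed in. All function names and the toy graph and labels are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical helper names; a sketch of the features and measures, not the paper's code.


def in_degree(edges, nodes):
    """Count distinct (source, target) links pointing at each node."""
    deg = {v: 0 for v in nodes}
    for _src, dst in set(edges):
        deg[dst] += 1
    return deg


def pagerank(edges, nodes, d=0.85, iters=50):
    """PageRank by power iteration; dangling nodes spread their score uniformly."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(pr[v] for v in nodes if not out[v])
        nxt = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += d * pr[src] / len(out[src])
        pr = nxt
    return pr


def _l2_normalize(scores):
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits_authority(edges, nodes, iters=50):
    """Kleinberg-style HITS on the graph passed in; returns authority scores."""
    hub = {v: 1.0 for v in nodes}
    auth = {}
    for _ in range(iters):
        auth = {v: 0.0 for v in nodes}
        for src, dst in edges:  # authority: sum of hub scores of in-linking pages
            auth[dst] += hub[src]
        auth = _l2_normalize(auth)
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:  # hub: sum of authority scores of linked-to pages
            hub[src] += auth[dst]
        hub = _l2_normalize(hub)
    return auth


def reciprocal_rank(grades):
    """1/rank of the first relevant result (grade > 0), else 0."""
    for i, g in enumerate(grades, start=1):
        if g > 0:
            return 1.0 / i
    return 0.0


def average_precision(grades):
    """Mean of precision@i over the ranks i that hold a relevant result."""
    found, total = 0, 0.0
    for i, g in enumerate(grades, start=1):
        if g > 0:
            found += 1
            total += found / i
    return total / found if found else 0.0


def ndcg(grades, k=10):
    """DCG@k with gain 2^g - 1 and log2(i + 1) discount, divided by the ideal DCG@k."""
    def dcg(gs):
        return sum((2 ** g - 1) / math.log2(i + 1) for i, g in enumerate(gs[:k], start=1))
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = [("A", "C"), ("B", "C"), ("D", "C"), ("A", "B"), ("C", "A")]
    grades = {"A": 1, "B": 0, "C": 3, "D": 0}  # hypothetical judge labels
    for name, scores in [("in-degree", in_degree(edges, nodes)),
                         ("PageRank", pagerank(edges, nodes)),
                         ("HITS", hits_authority(edges, nodes))]:
        # Rank the toy result set by the feature, then score the ranking.
        ranked = [grades[v] for v in sorted(nodes, key=lambda v: -scores[v])]
        print(name, reciprocal_rank(ranked), average_precision(ranked), ndcg(ranked))
```

Ranking a toy result set by each feature and scoring it with the three measures mirrors, in miniature, the kind of per-feature comparison described above; combining a link feature with a text score such as BM25F is not shown.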