Improvement of HITS-based algorithms on web documents
Source International World Wide Web Conference archive
Proceedings of the 11th international conference on World Wide Web
Honolulu, Hawaii, USA
SESSION: Link Analysis
Pages: 527-535
Year of Publication: 2002
ISBN:1-58113-449-5
Authors
Longzhuang Li  University of Missouri-Columbia, Columbia, MO
Yi Shang  University of Missouri-Columbia, Columbia, MO
Wei Zhang  University of Missouri-Columbia, Columbia, MO
Sponsors
ACM: Association for Computing Machinery
: WWW'02
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/511446.511514

ABSTRACT

In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to in-links of root documents. Then, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods, VSM, Okapi, TLS, and CDR, using a set of broad topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method alone. Among the four relevance scoring methods, there is no significant quality difference when they are combined with a HITS-based algorithm.
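
For readers unfamiliar with HITS, the following is a minimal sketch of the general idea of a link-weighted HITS iteration. It is not the authors' method: the abstract does not state how the in-link weights are chosen, so the `weights` mapping, the function name `weighted_hits`, and the fixed iteration count are all illustrative assumptions. With no weights supplied, it reduces to the standard hub/authority update.

```python
import math


def weighted_hits(edges, weights=None, iterations=50):
    """Hub/authority scores for a directed graph with optional edge weights.

    edges:   iterable of (source, target) pairs.
    weights: hypothetical mapping (source, target) -> positive weight;
             missing edges default to 1.0 (plain HITS). How the paper
             derives its in-link weights is NOT specified here.
    """
    edges = list(edges)
    weights = weights or {}
    nodes = {n for edge in edges for n in edge}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale to unit Euclidean length so scores stay bounded.
        norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
        return {n: x / norm for n, x in scores.items()}

    for _ in range(iterations):
        # Authority update: weighted sum of hub scores over in-links.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += weights.get((u, v), 1.0) * hub[u]
        auth = normalize(new_auth)

        # Hub update: weighted sum of authority scores over out-links.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += weights.get((u, v), 1.0) * auth[v]
        hub = normalize(new_hub)

    return auth, hub


# Tiny example graph: pages a and b link to c; a also links to d.
example_edges = [("a", "c"), ("b", "c"), ("a", "d")]
plain_auth, plain_hub = weighted_hits(example_edges)
boosted_auth, boosted_hub = weighted_hits(example_edges, {("a", "d"): 5.0})
```

In the unweighted run, page c receives the higher authority score because it has two in-links; raising the (assumed) weight on the a-to-d link shifts authority toward d, which illustrates how in-link weights can change the ranking.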

