ABSTRACT
In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to the in-links of root documents. Second, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods (VSM, Okapi, TLS, and CDR) using a set of broad-topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine either our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method alone. Among the four relevance scoring methods, there is no significant difference in quality when they are combined with a HITS-based algorithm.
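As a rough illustration of what a weighted HITS-style iteration can look like, the sketch below runs the standard hub/authority updates over links that each carry a numeric weight. It is only a minimal sketch under stated assumptions: the abstract does not give the paper's weighting scheme for in-links of root documents, so the weights, the function names `normalize` and `weighted_hits`, and the toy graph are all hypothetical, and neither Bharat's improvements nor any relevance scoring are modeled.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def weighted_hits(edges, iterations=50):
    """Run hub/authority iterations on a weighted link graph.

    edges maps (source, target) -> positive link weight.
    The weights are placeholders (an assumption), not the paper's scheme.
    """
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: weighted sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        # Hub score: weighted sum of authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Hypothetical toy graph: pages "a" and "b" link to "c"; "b" also links to "d".
uniform = {("a", "c"): 1.0, ("b", "c"): 1.0, ("b", "d"): 1.0}
auth_u, hub_u = weighted_hits(uniform)

# Same graph, but the link into "d" is given a larger (made-up) weight.
reweighted = {("a", "c"): 1.0, ("b", "c"): 1.0, ("b", "d"): 3.0}
auth_w, hub_w = weighted_hits(reweighted)
```

With uniform weights, "c" ends up with the highest authority score because it has two in-links; raising the weight on the single link into "d" is enough to move "d" ahead of "c", which shows how in-link weights can change the ranking.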