|
ABSTRACT
The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval. In order to provide high quality search results, it is important to detect them and reduce their influence. In this paper, a method is proposed to detect such links by considering multiple similarity measures over the source pages and target pages. With the help of a classifier, these noisy links are detected and dropped. After that, link analysis algorithms are performed on the reduced link graph. The usefulness of a number of features are also tested. Experiments across 53 query-specific datasets show our approach almost doubles the performance of Kleinberg's HITS and boosts Bharat and Henzinger's imp algorithm by close to 9% in terms of precision. It also outperforms a previous approach focusing on link farm detection.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
Soumen Chakrabarti , Byron Dom , Prabhakar Raghavan , Sridhar Rajagopalan , David Gibson , Jon Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia
|
 |
6
|
Junghoo Cho , Hector Garcia-Molina , Taher Haveliwala , Wang Lam , Andreas Paepcke , Sriram Raghavan , Gary Wesley, Stanford WebBase components and applications, ACM Transactions on Internet Technology (TOIT), v.6 n.2, p.153-186, May 2006
[doi> 10.1145/1149121.1149124]
|
 |
7
|
André Luiz da Costa Carvalho , Paul - Alexandru Chirita , Edleno Silva de Moura , Pável Calado , Wolfgang Nejdl, Site level noise removal for search engines, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
[doi> 10.1145/1135777.1135793]
|
| |
8
|
B. D. Davison. Recognizing nepotistic links on the Web. In Artificial Intelligence for Web Search, pages 23--28. AAAI Press, July 2000. Presented at the AAAI-2000 workshop on Artificial Intelligence for Web Search, Technical Report WS-00-01.
|
| |
9
|
I. Drost and T. Scheffer. Thwarting the nigritude ultramarine: learning to identify link spam. In Proceeding of the ECML, 2005.
|
 |
10
|
Dennis Fetterly , Mark Manasse , Marc Najork, Spam, damn spam, and statistics: using statistical analysis to locate spam web pages, Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004, June 17-18, 2004, Paris, France
[doi> 10.1145/1017074.1017077]
|
| |
11
|
Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with TrustRank. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), Toronto, Canada, 2004.
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
A. K. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow, 1996.
|
 |
19
|
|
| |
20
|
Open Directory Project (ODP), 2007. http://www.dmoz.com/.
|
| |
21
|
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Unpublished draft, 1998.
|
| |
22
|
S. E. Robertson. Overview of the OKAPI projects. Journal of Documentation, 53:3--7, 1997.
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
|