Measuring similarity to detect qualified links
Source: AIRWeb; Vol. 215
Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
Banff, Alberta, Canada
Session: Link farms
Pages: 49 - 56  
Year of Publication: 2007
ISBN:978-1-59593-732-2
Authors
Xiaoguang Qi  Lehigh University
Lan Nie  Lehigh University
Brian D. Davison  Lehigh University
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 45,   Citation Count: 4
DOI: http://doi.acm.org/10.1145/1244408.1244418

ABSTRACT

The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other than to confer authority. Such links bring noise into link analysis and harm the quality of retrieval. To provide high-quality search results, it is important to detect them and reduce their influence. In this paper, we propose a method to detect such links by considering multiple similarity measures over the source and target pages. A classifier uses these measures to detect the noisy links, which are then dropped, and link analysis algorithms are run on the reduced link graph. The usefulness of a number of features is also tested. Experiments across 53 query-specific datasets show that our approach almost doubles the performance of Kleinberg's HITS and boosts Bharat and Henzinger's imp algorithm by close to 9% in terms of precision. It also outperforms a previous approach focusing on link farm detection.
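
The pipeline described in the abstract (score each link by source/target similarity, drop the links judged noisy, then run link analysis on the reduced graph) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' implementation: a single term-frequency cosine similarity stands in for the paper's multiple similarity measures, a fixed `threshold` stands in for the trained classifier, the direction of the rule (low similarity means noisy) is assumed, and the function names `cosine_similarity`, `filter_links`, and `hits` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_links(links, page_text, threshold=0.1):
    """Keep links whose endpoints are similar enough.

    ASSUMPTION: a fixed threshold on one similarity measure replaces the
    classifier over several measures that the abstract describes.
    """
    return [
        (src, dst)
        for src, dst in links
        if cosine_similarity(page_text[src], page_text[dst]) >= threshold
    ]


def hits(links, iterations=50):
    """Basic HITS power iteration; returns (hub, authority) score dicts."""
    nodes = {n for link in links for n in link}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in links:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of pages linked to.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in links:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

A caller would pass the link list through `filter_links` first and then hand the surviving links to `hits`, so that authority scores are computed only over the reduced graph. A classifier trained on labeled links, as the abstract describes, would replace the threshold test inside `filter_links`.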



