Extracting link spam using biased random walks from spam seed sets
Source: AIRWeb; Vol. 215
Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
Banff, Alberta, Canada
SESSION: Link farms
Pages: 37-44
Year of Publication: 2007
ISBN: 978-1-59593-732-2
Authors
Baoning Wu  Lehigh University, Bethlehem, PA
Kumar Chellapilla  Microsoft Live Labs, Redmond, WA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 61,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1244408.1244416

ABSTRACT

Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link-based ranking algorithms such as PageRank, HITS, and their derivatives are especially vulnerable to link spam. Link farms and link exchanges are two common instances of link spam that produce spam communities, i.e., clusters in the web graph. In this paper, we present a directed approach to extracting a link spam community given one or more of its members. In contrast to previous, fully automated approaches to finding link spam, our method is specifically designed to be used interactively. Our approach starts with a small spam seed set provided by the user and simulates a random walk on the web graph. The random walk is biased to explore the local neighborhood around the seed set through the use of decay probabilities. Truncation is used to retain only the most frequently visited nodes. After termination, the nodes are sorted in decreasing order of their final probabilities and presented to the user. Experiments using manually labeled link spam data sets and random walks from a single seed domain show that the approach achieves over 95.12% precision in extracting large link farms and 80.46% precision in extracting link exchange centroids.
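The pipeline described in the abstract (seed set, decayed random walk, truncation to the most visited nodes, ranking by final probability) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-dict graph representation, the parameter names `decay`, `top_k`, `iterations` and `undirected`, the rule that each hop passes on a fixed fraction of a node's mass, and the choice to optionally walk along in-links as well as out-links are all hypothetical. The `precision_at_k` helper (also hypothetical) shows one way a ranked list could be scored against a manually labeled spam set.

```python
from collections import defaultdict


def build_neighbours(graph, undirected=True):
    """Map node -> sorted neighbour list (hypothetical graph format: node -> out-links).

    With undirected=True the walk may also step backwards along in-links.
    """
    nbrs = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            if dst == src:
                continue  # ignore self-links
            nbrs[src].add(dst)
            if undirected:
                nbrs[dst].add(src)
    return {node: sorted(adj) for node, adj in nbrs.items()}


def biased_random_walk(graph, seeds, decay=0.5, top_k=100, iterations=5, undirected=True):
    """Rank nodes by the probability mass a decaying walk from `seeds` deposits on them.

    All parameters are illustrative assumptions, not values from the paper:
      decay      -- fraction of a node's mass passed on per hop, so mass reaching
                    a node k hops from the seeds shrinks roughly like decay**k
      top_k      -- truncation: only the top_k nodes by current mass survive a hop
      iterations -- number of hops to simulate
      undirected -- also walk backwards along in-links
    """
    nbrs = build_neighbours(graph, undirected)
    seeds = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    if not seeds:
        return []
    current = {s: 1.0 / len(seeds) for s in seeds}  # uniform start on the seed set
    visits = defaultdict(float, current)            # accumulated visit mass
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in current.items():
            adj = nbrs.get(node, [])
            if not adj:
                continue  # mass at a dead end is dropped
            share = decay * mass / len(adj)
            for n in adj:
                nxt[n] += share
        # Truncation: keep the top_k nodes; the rest are neither walked from nor scored.
        kept = sorted(nxt.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        current = dict(kept)
        for node, mass in current.items():
            visits[node] += mass
    # Final ranking: decreasing accumulated mass, ties broken by node id.
    return sorted(visits.items(), key=lambda kv: (-kv[1], kv[0]))


def precision_at_k(ranked, labelled_spam, k):
    """Fraction of the top-k ranked nodes that appear in a labelled spam set."""
    top = [node for node, _ in ranked[:k]]
    return sum(node in labelled_spam for node in top) / len(top) if top else 0.0
```

For example, on a toy graph where `a`, `b` and `c` link densely to each other, an outside page `d` links to `a`, and `e` and `f` form a separate pair, a walk seeded at `a` ranks `a`, `b` and `c` first, places `d` after them, and never reaches `e` or `f`. Following in-links is one plausible design choice here, since pages in a link farm point at their target, and a walk that only follows out-links from the target might never reach them.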



