A large-scale study of link spam detection by graph algorithms
Source: AIRWeb; Vol. 215
Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Banff, Alberta, Canada
SESSION: Link farms
Pages: 45-48
Year of Publication: 2007
ISBN: 978-1-59593-732-2
Authors
Hiroo Saito  Aihara Complexity Modelling Project, ERATO, JST, Tokyo, Japan and University of Tokyo, Tokyo, Japan
Masashi Toyoda  University of Tokyo, Tokyo, Japan
Masaru Kitsuregawa  University of Tokyo, Tokyo, Japan
Kazuyuki Aihara  Aihara Complexity Modelling Project, ERATO, JST, Tokyo, Japan and University of Tokyo, Tokyo, Japan
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1244408.1244417

ABSTRACT

Link spam refers to attempts to promote the ranking of spammers' web sites by deceiving the link-based ranking algorithms of search engines. Spammers often create densely connected link structures of sites, so-called "link farms". In this paper, we study the overall structure and distribution of link farms in a large-scale graph of the Japanese Web with 5.8 million sites and 283 million links. To examine the spam structure, we apply three graph algorithms to the web graph. First, the web graph is decomposed into strongly connected components (SCCs). Besides the largest SCC (the core) at the center of the web, we observed that most of the large components consist of link farms. Next, to extract spam sites in the core, we enumerate maximal cliques as seeds of link farms. Finally, using these link farms as a reliable spam seed set, we expand them with a minimum cut technique that separates links between spam and non-spam sites. We found about 0.6 million spam sites in SCCs around the core, and in the core we extracted an additional 8 thousand and 49 thousand spam sites with high precision by maximal clique enumeration and by the minimum cut technique, respectively.
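
The first step, decomposing the web graph into SCCs and setting the largest one aside as the core, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the site graph fits in memory as a dict mapping each site to the sites it links to, and it uses an iterative form of Tarjan's algorithm so that deep graphs do not hit Python's recursion limit. The `min_size` cutoff for reporting non-core components as link-farm candidates is a hypothetical parameter.

```python
"""Sketch: split a site graph into its largest SCC (the core) and other SCCs."""


def strongly_connected_components(graph):
    """Iterative Tarjan's algorithm.

    graph: dict mapping each site to an iterable of sites it links to.
    Returns a list of components, each a list of sites.
    """
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)  # include sites that only appear as link targets

    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    for root in nodes:
        if root in index_of:
            continue
        index_of[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(graph.get(root, ())))]  # explicit DFS stack

        while work:
            node, successors = work[-1]
            descended = False
            for succ in successors:
                if succ not in index_of:
                    # Tree edge: visit succ before finishing node.
                    index_of[succ] = lowlink[succ] = counter
                    counter += 1
                    stack.append(succ)
                    on_stack.add(succ)
                    work.append((succ, iter(graph.get(succ, ()))))
                    descended = True
                    break
                if succ in on_stack:
                    # Edge back into a site still on the Tarjan stack.
                    lowlink[node] = min(lowlink[node], index_of[succ])
            if descended:
                continue

            work.pop()  # every successor of node has been processed
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index_of[node]:
                # node is the root of an SCC: pop its members off the stack.
                component = []
                while True:
                    member = stack.pop()
                    on_stack.discard(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
    return components


def split_core(graph, min_size=2):
    """Return (core, candidates): the largest SCC, plus every other SCC with
    at least min_size sites (min_size is a hypothetical cutoff)."""
    components = sorted(strongly_connected_components(graph), key=len, reverse=True)
    if not components:
        return set(), []
    core = set(components[0])
    candidates = [set(c) for c in components[1:] if len(c) >= min_size]
    return core, candidates


if __name__ == "__main__":
    toy_web = {
        "a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["a"],  # core
        "s1": ["s2", "s3", "a"], "s2": ["s1", "s3"], "s3": ["s1", "s2"],  # farm
        "x": ["a"],  # ordinary site linking into the core
    }
    core, candidates = split_core(toy_web)
    print(sorted(core))                     # ['a', 'b', 'c', 'd']
    print([sorted(c) for c in candidates])  # [['s1', 's2', 's3']]
```

In the toy graph the three `s*` sites form their own SCC because their link into the core is never reciprocated, so they stay outside the core, which matches the pattern the abstract reports for most large non-core components. The singleton `x` is dropped by the size cutoff.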

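Enumerating maximal cliques as link-farm seeds could look like the sketch below. The abstract does not say how the directed web graph is turned into an undirected one, so this sketch assumes an edge exists only where two sites link to each other. It also swaps in the Bron–Kerbosch algorithm with pivoting rather than the Makino–Uno enumeration algorithm (SWAT 2004) that the paper cites, and the `min_size` threshold for keeping a clique as a seed is hypothetical.

```python
"""Sketch: enumerate maximal cliques of reciprocal links as link-farm seeds."""


def mutual_link_graph(graph):
    """Undirected adjacency sets that keep only reciprocal links (an assumption).
    Sites without any reciprocal link are omitted."""
    out = {u: set(vs) for u, vs in graph.items()}
    adj = {}
    for u, targets in out.items():
        for v in targets:
            if u != v and u in out.get(v, ()):
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Bron–Kerbosch with pivoting; returns a list of frozensets."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored.
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def clique_seeds(graph, min_size=3):
    """Maximal cliques with at least min_size sites (hypothetical threshold)."""
    adj = mutual_link_graph(graph)
    return [c for c in maximal_cliques(adj) if len(c) >= min_size]


if __name__ == "__main__":
    toy_web = {
        "s1": ["s2", "s3", "s4"], "s2": ["s1", "s3", "s4"],
        "s3": ["s1", "s2", "s4"], "s4": ["s1", "s2", "s3"],
        "a": ["b", "s1"], "b": ["a"],  # a -> s1 is not reciprocated
    }
    for seed in clique_seeds(toy_web):
        print(sorted(seed))  # ['s1', 's2', 's3', 's4']
```

Pivoting skips branches that could only produce non-maximal cliques, which matters on graphs with many overlapping dense regions such as interlinked link farms.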

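The minimum cut expansion can be sketched as a source/sink cut: spam seeds (for example, the clique seeds) are tied to a source with infinite capacity, a set of trusted non-spam sites is tied to a sink, and every hyperlink becomes a unit-capacity edge. After a maximum flow is computed with Edmonds–Karp, the sites still reachable from the source in the residual graph form the spam side of a minimum cut. The abstract gives no details of the paper's actual formulation, so treating links as undirected unit edges, requiring a trusted seed set, and the sentinel node names are all assumptions of this sketch.

```python
"""Sketch: expand a spam seed set with an s-t minimum cut (Edmonds–Karp)."""
from collections import defaultdict, deque

INF = float("inf")
SOURCE, SINK = "__source__", "__sink__"  # hypothetical sentinel node names


def min_cut_spam_side(graph, spam_seeds, trusted_seeds):
    """Return the sites on the source (spam) side of a minimum s-t cut."""
    if set(spam_seeds) & set(trusted_seeds):
        raise ValueError("spam and trusted seed sets must be disjoint")

    cap = defaultdict(lambda: defaultdict(float))
    for u, targets in graph.items():
        for v in targets:
            if u != v:
                # Each hyperlink adds one unit of capacity in both directions
                # (undirected treatment: an assumption of this sketch).
                cap[u][v] += 1.0
                cap[v][u] += 1.0
    for s in spam_seeds:
        cap[SOURCE][s] = INF  # seeds can never be cut off from the source
    for t in trusted_seeds:
        cap[t][SINK] = INF    # trusted sites can never join the spam side

    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue and SINK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SINK not in parent:
            break  # no augmenting path left: the flow is maximum

        # Collect the path, then push its bottleneck capacity along it.
        path = []
        v = SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        pushed = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= pushed
            cap[v][u] += pushed

    # The last BFS visited exactly the nodes reachable from SOURCE in the
    # residual graph; they form the source side of a minimum cut.
    return {n for n in parent if n != SOURCE}


if __name__ == "__main__":
    toy_web = {
        "s1": ["s2", "s3", "s4"], "s2": ["s1", "s3", "s4"],
        "s3": ["s1", "s2", "s4"], "s4": ["s1", "s2", "s3", "a"],
        "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    }
    spam = min_cut_spam_side(toy_web, {"s1", "s2", "s3"}, {"a"})
    print(sorted(spam))  # ['s1', 's2', 's3', 's4']
```

In the toy graph `s4` is not a seed, but it is bound to the farm by six units of link capacity and to the trusted site `a` by only one, so the cheapest cut severs the single `s4`–`a` link and `s4` lands on the spam side.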