Improving web spam classifiers using link structure
Source: AIRWeb; Vol. 215
Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
Banff, Alberta, Canada
Session: Temporal and topological factors
Pages: 17-20
Year of Publication: 2007
ISBN: 978-1-59593-732-2
Authors
Qingqing Gan  Polytechnic University, Brooklyn, NY
Torsten Suel  Polytechnic University, Brooklyn, NY
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 103,   Citation Count: 7
DOI: http://doi.acm.org/10.1145/1244408.1244412

ABSTRACT

Web spam has been recognized as one of the top challenges facing the search engine industry [14]. Much recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, whenever an anti-spam technique is developed, spammers design new techniques to confuse search engine ranking methods and spam detection mechanisms. Classification methods based on machine learning can adapt quickly to such newly developed spam techniques. We describe a two-stage approach that improves the performance of common classifiers. In the first stage, we apply a classifier that catches a large portion of the spam in our data. In the second stage, we use several heuristics to decide whether a node should be relabeled, based on its preclassified label and on knowledge about its neighborhood in the link graph. Our experimental results show noticeable improvements in both precision and recall.
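The abstract does not spell out the relabeling heuristics, so the following Python is only a minimal sketch of the general two-stage idea under stated assumptions, not the authors' method. It assumes a base classifier has already produced a spam score per node (stage 1, thresholded at a hypothetical 0.5), treats a node's neighborhood as the union of its in-links and out-links, and substitutes a simple neighbor-vote rule for the paper's heuristics. The names `first_stage`, `undirected_neighbors`, and `relabel`, and the thresholds `to_spam`, `to_normal`, and `min_degree`, are all hypothetical.

```python
from typing import Dict, Set


def first_stage(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Stage 1 stand-in: mark a node as spam if its base-classifier score >= threshold."""
    return {node: score >= threshold for node, score in scores.items()}


def undirected_neighbors(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Assumption: a node's neighborhood is the union of its out-links and in-links."""
    nbrs: Dict[str, Set[str]] = {node: set() for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            nbrs[src].add(dst)
            nbrs.setdefault(dst, set()).add(src)
    return nbrs


def relabel(
    labels: Dict[str, bool],
    graph: Dict[str, Set[str]],
    to_spam: float = 0.7,    # hypothetical threshold for promoting to spam
    to_normal: float = 0.2,  # hypothetical threshold for demoting to normal
    min_degree: int = 3,     # hypothetical: skip nodes with little link evidence
) -> Dict[str, bool]:
    """Stage 2 (hypothetical neighbor-vote heuristic): flip a stage-1 label when
    the fraction of neighbors labeled spam strongly disagrees with it. Reads only
    stage-1 labels (one synchronous pass), so node order does not matter."""
    nbrs = undirected_neighbors(graph)
    new_labels = dict(labels)
    for node, is_spam in labels.items():
        ns = [n for n in nbrs.get(node, set()) if n in labels and n != node]
        if len(ns) < min_degree:
            continue  # too little neighborhood evidence; keep the stage-1 label
        spam_frac = sum(labels[n] for n in ns) / len(ns)
        if not is_spam and spam_frac >= to_spam:
            new_labels[node] = True   # mostly spam neighbors: promote to spam
        elif is_spam and spam_frac <= to_normal:
            new_labels[node] = False  # mostly normal neighbors: demote to normal
    return new_labels


# Toy example: "x" sits in a spam cluster, "y" in a normal cluster.
graph = {
    "x": {"s1", "s2", "s3"}, "s1": {"s2"}, "s2": {"s3"}, "s3": {"s1"},
    "y": {"n1", "n2", "n3"}, "n1": {"n2"}, "n2": {"n3"}, "n3": {"n1"},
}
scores = {"x": 0.4, "s1": 0.9, "s2": 0.9, "s3": 0.9,
          "y": 0.6, "n1": 0.1, "n2": 0.1, "n3": 0.1}
final = relabel(first_stage(scores), graph)
print(final["x"], final["y"])  # True False
```

With these toy inputs, stage 1 labels `x` (score 0.4) as normal even though all of its neighbors are spam, and labels `y` (score 0.6) as spam even though all of its neighbors are normal; the neighbor vote flips both. Nodes with mixed neighborhoods, such as `s1` (two of three neighbors labeled spam), keep their stage-1 labels because neither threshold is crossed.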


REFERENCES


[2] L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates. Link-based characterization and detection of Web spam. In Workshop on Advers. Inf. Retrieval on the Web, Aug. 2006.
[3] A. Benczúr, K. Csalogány, T. Sarlós, and M. Uher. SpamRank: Fully automatic link spam detection. In Workshop on Advers. Inf. Retrieval on the Web, 2005.
[4] A. Benczúr, K. Csalogány, and T. Sarlós. Link-based similarity search to fight web spam. In Workshop on Advers. Inf. Retrieval on the Web, 2006.
[5] C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. Technical report, Yahoo! Research Barcelona, Nov. 2006.
[7] B. Davison. Recognizing nepotistic links on the web. In Workshop on Artificial Intelligence for Web Search, 2000.
[9] I. Drost and T. Scheffer. Thwarting the nigritude ultramarine: Learning to identify link spam. In Proc. European Conf. on Machine Learning, 2005.
[12] Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In Workshop on Advers. Inf. Retrieval on the Web, 2005.
[13] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with TrustRank. In Proc. 30th VLDB, 2004.
[17] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.
[19] M. Sobek. PR0: Google's PageRank 0 penalty, 2002.
[23] B. Wu, V. Goel, and B. Davison. Propagating trust and distrust to demote Web spam. In Workshop on Models of Trust and the Web, 2006.
[24] H. Zhang, A. Goel, R. Govindan, K. Mason, and B. Van Roy. Making eigenvector-based reputation systems robust to collusion. In Proc. 3rd Workshop on Web Graphs, 2004.
