Web spam detection via commercial intent analysis
Source: AIRWeb; Vol. 215
Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Banff, Alberta, Canada
SESSION: Tagging, P2P, cloaking, and commercial intent
Pages: 89 - 92
Year of Publication: 2007
ISBN: 978-1-59593-732-2
Authors
András Benczúr  Computer and Automation Research Institute of the Hungarian Academy of Sciences
István Bíró  Computer and Automation Research Institute of the Hungarian Academy of Sciences
Károly Csalogány  Computer and Automation Research Institute of the Hungarian Academy of Sciences
Tamás Sarlós  Computer and Automation Research Institute of the Hungarian Academy of Sciences
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1244408.1244424

ABSTRACT

We propose a number of features for Web spam filtering based on the occurrence of keywords that are either of high advertisement value or highly spammed. Our features include popular words from search engine query logs as well as high-cost or high-volume words according to Google AdWords. We also demonstrate the spam filtering power of three further signals: the Online Commercial Intention (OCI) value assigned to a URL in a Microsoft adCenter Labs demonstration, the Yahoo! Mindset classification of Web pages as either commercial or non-commercial, and metrics based on the occurrence of Google ads on the page. We run our tests on the WEBSPAM-UK2006 dataset, recently compiled by Castillo et al. as a standard means of measuring the performance of Web spam detection algorithms. Our features improve the classification accuracy of the publicly available WEBSPAM-UK2006 features by 3%.
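As a rough illustration of the keyword-occurrence idea described in the abstract, the following Python sketch computes a few per-page features from a keyword-to-value table. It is not the authors' implementation: the function names, the feature names, the simple tokenizer, and the sample values are all assumptions made for this example, and real values would have to come from a source such as a query log or an advertising keyword tool.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_features(page_text: str, keyword_value: Dict[str, float]) -> Dict[str, float]:
    """Compute hypothetical commercial-keyword features for one page.

    keyword_value maps a keyword to an assumed "value" score
    (for example an advertising cost or a query frequency).
    """
    tokens = tokenize(page_text)
    if not tokens:
        return {"hit_fraction": 0.0, "mean_value": 0.0, "max_value": 0.0}

    # Values of the tokens that appear in the keyword table.
    values = [keyword_value[t] for t in tokens if t in keyword_value]

    return {
        # Share of page tokens that are listed keywords.
        "hit_fraction": len(values) / len(tokens),
        # Keyword value averaged over all tokens on the page.
        "mean_value": sum(values) / len(tokens),
        # Highest-valued keyword found on the page.
        "max_value": max(values, default=0.0),
    }


# Made-up keyword values, for illustration only.
sample_values = {"loans": 4.0, "casino": 2.0}
features = keyword_features("Cheap loans and casino loans", sample_values)
```

Feature vectors of this kind could then be passed, alongside other page features, to any standard classifier; which classifier the authors used is not stated in the abstract.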


REFERENCES


 
[1] A. A. Benczúr, K. Csalogány, E. Friedman, D. Fogaras, T. Sarlós, M. Uher, and E. Windhager. Searching a small national domain - preliminary report. In Proc. WWW, 2003.

[2] A. A. Benczúr, K. Csalogány, and T. Sarlós. Link-based similarity search to fight web spam. In Proc. AIRWeb, 2006.

[3] A. A. Benczúr, K. Csalogány, T. Sarlós, and M. Uher. SpamRank - Fully automatic link spam detection. In Proc. AIRWeb, 2005.

[6] C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. DELIS Technical report TR-0458, 2006.

[7] K. Chellapilla and D. M. Chickering. Improving cloaking detection using search query popularity and monetizability. In Proc. AIRWeb, pages 17-24, 2006.

[9] I. Drost and T. Scheffer. Thwarting the nigritude ultramarine: Learning to identify link spam. In Proc. ECML, volume 3720 of LNAI, pages 233-243, 2005.

[14] Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In Proc. AIRWeb, 2005.

[15] Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with TrustRank. In Proc. VLDB, pages 576-587, 2004.

[20] B. Wu, V. Goel, and B. D. Davison. Propagating trust and distrust to demote web spam. In Workshop on Models of Trust for the Web, 2006.

