Identifying web spam with user behavior analysis
Source AIRWeb; Vol. 295
Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web
Beijing, China
SESSION: Usage analysis
Pages 9-16  
Year of Publication: 2008
ISBN:978-1-60558-159-0
Authors
Yiqun Liu  Tsinghua University, Beijing, China P.R.
Rongwei Cen  Tsinghua University, Beijing, China P.R.
Min Zhang  Tsinghua University, Beijing, China P.R.
Shaoping Ma  Tsinghua University, Beijing, China P.R.
Liyun Ru  Tsinghua University, Beijing, China P.R.
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1451983.1451986

ABSTRACT

Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific, known types of Web spam and are ineffective and inefficient against newly appearing spam. Drawing on an analysis of user behavior in Web access logs, we propose a spam page detection algorithm based on Bayesian learning. The main contributions of our work are: (1) User visiting patterns of spam pages are studied, and three user behavior features are proposed to separate Web spam pages from ordinary pages. (2) A novel spam detection framework is proposed that can detect unknown and newly appearing spam types with the help of user behavior analysis. Preliminary experiments on large-scale Web access log data (containing over 2.74 billion user clicks) show the effectiveness of the proposed features and detection framework.
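
The abstract does not name the three behavior features or spell out the learning procedure, so the following is only a minimal sketch of Bayesian classification over per-page user behavior features. It assumes binary features and a naive Bayes model with Laplace smoothing; the feature meanings, the toy data, and the functions `train_naive_bayes` and `posterior` are hypothetical illustrations, not the authors' method.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels, n_features):
    """Estimate P(label) and P(feature_i = 1 | label) with Laplace smoothing."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        label_counts[y] += 1
        for i, v in enumerate(x):
            feature_counts[y][i] += v
    total = len(labels)
    model = {}
    for y, n in label_counts.items():
        prior = n / total
        # Add-one smoothing for a binary feature: (count + 1) / (n + 2).
        likelihoods = [(feature_counts[y][i] + 1) / (n + 2)
                       for i in range(n_features)]
        model[y] = (prior, likelihoods)
    return model


def posterior(model, x):
    """Return P(label | x) for every label, computed in log space."""
    log_scores = {}
    for y, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for v, p in zip(x, likelihoods):
            score += math.log(p if v else 1.0 - p)
        log_scores[y] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {y: u / z for y, u in unnormalized.items()}


# Hypothetical binary behavior features per page, assumed to be derived
# from access logs (placeholders, not the features defined in the paper):
#   (arrives mostly via search, few onward clicks, short visits)
samples = [(1, 1, 1), (1, 1, 0), (1, 0, 1),
           (0, 0, 0), (0, 1, 0), (0, 0, 1)]
labels = ["spam", "spam", "spam", "ordinary", "ordinary", "ordinary"]

model = train_naive_bayes(samples, labels, n_features=3)
result = posterior(model, (1, 1, 1))
print(result)
```

Because the classifier depends only on how users reach and leave a page, and not on page content or link structure, a sketch like this makes no assumption about a particular spamming technique, which is the property the abstract attributes to the proposed framework.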

