Identifying web spam with user behavior analysis

Source
AIRWeb; Vol. 295
Proceedings of the 4th international workshop on Adversarial information retrieval on the web, Beijing, China
Session: Usage analysis
Pages 9-16
Year of Publication: 2008
ISBN: 978-1-60558-159-0

Authors
Yiqun Liu, Tsinghua University, Beijing, P.R. China
Rongwei Cen, Tsinghua University, Beijing, P.R. China
Min Zhang, Tsinghua University, Beijing, P.R. China
Shaoping Ma, Tsinghua University, Beijing, P.R. China
Liyun Ru, Tsinghua University, Beijing, P.R. China

Bibliometrics
Downloads (6 Weeks): 20, Downloads (12 Months): 189, Citation Count: 0

ABSTRACT
Combating Web spam has become one of the top challenges for Web search engines. State-of-the-art spam detection techniques are usually designed for specific, known types of Web spam and are unable to detect, or inefficient at detecting, newly appearing spam. Based on an analysis of user behavior in Web access logs, we propose a spam page detection algorithm built on Bayesian learning. The main contributions of our work are: (1) the visiting patterns of users on spam pages are studied, and three user behavior features are proposed to separate Web spam from ordinary pages; (2) a novel spam detection framework is proposed that, with the help of user behavior analysis, can detect unknown spam types and newly appearing spam. Preliminary experiments on large-scale Web access log data (containing over 2.74 billion user clicks) show the effectiveness of the proposed features and detection framework.
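The abstract does not spell out the classifier, so the following is only a minimal sketch of one common form of Bayesian learning, a naive Bayes classifier, assuming each page is described by three binary user behavior features. The feature names f1, f2, f3, the functions train_naive_bayes and spam_probability, and the toy training data are hypothetical illustrations, not the authors' actual features, method, or data.

```python
import math
from collections import defaultdict

# Hypothetical binary behavior features per page (illustrative names only,
# not the three features defined in the paper).
FEATURES = ("f1", "f2", "f3")


def train_naive_bayes(samples, alpha=1.0):
    """Estimate priors P(c) and likelihoods P(f=1 | c) with Laplace smoothing.

    samples: list of (features_dict, label), label is "spam" or "ham",
    and features_dict maps every name in FEATURES to 0 or 1.
    """
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for feats, label in samples:
        class_counts[label] += 1
        for name in FEATURES:
            feature_counts[label][name] += feats[name]
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    # Smoothed Bernoulli likelihoods keep every probability strictly in (0, 1).
    likelihoods = {
        c: {
            name: (feature_counts[c][name] + alpha) / (class_counts[c] + 2 * alpha)
            for name in FEATURES
        }
        for c in class_counts
    }
    return priors, likelihoods


def spam_probability(feats, priors, likelihoods):
    """Return P(spam | features), assuming features are independent given the class."""
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for name in FEATURES:
            p = likelihoods[c][name]
            score += math.log(p if feats[name] else 1.0 - p)
        log_scores[c] = score
    # Normalize in log space to avoid numerical underflow.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return math.exp(log_scores["spam"] - m) / z


# Toy, made-up training data for illustration only.
train = [
    ({"f1": 1, "f2": 1, "f3": 1}, "spam"),
    ({"f1": 1, "f2": 0, "f3": 1}, "spam"),
    ({"f1": 0, "f2": 0, "f3": 0}, "ham"),
    ({"f1": 0, "f2": 1, "f3": 0}, "ham"),
]
priors, likelihoods = train_naive_bayes(train)
print(round(spam_probability({"f1": 1, "f2": 1, "f3": 1}, priors, likelihoods), 3))  # prints 0.9
```

The naive independence assumption keeps training to simple counting over logged pages, which is one reason Bayesian classifiers scale to large access logs; whether the paper's framework makes exactly this assumption is not stated in the abstract.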