Looking into the past to better classify web spam
Source: ACM International Conference Proceeding Series
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Madrid, Spain
SESSION: Temporal analysis
Pages 1-8
Year of Publication: 2009
ISBN: 978-1-60558-438-6
Authors
Na Dai  Lehigh University, Bethlehem, PA
Brian D. Davison  Lehigh University, Bethlehem, PA
Xiaoguang Qi  Lehigh University, Bethlehem, PA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1531914.1531916

ABSTRACT

Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue that historical web page information may also be important in spam classification. In this paper, we use content features from historical versions of web pages to improve spam classification. We use supervised learning techniques to combine classifiers based on current page content with classifiers based on temporal features. Experiments on the WEBSPAM-UK2007 dataset show that our approach improves spam classification F-measure performance by 30% compared to a baseline classifier which only considers current page content.
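
As a rough illustration of the evaluation measure mentioned in the abstract, the sketch below computes the F-measure for a classifier that uses only current-content scores and for one that also uses temporal scores. All names, scores, and labels here are hypothetical toy values, not data from the paper. The simple weighted average in `combine_scores` is only a stand-in for the supervised combination the abstract describes, whose details are not given on this page.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for binary labels, where 1 = spam and 0 = non-spam."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def threshold(scores, cutoff=0.5):
    """Turn spam scores in [0, 1] into hard 0/1 predictions."""
    return [1 if s >= cutoff else 0 for s in scores]


def combine_scores(current, temporal, weight=0.5):
    """Assumed stand-in combiner: weighted average of the two score lists."""
    return [weight * c + (1 - weight) * t for c, t in zip(current, temporal)]


# Hypothetical toy data: six pages, the first three labeled spam.
y_true = [1, 1, 1, 0, 0, 0]
current_scores = [0.9, 0.4, 0.3, 0.6, 0.2, 0.1]   # current-content classifier
temporal_scores = [0.8, 0.7, 0.8, 0.2, 0.3, 0.1]  # temporal-feature classifier

baseline_pred = threshold(current_scores)
combined_pred = threshold(combine_scores(current_scores, temporal_scores))

baseline_f = f_measure(y_true, baseline_pred)
combined_f = f_measure(y_true, combined_pred)
print(f"baseline F-measure: {baseline_f:.2f}")
print(f"combined F-measure: {combined_f:.2f}")
```

The toy numbers are chosen only to make the difference visible; they say nothing about the size of the improvement reported on WEBSPAM-UK2007.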



