Web spam challenge proposal for filtering in archives
Source ACM International Conference Proceeding Series archive
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Madrid, Spain
SESSION: Spam research collections
Pages 61-62  
Year of Publication: 2009
ISBN:978-1-60558-438-6
Authors
András A. Benczúr  Computer and Automation Research Institute of the Hungarian Academy of Sciences
Miklós Erdélyi  University of Pannonia and Computer and Automation Research Institute of the Hungarian Academy of Sciences
Julien Masanés  European Archive Foundation, France
Dávid Siklósi  Computer and Automation Research Institute of the Hungarian Academy of Sciences
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1531914.1531928

ABSTRACT

In this paper we propose new tasks for a possible future Web Spam Challenge, motivated by the needs of the archival community. The Web archival community consists of several relatively small institutions that operate independently, possibly over different top-level domains (TLDs). Each of them may hold a large set of historic crawls. Efficient filtering would therefore require (1) enhanced use of the time series of domain snapshots and (2) collaboration by transferring models across different TLDs. Corresponding Challenge tasks could include the distribution of crawl snapshot data for feature generation as well as the classification of unlabeled new crawls of the same or even different TLDs.



