A reference collection for web spam
Source: ACM SIGIR Forum
Volume 40, Issue 2 (December 2006)
Pages: 11-24
Year of Publication: 2006
ISSN: 0163-5840
Authors
Carlos Castillo  Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain
Debora Donato  Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain
Luca Becchetti  Università di Roma, Rome, Italy
Paolo Boldi  Università degli Studi, Milan, Italy
Stefano Leonardi  Università di Roma, Rome, Italy
Massimo Santini  Università degli Studi, Milan, Italy
Sebastiano Vigna  Università degli Studi, Milan, Italy
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1189702.1189703

ABSTRACT

We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether the hosts include Web spam aspects or not. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.


