A reference collection for web spam
Source
ACM SIGIR Forum
Volume 40, Issue 2 (December 2006)
Pages: 11-24
Year of Publication: 2006
ISSN: 0163-5840

Authors
Carlos Castillo
Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain

Debora Donato
Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain

Luca Becchetti
Università di Roma, Rome, Italy

Paolo Boldi
Università degli Studi, Milan, Italy

Stefano Leonardi
Università di Roma, Rome, Italy

Massimo Santini
Università degli Studi, Milan, Italy

Sebastiano Vigna
Università degli Studi, Milan, Italy

Bibliometrics
Downloads (6 Weeks): 17, Downloads (12 Months): 134, Citation Count: 28

ABSTRACT
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts include Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.

CITED BY 28
Krysta M. Svore, Qiang Wu, Chris J. C. Burges, Aaswath Raman, Improving web spam classification using rank-time features, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada

András Benczúr, István Bíró, Károly Csalogány, Tamás Sarlós, Web spam detection via commercial intent analysis, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada

Ricardo Baeza-Yates, Aristides Gionis, Flavio P. Junqueira, Vanessa Murdock, Vassilis Plachouras, Fabrizio Silvestri, Design trade-offs for search engine caching, ACM Transactions on the Web (TWEB), v.2 n.4, p.1-28, October 2008

Carlos Castillo, Debora Donato, Aristides Gionis, Vanessa Murdock, Fabrizio Silvestri, Know your neighbors: web spam detection using the web topology, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands

Carlos Castillo, Claudio Corsi, Debora Donato, Paolo Ferragina, Aristides Gionis, Query-log mining for detecting spam, Proceedings of the 4th international workshop on Adversarial information retrieval on the web, April 22-22, 2008, Beijing, China

Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis, Efficient semi-streaming algorithms for local triangle counting in massive graphs, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA

Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, Sebastiano Vigna, Query suggestions using query-flow graphs, Proceedings of the 2009 workshop on Web Search Click Data, p.56-63, February 09-09, 2009, Barcelona, Spain