ABSTRACT
The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: recent-post pages, popular pages, and specific tag pages can easily be manipulated. As a result, searching or tracking recent posts does not deliver high-quality resources annotated by the community but rather unsolicited, often commercial, web sites. To retain the benefits of sharing one's web content, we need spam-fighting mechanisms that can cope with the flexible strategies of spammers. A classical approach in machine learning is to determine relevant features that describe the system's users, train different classifiers on the selected features, and choose the one with the most promising evaluation results. In this paper, we transfer this approach to a social bookmarking setting to identify spammers. We present features based on the topological, semantic, and profile-based information that people make public when using the system. The dataset is a snapshot of the social bookmarking system BibSonomy, built over the course of several months while the system was being cleaned of spam. Based on our features, we train a large set of different classification models and compare their performance. Our results lay the groundwork for a first application in BibSonomy and for building more elaborate spam detection mechanisms.
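The model-selection loop the abstract describes (compute features per user, train several classifiers on them, keep the one with the best evaluation score) can be sketched as follows. This is a minimal, stdlib-only Python illustration, not the authors' actual setup: the two numeric features, the synthetic users, the three candidate classifiers (majority baseline, nearest centroid, logistic regression), F1 on the spammer class as the selection metric, and stratified 5-fold cross-validation are all assumptions chosen for brevity. No BibSonomy data is used.

```python
import math
import random


def synthetic_users(n, seed=0):
    """Synthetic toy data (NOT BibSonomy): label 1 = spammer, 0 = legitimate.
    Each user has two hypothetical numeric features; spammers are assumed,
    purely for illustration, to have shifted feature means."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        mu = 2.0 if label else 0.0
        X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
        y.append(label)
    return X, y


def train_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = 1 if 2 * sum(y) >= len(y) else 0
    return lambda x: label


def train_nearest_centroid(X, y):
    """Predict the class whose mean feature vector is closest (Euclidean)."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, lbl in zip(X, y) if lbl == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def train_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression fitted by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)

    def score(x):
        return b + sum(wi * xi for wi, xi in zip(w, x))

    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            z = max(-30.0, min(30.0, score(x)))  # clip to avoid exp overflow
            err = 1.0 / (1.0 + math.exp(-z)) - t
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: 1 if score(x) >= 0 else 0


def f1(y_true, y_pred):
    """F1 score for the positive (spammer) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_val_f1(train, X, y, k=5):
    """Mean F1 over stratified k folds (each class spread round-robin)."""
    seen = {0: 0, 1: 0}
    folds = []
    for lbl in y:
        folds.append(seen[lbl] % k)
        seen[lbl] += 1
    scores = []
    for f in range(k):
        tr = [i for i in range(len(y)) if folds[i] != f]
        te = [i for i in range(len(y)) if folds[i] == f]
        model = train([X[i] for i in tr], [y[i] for i in tr])
        scores.append(f1([y[i] for i in te], [model(X[i]) for i in te]))
    return sum(scores) / k


def select_best(candidates, X, y, k=5):
    """Evaluate every candidate trainer and return the best name plus all scores."""
    scores = {name: cross_val_f1(train, X, y, k) for name, train in candidates.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    X, y = synthetic_users(200)
    candidates = {
        "majority": train_majority,
        "nearest_centroid": train_nearest_centroid,
        "logistic": train_logistic,
    }
    best, scores = select_best(candidates, X, y)
    for name, s in sorted(scores.items()):
        print(f"{name}: mean F1 = {s:.3f}")
    print("selected:", best)
```

Stratifying the folds keeps both classes present in every training split, so each classifier always sees spammers and legitimate users. A real setting would swap in the actual user features and a broader set of models, such as an SVM library, but the select-by-evaluation loop stays the same.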