Tag spam creates large non-giant connected components
Source: ACM International Conference Proceeding Series
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Madrid, Spain
SESSION: Social spam
Pages 49-52  
Year of Publication: 2009
ISBN:978-1-60558-438-6
Authors
Nicolas Neubauer  Technische Universität Berlin
Robert Wetzker  Technische Universität Berlin
Klaus Obermayer  Technische Universität Berlin
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 44,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1531914.1531925

ABSTRACT

Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, including domain-specific features (like URL terms) or similarity of users to previously identified spammers. However, as shown in our previous work, it is possible to identify a large fraction of spam users based on purely structural features. The hypergraph connecting documents, users, and tags can be decomposed into connected components, and the large but non-giant components turned out to be almost entirely inhabited by spam users in the examined dataset. Here, we test to what degree the decomposition of the complete hypergraph is really necessary by examining the component structure of the induced user/document and user/tag graphs. While the user/tag graph's connectivity does not help in classifying spammers, the user/document graph's connectivity is already highly informative. It can, however, be augmented with connectivity information from the hypergraph. In our view, spam detection based on structural features, like the one proposed here, requires complex adaptation strategies from spammers and may complement other, more traditional detection approaches.
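The component-based idea for the user/document graph can be illustrated with a minimal sketch. This is not the authors' implementation: the input format (a list of `(user, document)` pairs), the function names, the `min_size` threshold for what counts as "large", and the simplification of treating the single largest component as the giant one are all assumptions made for illustration.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def user_components(posts):
    """Group users by connected component of the bipartite user/document graph.

    posts: iterable of (user, document) pairs (hypothetical input format).
    Returns a list of sets of users, one set per component.
    """
    parent = {}
    for user, doc in posts:
        # Prefix nodes so a user and a document with the same id never collide.
        u, d = ("u", user), ("d", doc)
        parent.setdefault(u, u)
        parent.setdefault(d, d)
        ru, rd = find(parent, u), find(parent, d)
        if ru != rd:
            parent[ru] = rd  # merge the two components

    groups = defaultdict(set)
    for node in parent:
        if node[0] == "u":
            groups[find(parent, node)].add(node[1])
    return list(groups.values())


def flag_suspicious_users(posts, min_size=3):
    """Flag users in large, non-giant components.

    Assumption: the largest component is the giant component and is never
    flagged; min_size is an illustrative threshold, not a value from the paper.
    """
    comps = sorted(user_components(posts), key=len, reverse=True)
    suspicious = set()
    for comp in comps[1:]:
        if len(comp) >= min_size:
            suspicious |= comp
    return suspicious
```

In this sketch, users who only ever bookmark documents that no user in the main body of the system has bookmarked end up in their own component; if that component is large enough, all of its users are flagged. Any real threshold would have to be chosen from the component size distribution of the dataset at hand.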


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17--61, 1960.
 
2
A. Gkanogiannis and T. Kalamboukis. A novel supervised learning algorithm and its use for spam detection in social bookmarking systems. In ECML PKDD Discovery Challenge 2008 (RSDC'08), 2008.
 
3
 
4
A. Hotho, D. Benz, R. Jäschke, and B. Krause, editors. ECML PKDD Discovery Challenge 2008 (RSDC'08). Workshop at 18th Europ. Conf. on Machine Learning (ECML'08) / 11th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'08), 2008.
5
6
 
7
N. Neubauer and K. Obermayer. Predicting tag spam examining cooccurrences, network structures and URL components. In ECML PKDD Discovery Challenge 2008 (RSDC'08), 2008.
 
8
N. Neubauer and K. Obermayer. Hyperincident components of tagging networks (submitted). In HyperText 2009, Proceedings of, 2009.
 
9
Knowledge Discovery and Data Engineering Group, University of Kassel. Benchmark folksonomy data from BibSonomy, version of June 30th, 2008.
 
10
E. Santos-Neto, M. Ripeanu, and A. Iamnitchi. Tracking usage in collaborative tagging communities.
 
11
R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook. In Mining Social Data (MSoDa) Workshop Proceedings, ECAI 2008, pages 26--30, 2008.
