|
ABSTRACT
Data created by social bookmarking systems can be described as 3-partite 3-uniform hypergraphs connecting documents, users, and tags (tagging networks), such that the toolbox of complex network analysis can be applied to examine their properties. One of the most basic tools, the analysis of connected components, however cannot be applied meaningfully: Tagging networks tend to be almost entirely connected. We therefore propose a generalization of connected components, m-hyperincident connected components. We show that decomposing tagging networks into 2-hyperincident connected components yields a characteristic component distribution with a salient giant component that can be found across various datasets. This pattern changes if the underlying formation process changes, for example, if the hypergraph is constructed from search logs, or if the tagging data is contaminated by spam: It turns out that the second- to 129th largest components of the spam-labeled Bibsonomy dataset are inhabited exclusively by spam users. Based on these findings, we propose and unsupervised method for spam detection.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
F. Boesch. Synthesis of reliable networks, a survey. IEEE Trans. Reliab, 35:240--246, 1986.
|
| |
2
|
Ciro Cattuto_aff3n2 , Christoph Schmitz , Andrea Baldassarri , Vito D. P. Servedio_aff2n3 , Vittorio Loreto_aff2n3 , Andreas Hotho , Miranda Grahl , Gerd Stumme, Network properties of folksonomies, AI Communications, v.20 n.4, p.245-262, December 2007
|
| |
3
|
P. Erdos and A. Renyi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5:17--61, 1960.
|
| |
4
|
|
| |
5
|
A. Gkanogiannis and T. Kalamboukis. A novel supervised learning algorithm and its use for spam detection in social bookmarking systems. In ECML PKDD Discovery Challenge 2008 (RSDC'08), 2008.
|
| |
6
|
|
| |
7
|
C. Goldschmidt. Critical random hypergraphs: the emergence of a giant set of identifiable vertices,. Annals of Probability, 33(4):1573--1600, 2005.
|
| |
8
|
|
| |
9
|
A. Hotho, D. Benz, R. Jäschke, and B. Krause, editors. ECML PKDD Discovery Challenge 2008 (RSDC'08). Workshop at 18th Europ. Conf. on Machine Learning (ECML'08) / 11th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'08), 2008.
|
 |
10
|
Georgia Koutrika , Frans Adjie Effendi , Zoltán Gyöngyi , Paul Heymann , Hector Garcia-Molina, Combating spam in tagging systems, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada
[doi> 10.1145/1244408.1244420]
|
 |
11
|
|
 |
12
|
|
| |
13
|
N. Neubauer and K. Obermayer. Predicting tag spam examining cooccurrences, network structures and url components. In ECML PKDD Discovery Challenge 2008 (RSDC'08), 2008.
|
| |
14
|
Knowledge&Data Engineering Group, University of Kassel Benchmark folksonomy data from bibsonomy, version of june 30th, 2008.
|
 |
15
|
|
| |
16
|
J. Schmidt-Pruzan and E. Shamir. Component structure in the evolution of random hypergraphs. Combinatorica, 5(1):81--94, 1984.
|
| |
17
|
|
| |
18
|
R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook. In Mining Social Data (MSoDa) Workshop Proceedings, ECAI 2008, pages 26--30, 2008.
|
|