Splog detection using self-similarity analysis on blog temporal dynamics
Source
AIRWeb; Vol. 215
Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Banff, Alberta, Canada
SESSION: Temporal and topological factors
Pages: 1 - 8
Year of Publication: 2007
ISBN: 978-1-59593-732-2
Bibliometrics
Downloads (6 Weeks): 14, Downloads (12 Months): 109, Citation Count: 8
ABSTRACT
This paper focuses on spam blog (splog) detection. Blogs are a highly popular new-media channel for social communication. Splogs degrade blog search results and waste network resources. Our approach exploits the distinctive temporal dynamics of blogs to detect splogs. The framework rests on three key ideas. First, we represent blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts. Second, we show via a novel visualization that blog temporal characteristics reveal attribute correlation that depends on the type of blog (normal blog or splog). Third, we propose the use of temporal structural properties computed from the self-similarity matrices across different attributes. In a splog detector, these novel features are combined with content-based features. We extract a content-based feature vector from different parts of the blog (URLs, post content, etc.) and reduce its dimensionality with Fisher linear discriminant analysis. We tested an SVM-based splog detector using the proposed features on real-world datasets and obtained 90% accuracy.
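The first idea in the abstract, a self-similarity matrix built on histogram intersection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the normalization to relative frequencies, and the toy posts are all assumptions made for illustration, and only the content attribute is shown (the abstract also names time and link attributes).

```python
from collections import Counter


def histogram_intersection(h1, h2):
    """Histogram intersection of two count histograms.

    Assumption: counts are converted to relative frequencies first, so two
    identical histograms score 1.0 and two disjoint histograms score 0.0.
    """
    total1, total2 = sum(h1.values()), sum(h2.values())
    if total1 == 0 or total2 == 0:
        return 0.0
    shared_bins = h1.keys() & h2.keys()
    return sum((min(h1[k] / total1, h2[k] / total2) for k in shared_bins), 0.0)


def self_similarity_matrix(histograms):
    """Entry [i][j] is the similarity between post i and post j,
    with posts assumed to be listed in chronological order."""
    n = len(histograms)
    return [
        [histogram_intersection(histograms[i], histograms[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy blog: two near-duplicate posts followed by an unrelated one.
posts = [
    "cheap pills buy now",
    "cheap pills buy now",
    "hiking trip photos from banff",
]
content_histograms = [Counter(post.split()) for post in posts]
S = self_similarity_matrix(content_histograms)

for row in S:
    print(["%.2f" % value for value in row])
```

In this toy input the repeated posts produce a block of high similarity, while the unrelated post is similar only to itself. The same construction would apply to any attribute that can be expressed as a histogram per post.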