Analysing features of Japanese splogs and characteristics of keywords
Source
AIRWeb; Vol. 295
Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Beijing, China
SESSION: General
Pages: 33-40
Year of Publication: 2008
ISBN: 978-1-60558-159-0
Authors
Yuuki Sato, University of Tsukuba, Tsukuba, Japan
Takehito Utsuro, University of Tsukuba, Tsukuba, Japan
Yoshiaki Murakami, Navix Co., Ltd., Tokyo, Japan
Tomohiro Fukuhara, University of Tokyo, Kashiwa, Japan
Hiroshi Nakagawa, University of Tokyo, Tokyo, Japan
Yasuhide Kawada, Navix Co., Ltd., Tokyo, Japan
Noriko Kando, National Institute of Informatics, Tokyo, Japan
ABSTRACT
This paper analyzes Japanese splogs based on various characteristics of the keywords they contain. By analyzing these characteristics, we estimate how spammers behave when they create splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that splogs can be collected efficiently (by hand) by sampling blog homepages that contain keywords of a certain type on the date when each keyword occurs most frequently. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs were created by a very small number of spammers.
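The sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (date, keyword, blog homepage URL), the function names `peak_date` and `sample_homepages`, and the sample size are all assumptions made for illustration. The sketch finds the date on which a keyword occurs most often and then draws a random sample of the blog homepages that used it on that date, which a person could then inspect by hand.

```python
import random
from collections import Counter

def peak_date(posts, keyword):
    """Return the date on which `keyword` appears in the most posts, or None.

    `posts` is an iterable of (date, keyword, homepage_url) tuples; this
    record layout is a hypothetical one chosen for the example.
    """
    counts = Counter(date for date, kw, _url in posts if kw == keyword)
    if not counts:
        return None
    # Highest count wins; ties go to the earliest date for determinism.
    return min(counts, key=lambda d: (-counts[d], d))

def sample_homepages(posts, keyword, k, seed=0):
    """Sample up to `k` distinct homepages that used `keyword` on its peak date."""
    day = peak_date(posts, keyword)
    if day is None:
        return []
    urls = sorted({url for date, kw, url in posts if kw == keyword and date == day})
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(urls, min(k, len(urls)))
```

Dates are assumed to be ISO-formatted strings so that they sort chronologically. The sampled homepages would then be labeled manually for features such as excerpted text or affiliate links.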
CITED BY
Taichi Katayama, Takehito Utsuro, Yuuki Sato, Takayuki Yoshinaka, Yasuhide Kawada, Tomohiro Fukuhara, An empirical study on selective sampling in active learning for splog detection, Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, April 21, 2009, Madrid, Spain