Analysing features of Japanese splogs and characteristics of keywords
Source: AIRWeb; Vol. 295
Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Beijing, China
SESSION: General
Pages 33-40  
Year of Publication: 2008
ISBN:978-1-60558-159-0
Authors
Yuuki Sato  University of Tsukuba, Tsukuba, Japan
Takehito Utsuro  University of Tsukuba, Tsukuba, Japan
Yoshiaki Murakami  Navix Co., Ltd., Tokyo, Japan
Tomohiro Fukuhara  University of Tokyo, Kashiwa, Japan
Hiroshi Nakagawa  University of Tokyo, Tokyo, Japan
Yasuhide Kawada  Navix Co., Ltd., Tokyo, Japan
Noriko Kando  National Institute of Informatics, Tokyo, Japan
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1451983.1451993

ABSTRACT

This paper analyzes Japanese splogs (spam blogs) based on characteristics of the keywords they contain. By analyzing these keyword characteristics, we estimate how spammers behave when they create splogs from other sources. Because splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that splogs can be collected efficiently (by hand) by sampling blog homepages that contain a keyword of a certain type on the date when that keyword occurs most frequently. We manually examine several features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among the findings, the most notable is that more than half of the collected splogs were created by a very small number of spammers.
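
The sampling idea in the abstract (take a keyword, find the date on which it occurs most often, then sample blog homepages containing it on that date) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the `(date, homepage_url, text)` record layout, the function names `peak_date` and `sample_homepages`, the substring match, and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import Counter


def peak_date(posts, keyword):
    """Return the date on which `keyword` appears in the most posts.

    `posts` is an iterable of (date, homepage_url, text) tuples; this
    layout is a hypothetical one chosen for the sketch. Dates are assumed
    to be ISO strings such as "2008-01-02".
    """
    counts = Counter(date for date, _url, text in posts if keyword in text)
    if not counts:
        return None
    # Highest count wins; ties go to the earliest date for determinism.
    return min(counts, key=lambda d: (-counts[d], d))


def sample_homepages(posts, keyword, k, seed=0):
    """Sample up to `k` distinct homepages that mention `keyword` on its peak date."""
    day = peak_date(posts, keyword)
    if day is None:
        return []
    homepages = sorted(
        {url for date, url, text in posts if date == day and keyword in text}
    )
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(homepages, min(k, len(homepages)))
```

The sampled homepages would then be inspected by hand, as the abstract describes; the sketch covers only the selection step and makes no attempt to classify a page as a splog.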

