Blog analysis and mining technologies to summarize the wisdom of crowds
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 8th International Workshop on Multimedia Data Mining (associated with ACM SIGKDD 2007)
San Jose, California
Article No. 3  
Year of Publication: 2007
ISBN: 978-1-59593-837-4
Author
Belle L. Tseng  NEC Laboratories America, Cupertino, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1341920.1341923

ABSTRACT

Blogs have become a prominent social medium that creates a fast-growing social network on the Internet. Blogs enable users to quickly and easily publish content, including highly personal thoughts and professional opinions. Our objective is to understand the blogosphere and summarize the wisdom of crowds. To achieve this goal, my presentation will focus on three graph analysis and mining technologies: (1) clustering, (2) ranking, and (3) visualization.

A blog is typically a web site that consists of dated entries in reverse chronological order written and maintained by a user (blogger). Since a blog entry can have hyperlinks to web pages or other blog entries, the information structure of blogs and links can be seen as a temporal graph. Temporal graphs open a new domain for social media analysis.
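As a rough illustration of the temporal-graph view described above, the following Python sketch groups blog-to-blog links into monthly snapshots. The `Entry` record, the `monthly_snapshots` helper, the monthly window, and the sample blogs are all assumptions made for this example; the abstract does not specify a data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """One dated blog entry (hypothetical record layout)."""
    blog: str                                   # blog that published the entry
    posted: date                                # publication date
    links: list = field(default_factory=list)   # blogs this entry links to


def monthly_snapshots(entries):
    """Group blog-to-blog links by the (year, month) of the linking entry.

    Returns {(year, month): {(source_blog, target_blog): link_count}},
    i.e. one weighted directed graph per month.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for entry in entries:
        key = (entry.posted.year, entry.posted.month)
        for target in entry.links:
            snapshots[key][(entry.blog, target)] += 1
    return {key: dict(edges) for key, edges in sorted(snapshots.items())}


# Made-up sample data, for illustration only.
entries = [
    Entry("alice", date(2007, 1, 5), ["bob"]),
    Entry("bob", date(2007, 1, 9), ["alice", "carol"]),
    Entry("alice", date(2007, 2, 2), ["bob"]),
    Entry("carol", date(2007, 2, 20), ["bob"]),
]
snapshots = monthly_snapshots(entries)
```

Each snapshot is an ordinary directed graph, so the sequence of snapshots can be fed to any graph algorithm one time step at a time.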

The first technology is evolutionary graph clustering to discover blog communities. New challenges arise when traditional clustering techniques are applied to temporal data, such as blog data and streaming data, where the relations among data evolve with time. On the one hand, because of long-term concept drift, a naive approach based on aggregation will not give satisfactory clustering results. On the other hand, short-term variations are very often due to noise. Therefore, clustering results should not change dramatically over short periods and should exhibit temporal smoothness. We present two frameworks for incorporating temporal smoothness in evolutionary spectral clustering.
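The abstract does not describe the two frameworks, so the following Python sketch shows only one simple way to encode temporal smoothness, under stated assumptions: the current similarity matrix is blended with the previous one using a weight `alpha`, and the blended graph is split in two by the sign of the Fiedler vector of its unnormalized Laplacian. The function names, the value of `alpha`, and the toy graphs are hypothetical.

```python
def similarity(n, edges):
    """Build a symmetric n x n similarity matrix from (i, j, weight) triples."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = w[j][i] = weight
    return w


def blend(w_curr, w_prev, alpha):
    """Smoothed similarity: alpha * current + (1 - alpha) * previous (an assumption)."""
    n = len(w_curr)
    return [[alpha * w_curr[i][j] + (1 - alpha) * w_prev[i][j] for j in range(n)]
            for i in range(n)]


def spectral_bipartition(w, iterations=500):
    """Split nodes in two by the sign of the Fiedler vector of L = D - W.

    The vector is found by power iteration on (c*I - L) while projecting out
    the constant vector. Assumes w is symmetric and the graph is connected.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    c = 2 * max(degree)                        # upper bound on eigenvalues of L
    v = [i - (n - 1) / 2 for i in range(n)]    # deterministic start that sums to zero
    for _ in range(iterations):
        # y = (c*I - L) v, computed row by row
        y = [(c - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [x - mean for x in y]              # stay orthogonal to the constant vector
        norm = sum(x * x for x in y) ** 0.5
        v = [x / norm for x in y]
    labels = [1 if x > 0 else 0 for x in v]
    if labels[0] == 1:                         # fix the arbitrary sign: node 0 gets label 0
        labels = [1 - label for label in labels]
    return labels


# Previous step: two tight groups {0, 1, 2} and {3, 4, 5} joined by a weak link.
w_prev = similarity(6, [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                        (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)])
# Current step: node 2 is briefly quiet within its group and links strongly to node 3.
w_curr = similarity(6, [(0, 1, 1.0), (0, 2, 0.1), (1, 2, 0.1),
                        (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)])

snapshot_only = spectral_bipartition(w_curr)
smoothed = spectral_bipartition(blend(w_curr, w_prev, alpha=0.5))
```

In this toy example, clustering the current snapshot alone moves node 2 into the second group, while the blended matrix keeps it with nodes 0 and 1. The weight `alpha` controls how strongly the history damps short-term variation.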

The second technology is information flow ranking to identify influential bloggers. People constantly influence each other in all facets of life, including the wisdom of crowds in the blogosphere. Information flows in a social network where individuals influence each other. We present two graph ranking algorithms that leverage information flow to identify which nodes are influential and where information should flow.
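The two ranking algorithms are not specified in the abstract. As a generic stand-in, the Python sketch below runs a PageRank-style iteration over blog-to-blog links, treating a link from one blog to another as a sign that the linked blog influenced the linking one. The `pagerank` function, its parameters, and the sample link graph are assumptions for illustration and are not the talk's algorithms.

```python
def pagerank(links, damping=0.85, iterations=100):
    """PageRank-style scores for a directed link graph.

    links maps each blog to the list of blogs it links to. Blogs with no
    outgoing links spread their score uniformly over all blogs.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by blogs without outgoing links is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Made-up link graph: bob and carol cite alice, dave cites bob.
links = {"alice": [], "bob": ["alice"], "carol": ["alice"], "dave": ["bob"]}
rank = pagerank(links)
most_influential = max(rank, key=rank.get)
```

Here the blog that attracts the most links, directly and indirectly, receives the highest score. A ranking that models information flow more directly would also need link timestamps, which this sketch ignores.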

The third technology is temporal graph visualization to understand blogger dynamics. Our vision is to summarize the blogosphere as a social network of bloggers with wisdom. Discovering blog communities and ranking influential bloggers provide some insights. To observe behaviors and dynamics, we present several visualization tools that help researchers observe patterns, including a demo of our blog summarization.