ABSTRACT
In recent years, weblogs, or blogs for short, have become an important form of online content. The personal nature of blogs, the online interactions between bloggers, and the temporal nature of blog entries differentiate blogs from other kinds of Web content. Bloggers interact with each other by linking to each other's posts, thus forming online communities. Within these communities, bloggers discuss particular issues through entries in their blogs. Since these discussions are often initiated in response to online or offline events, a discussion typically lasts for a limited time. We wish to extract such temporal discussions, or stories, occurring within blogger communities, based on a set of query keywords. We propose a Content-Community-Time model that leverages the content of entries, their timestamps, and the community structure of the blogs to automatically discover stories. Doing so also allows us to discover hot stories. We demonstrate the effectiveness of our model through several case studies using real-world data collected from the blogosphere.