Turning down the noise in the blogosphere
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 289-298
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Khalid El-Arini, Carnegie Mellon University, Pittsburgh, PA, USA
Gaurav Veda, Carnegie Mellon University, Pittsburgh, PA, USA
Dafna Shahaf, Carnegie Mellon University, Pittsburgh, PA, USA
Carlos Guestrin, Carnegie Mellon University, Pittsburgh, PA, USA
ABSTRACT
In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere. We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a user's preferences from limited feedback. We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.
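To make the coverage idea concrete, the following is a minimal sketch of greedy selection under a weighted "soft" coverage objective. It is an illustration under stated assumptions, not the authors' implementation: the probabilistic form of the coverage function, the names `coverage` and `greedy_select`, and the toy posts, features, and weights are all hypothetical. For a monotone submodular objective with a budget of k items, the greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value (Nemhauser, Wolsey, and Fisher, 1978), which is the sense in which a near-optimal solution can be computed efficiently.

```python
from typing import Dict, List

# Hypothetical toy data: cover[post][feature] is the degree (0..1) to which
# a post covers a feature; weights[feature] is the feature's importance.
cover: Dict[str, Dict[str, float]] = {
    "post_a": {"economy": 0.9, "election": 0.1},
    "post_b": {"economy": 0.8},
    "post_c": {"election": 0.7, "sports": 0.2},
}
weights: Dict[str, float] = {"economy": 0.5, "election": 0.4, "sports": 0.1}


def coverage(selected: List[str],
             cover: Dict[str, Dict[str, float]],
             weights: Dict[str, float]) -> float:
    """Assumed objective: sum over features of
    weight * (1 - product over selected posts of (1 - cover[post][feature])).
    This function is monotone and submodular in the selected set."""
    total = 0.0
    for feature, w in weights.items():
        miss = 1.0  # probability-like chance that no selected post covers it
        for post in selected:
            miss *= 1.0 - cover[post].get(feature, 0.0)
        total += w * (1.0 - miss)
    return total


def greedy_select(cover: Dict[str, Dict[str, float]],
                  weights: Dict[str, float],
                  k: int) -> List[str]:
    """Pick up to k posts, each time adding the post with the largest
    marginal gain in coverage. Ties are broken by post name."""
    selected: List[str] = []
    remaining = set(cover)
    for _ in range(min(k, len(cover))):
        base = coverage(selected, cover, weights)
        best = max(
            sorted(remaining),
            key=lambda p: coverage(selected + [p], cover, weights) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


chosen = greedy_select(cover, weights, k=2)
print(chosen, round(coverage(chosen, cover, weights), 3))
```

In this toy run the second pick is `post_c` rather than `post_b`: once `post_a` has covered most of "economy", a second economy post adds little, which is the diminishing-returns behavior that submodularity captures. Personalization could be sketched by adjusting `weights` per user, though the learning rule for that is not shown here.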