Turning down the noise in the blogosphere
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages: 289-298
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Khalid El-Arini  Carnegie Mellon University, Pittsburgh, PA, USA
Gaurav Veda  Carnegie Mellon University, Pittsburgh, PA, USA
Dafna Shahaf  Carnegie Mellon University, Pittsburgh, PA, USA
Carlos Guestrin  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557056

ABSTRACT

In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere.

We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a user's preferences from limited feedback.
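
To make the coverage idea concrete, here is a minimal sketch of greedy selection under a cardinality limit. The abstract does not state the coverage function, so the probabilistic form below, the feature scores, the weights, and the names `coverage` and `greedy_select` are all illustrative assumptions rather than the authors' implementation. For a monotone submodular objective, this kind of greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Dict, List

# Hypothetical data layout: each post maps a feature (e.g. a topic or a named
# entity) to a score in [0, 1] saying how well the post covers that feature.
Post = Dict[str, float]


def coverage(selected: List[Post], weights: Dict[str, float]) -> float:
    """Weighted probabilistic coverage of a set of posts (assumed form)."""
    total = 0.0
    for feature, weight in weights.items():
        # Chance that none of the selected posts covers this feature.
        uncovered = 1.0
        for post in selected:
            uncovered *= 1.0 - post.get(feature, 0.0)
        total += weight * (1.0 - uncovered)
    return total


def greedy_select(posts: List[Post], weights: Dict[str, float], k: int) -> List[int]:
    """Pick up to k post indices, each time adding the largest marginal gain."""
    chosen: List[int] = []
    for _ in range(min(k, len(posts))):
        current = coverage([posts[i] for i in chosen], weights)
        best_index, best_gain = None, 0.0
        for i, post in enumerate(posts):
            if i in chosen:
                continue
            gain = coverage([posts[j] for j in chosen] + [post], weights) - current
            if gain > best_gain:
                best_index, best_gain = i, gain
        if best_index is None:  # no remaining post adds any coverage
            break
        chosen.append(best_index)
    return chosen


# Toy data (made up for illustration): two election posts and one other post.
posts = [
    {"election": 0.9, "economy": 0.1},
    {"election": 0.8},
    {"economy": 0.7, "sports": 0.2},
]
weights = {"election": 1.0, "economy": 1.0, "sports": 0.5}
selection = greedy_select(posts, weights, k=2)
```

In this toy run the second election post is skipped because the first already covers that story, which is the diminishing-returns behavior a coverage objective is meant to capture. Swapping in per-user `weights` is one simple way to picture personalization; the online learning of those weights described in the abstract is not shown here.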

We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.

