Efficient summarization-aware search for online news articles
Source
International Conference on Digital Libraries
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Vancouver, BC, Canada
SESSION: Information extraction 1
Pages: 63-72
Year of Publication: 2007
ISBN: 978-1-59593-644-8
Authors
Wisam Dakka  Columbia University, New York City, NY
Luis Gravano  Columbia University, New York City, NY
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1255175.1255187

ABSTRACT

News portals gather and organize news articles published daily on the Internet. Typically, news articles are clustered into 'events' and each cluster is displayed with a short description of its contents. A particularly interesting choice for describing the contents of a cluster is a machine-generated multi-document summary of the articles in the cluster. Such summaries are informative and help news readers to identify and explore only clusters of interest. Naturally, multi-document clusters and summaries are also valuable to help users navigate the results of keyword-search queries. Unfortunately, current document summarizers are still slow; as a result, search strategies that define document clusters and their multi-document summaries online, in a query-specific manner, are prohibitively expensive. In contrast, search strategies that only return offline, query-independent document clusters are efficient, but might return clusters whose (query-independent) summaries are of little relevance to the queries. In this paper, we present an efficient Hybrid search strategy to address the limitations of fully online and fully offline summarization-aware search approaches. Extensive experiments involving user relevance judgments and real news articles show that the quality of our Hybrid results is high, and that these results are computed in substantially less time than with the fully online strategy. We have implemented our strategy and made it available on the Newsblaster news summarization system, which crawls and summarizes news articles from a variety of web sources on a daily basis.


