Summarizing large document sets using concept-based clustering
Full text: PDF (640 KB)
Source
Proceedings of the Second International Conference on Human Language Technology Research
Human Language Technology Conference, San Diego, California
Pages: 235-240
Year of Publication: 2002
Authors
Hilda Hardy  University at Albany, Albany, NY
Nobuyuki Shimizu  University at Albany, Albany, NY
Tomek Strzalkowski  University at Albany, Albany, NY
Liu Ting  University at Albany, Albany, NY
G. Bowden Wise  GE Global Research Center, Niskayuna, NY
Xinyang Zhang  University at Albany, Albany, NY
Sponsors
DARPA : Defense Advanced Research Projects Agency
ACL : Association for Computational Linguistics
NSF : National Science Foundation
Publisher
Morgan Kaufmann Publishers Inc.  San Francisco, CA, USA
ABSTRACT

This paper describes our multi-document summarizer XDoX, designed to summarize large sets of documents (50 to 500 documents). These documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX identifies the most salient or frequently repeated themes within the set and composes an extraction summary reflecting these main themes. The summarizer uses a unique n-gram scoring method to give greater importance to clusters of passages that have significant common phrases. Our methods are robust, topic-independent, and easily extensible to multilingual applications. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
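The abstract does not give the XDoX scoring formula, so the following Python sketch is only an illustration of the general idea of favoring passage clusters that share significant common phrases. It assumes a simple, hypothetical scheme: every distinct n-gram (here n = 2 or 3) found in at least two passages of a cluster contributes n times the number of passages that contain it. The functions `tokenize`, `ngrams`, `cluster_ngram_score`, and `rank_clusters` and their parameters are invented for this example and are not taken from the paper.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """Distinct contiguous n-grams of length n in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def cluster_ngram_score(passages: List[str], n_min: int = 2, n_max: int = 3,
                        min_passages: int = 2) -> float:
    """Hypothetical cluster score rewarding phrases shared across passages.

    Each distinct n-gram (n_min <= n <= n_max) that occurs in k >= min_passages
    passages adds n * k, so longer and more widely repeated phrases weigh more.
    """
    passage_freq: Counter = Counter()
    for passage in passages:
        tokens = tokenize(passage)
        for n in range(n_min, n_max + 1):
            # Updating with a set counts each n-gram at most once per passage.
            passage_freq.update(ngrams(tokens, n))
    return float(sum(len(gram) * k for gram, k in passage_freq.items()
                     if k >= min_passages))


def rank_clusters(clusters: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Cluster ids sorted by descending score, ties broken by id."""
    scored = [(cid, cluster_ngram_score(p)) for cid, p in clusters.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    clusters = {
        "a": ["The stock market fell sharply today.",
              "Analysts said the stock market fell sharply."],
        "b": ["Rain is expected tomorrow.",
              "Sunny weather continues."],
    }
    print(rank_clusters(clusters))  # [('a', 34.0), ('b', 0.0)]
```

In this toy run, cluster "a" scores 34.0 (four shared bigrams at 2 × 2 each, plus three shared trigrams at 3 × 2 each), while cluster "b" shares no phrases and scores 0.0. Weighting by phrase length is one plausible way to make "significant common phrases" count for more than isolated shared words. A real system would likely also discount stopword-only n-grams and normalize for cluster size.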

