| Summarizing large document sets using concept-based clustering |
Source
|
Human Language Technology Conference
Proceedings of the second international conference on Human Language Technology Research
San Diego, California
Pages: 235 - 240
Year of Publication: 2002
Authors
Hilda Hardy, University at Albany, Albany, NY
Nobuyuki Shimizu, University at Albany, Albany, NY
Tomek Strzalkowski, University at Albany, Albany, NY
Liu Ting, University at Albany, Albany, NY
G. Bowden Wise, GE Global Research Center, Niskayuna, NY
Xinyang Zhang, University at Albany, Albany, NY
| Publisher |
Morgan Kaufmann Publishers Inc.
San Francisco, CA, USA
|
ABSTRACT
This paper describes our multi-document summarizer, XDoX, which is designed to summarize large sets of documents (50 to 500). These documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX identifies the most salient or most frequently repeated themes within the set and composes an extraction summary reflecting these main themes. The summarizer uses a unique n-gram scoring method to give greater importance to clusters of passages that share significant common phrases. Our methods are robust, topic-independent, and easily extensible to multilingual applications. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
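The abstract does not spell out the n-gram scoring formula, so the following is only a minimal illustrative sketch of the general idea: a cluster of passages scores higher when its passages share more, and longer, phrases. The function names (`ngrams`, `cluster_score`, `rank_clusters`), the whitespace tokenizer, the `max_n` parameter, and the weighting (n-gram length times the number of passages containing it) are all assumptions made for illustration, not the XDoX method.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cluster_score(passages, max_n=3):
    """Score a cluster of passages by the phrases its passages share.

    Hypothetical weighting (an assumption, not taken from the paper):
    every n-gram that occurs in at least two passages contributes
    n * (number of passages containing it), so longer and more widely
    repeated phrases count for more.
    """
    score = 0
    for n in range(1, max_n + 1):
        passage_freq = Counter()
        for passage in passages:
            tokens = passage.lower().split()  # naive whitespace tokenizer
            # Use a set so each passage counts an n-gram at most once.
            passage_freq.update(set(ngrams(tokens, n)))
        score += sum(n * freq for freq in passage_freq.values() if freq >= 2)
    return score


def rank_clusters(clusters, max_n=3):
    """Order clusters from highest to lowest shared-phrase score."""
    return sorted(clusters, key=lambda c: cluster_score(c, max_n), reverse=True)


if __name__ == "__main__":
    repeated_theme = ["oil prices rose sharply", "oil prices rose again"]
    unrelated = ["the weather was mild", "stocks fell today"]
    for cluster in rank_clusters([unrelated, repeated_theme]):
        print(cluster_score(cluster), cluster)
```

In this sketch a cluster whose passages repeat the same phrase outranks a cluster of unrelated passages. A real system would likely also filter stop words and normalize for cluster size, which this example leaves out.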