Multidocument summarization: An added value to clustering in interactive retrieval
Source: ACM Transactions on Information Systems (TOIS)
Volume 22, Issue 2 (April 2004)
Pages: 215-241
Year of Publication: 2004
ISSN: 1046-8188
Authors
Manuel J. Maña-López  Universidad de Vigo, Huelva, Spain
Manuel De Buenaga  Universidad Europea de Madrid, Madrid, Spain
José M. Gómez-Hidalgo  Universidad Europea de Madrid, Madrid, Spain
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 146,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/984321.984323

ABSTRACT

An increasingly common problem in effective information access is the presence, in the same corpus, of multiple documents that contain similar information. Users are often interested in locating one or several particular aspects of a topic addressed by a group of similar documents. This kind of task, called instance or aspectual retrieval, has been explored in several TREC Interactive Tracks. In this article, we propose complementing the classification capacity of clustering techniques with an indicative extract of the contents of several sources, produced by means of multidocument summarization techniques. Two kinds of summaries are provided. The first covers the similarities of each cluster of retrieved documents. The second shows the particularities of each document with respect to the common topic of the cluster. The multitopic structure of the documents is used to determine the similarities and differences among topics in the cluster. The system is independent of document domain and genre. An evaluation of the proposed system with users shows significant improvements in effectiveness. The results of previous experiments comparing clustering algorithms are also reported.
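
The two kinds of summaries described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the article relies on the multitopic structure of each document, whereas this sketch swaps in plain term overlap as a stand-in. The function names (`tokenize`, `split_sentences`, `summarize_cluster`), the three-letter token filter, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens of length >= 3 (a crude stand-in for stop-word removal)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3]

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize_cluster(docs):
    """Return (common_sentence, {doc_index: particular_sentence}) for one cluster.

    docs: list of document strings assumed to already belong to the same cluster.
    """
    doc_terms = [set(tokenize(d)) for d in docs]
    # Document frequency of each term inside the cluster.
    df = Counter(t for terms in doc_terms for t in terms)
    shared = {t for t, n in df.items() if n == len(docs)}        # terms in every document
    unique = [{t for t in terms if df[t] == 1} for terms in doc_terms]

    def best(sentences, vocab):
        # Sentence covering the most distinct vocab terms; ties go to the earliest one.
        return max(sentences, key=lambda s: len(set(tokenize(s)) & vocab))

    all_sentences = [s for d in docs for s in split_sentences(d)]
    common = best(all_sentences, shared)                          # similarities summary
    particular = {i: best(split_sentences(d), unique[i])          # per-document particularities
                  for i, d in enumerate(docs)}
    return common, particular

# Hypothetical two-document cluster.
docs = [
    "The volcano erupted on Monday. Ash clouds closed the airport.",
    "The volcano erupted again. Villagers were evacuated by boat.",
]
common, particular = summarize_cluster(docs)
print(common)         # The volcano erupted on Monday.
print(particular[0])  # Ash clouds closed the airport.
print(particular[1])  # Villagers were evacuated by boat.
```

The first output plays the role of the cluster-level summary (what the documents share), and the per-document outputs play the role of the second summary (what each document adds to the common topic). A real system would produce multi-sentence extracts and would need the clustering step that this sketch takes as given.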





REVIEWS

"Bei Yu : Reviewer"

Clustering retrieved documents is a typical post-retrieval processing technique used to present an organized result set, not simply a ranked list, to the user, in order to reduce the cognitive burden of going through a large number of returned res...


"Ian Ruthven : Reviewer"

Simultaneously accessing large numbers of text documents is an activity that is not well supported by current search engine interfaces. Many solutions have been explored that employ some form of clustering, or document summarization, to facilitate...
