ABSTRACT
An increasingly common problem in effective information access is the presence, in the same corpus, of multiple documents that contain similar information. Users are often interested in locating one or several particular aspects of a topic addressed by a group of similar documents. This kind of task, called instance or aspectual retrieval, has been explored in several TREC Interactive Tracks. In this article, we propose complementing the classification capacity of clustering techniques with an indicative extract of the contents of several sources, produced by means of multidocument summarization techniques. Two kinds of summaries are provided. The first covers the similarities of each cluster of retrieved documents. The second shows the particularities of each document with respect to the common topic of the cluster. The multitopic structure of the documents is used to determine the similarities and differences among topics in each cluster. The system is independent of document domain and genre. An evaluation of the proposed system with users shows significant improvements in effectiveness. The results of previous experiments comparing clustering algorithms are also reported.
CITED BY 5
Daniel M. Dunlavy , Dianne P. O'Leary , John M. Conroy , Judith D. Schlesinger, QCS: A system for querying, clustering and summarizing documents, Information Processing and Management: an International Journal, v.43 n.6, p.1588-1605, November, 2007
REVIEWS
"Bei Yu : Reviewer"
Clustering retrieved documents is a typical post-retrieval processing technique used to present an organized result set, not simply a ranked list, to the user, in order to reduce the cognitive burden of going through a large number of returned res
"Ian Ruthven : Reviewer"
Simultaneously accessing large numbers of text documents is an activity that is not well supported by current search engine interfaces. Many solutions have been explored that employ some form of clustering, or document summarization, to facilitate