ABSTRACT
With the emergence of an enormous amount of online news, it is desirable to construct text mining methods that can extract, compare, and highlight the similarities among news articles. In this paper, we explore the research issue and methodology of correlated summarization for a pair of news articles. The algorithm aligns the (sub)topics of the two news articles and summarizes their correlation by sentence extraction. A pair of news articles is modelled as a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of the weighted bipartite graph. Sentences corresponding to the subgraph are well correlated in textual content and convey the dominant shared topic of the pair of news articles. As a further enhancement for lengthy articles, a k-way bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from the two news reports. These clusters correspond to shared subtopics, and the mutual reinforcement principle can then be applied to extract topic sentences within each subtopic group.
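The abstract does not spell out how the edge weights or the saliency scores are computed, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that edge weights are cosine similarities between bag-of-words sentence vectors and that the mutual reinforcement principle is realized as an alternating update in which a sentence in one article is salient if it is strongly connected to salient sentences in the other. The helper names (`cosine`, `build_weights`, `mutual_reinforcement`) and the two toy articles are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_weights(sents_a, sents_b):
    """Weighted bipartite graph as a matrix: W[i][j] links sentence i of
    article A to sentence j of article B (assumed weight: cosine similarity)."""
    bags_a = [Counter(s.lower().split()) for s in sents_a]
    bags_b = [Counter(s.lower().split()) for s in sents_b]
    return [[cosine(x, y) for y in bags_b] for x in bags_a]


def normalize(vec):
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def mutual_reinforcement(W, iters=100):
    """Alternate u <- W v and v <- W^T u, normalizing after each step.
    The scores approach the leading singular vectors of W; high entries
    mark sentences in the dominant densely connected subgraph."""
    m, n = len(W), len(W[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        u = normalize([sum(W[i][j] * v[j] for j in range(n)) for i in range(m)])
        v = normalize([sum(W[i][j] * u[i] for i in range(m)) for j in range(n)])
    return u, v


# Hypothetical toy input: each article is a list of sentences.
article_a = [
    "the storm flooded the coastal town",
    "officials opened emergency shelters",
    "the local team won its match",
]
article_b = [
    "shares fell on the stock market",
    "the storm flooded homes in the coastal town",
    "emergency shelters were opened by officials",
]

u, v = mutual_reinforcement(build_weights(article_a, article_b))
best_a = max(range(len(u)), key=u.__getitem__)
best_b = max(range(len(v)), key=v.__getitem__)
print(article_a[best_a])
print(article_b[best_b])
```

On this toy pair the highest-scoring sentences should be the two storm sentences, which share the most vocabulary. For lengthy articles, the same routine could be run separately on each cluster produced by a bi-clustering step, which is not sketched here.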