| Unweaving a web of documents |
| Full text |
Pdf
(644 KB)
|
| Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
table of contents
Chicago, Illinois, USA
POSTER SESSION: Research track poster
table of contents
Pages: 574 - 579
Year of Publication: 2005
ISBN:1-59593-135-X
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 12, Downloads (12 Months): 66, Citation Count: 2
|
|
|
ABSTRACT
We develop an algorithmic framework to decompose a collection of time-stamped text documents into semantically coherent threads. Our formulation leads to a graph decomposition problem on directed acyclic graphs, for which we obtain three algorithms --- an exact algorithm that is based on minimum cost flow and two more efficient algorithms based on maximum matching and dynamic programming that solve specific versions of the graph decomposition problem. Applications of our algorithms include superior summarization of news search results, improved browsing paradigms for large collections of text-intensive corpora, and integration of time-stamped documents from a variety of sources. Experimental results based on over 250,000 news articles from a major newspaper over a period of four years demonstrate that our algorithms efficiently identify robust threads of varying lengths and time-spans.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
|
 |
5
|
|
| |
6
|
|
| |
7
|
T. Dalamagas. NHS: A tool for the automatic construction of news hypertext, 1998.
|
| |
8
|
T. Dalamagas and M. D. Dunlop. Automatic construction of news hypertext. In HIM, pages 265--278, 1997.
|
 |
9
|
|
 |
10
|
Gary William Flake , Steve Lawrence , C. Lee Giles, Efficient identification of Web communities, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.150-160, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347121]
|
| |
11
|
M. Garey and D. Johnson. Computers and Intractability. Freeman, 1979.
|
 |
12
|
|
| |
13
|
S. J. Green. Automatically generating hypertext in newspaper articles by computing semantic relatedness. In Joint Conf. on New Methods in Lang. and Comp. Nat. Lang. Learning, pages 101--110, 1998.
|
 |
14
|
|
| |
15
|
J. E. Hopcroft and R. M. Karp. An n5/2 algorithm for maximum matchings in bipartite graphs. SICOMP, 24(2):225--231, 1973.
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
J.-J. Pan. Path Partition and Its Variation in Graphs. PhD thesis, National Chiao Tung U., 2004.
|
| |
20
|
A. F. Smeaton and P. J. Morrissey. Experiments on the automatic construction of hypertexts from texts. The New Review of Hypermedia and Multimedia, 1:23--39, 1995.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
Naohiko Uramoto , Koichi Takeda, A method for relating multiple newspaper articles by using graphs, and its application to Webcasting, Proceedings of the 36th annual meeting on Association for Computational Linguistics, p.1307-1313, August 10-14, 1998, Montreal, Quebec, Canada
|
 |
25
|
|
CITED BY 2
|
|
Deept Kumar , Naren Ramakrishnan , Richard F. Helm , Malcolm Potts, Algorithms for storytelling, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|