ACM Home Page
Please provide us with feedback. Feedback
Unweaving a web of documents
Full text PdfPdf (644 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining table of contents
Chicago, Illinois, USA
POSTER SESSION: Research track poster table of contents
Pages: 574 - 579  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
R. Guha  Google, Inc., Mountain View, CA
Ravi Kumar  IBM Almaden Research Center, San Jose, CA
D. Sivakumar  IBM Almaden Research Center, San Jose, CA
Ravi Sundaram  Northeastern University, Boston, MA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 66,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1081870.1081939
What is a DOI?

ABSTRACT

We develop an algorithmic framework to decompose a collection of time-stamped text documents into semantically coherent threads. Our formulation leads to a graph decomposition problem on directed acyclic graphs, for which we obtain three algorithms --- an exact algorithm that is based on minimum cost flow and two more efficient algorithms based on maximum matching and dynamic programming that solve specific versions of the graph decomposition problem. Applications of our algorithms include superior summarization of news search results, improved browsing paradigms for large collections of text-intensive corpora, and integration of time-stamped documents from a variety of sources. Experimental results based on over 250,000 news articles from a major newspaper over a period of four years demonstrate that our algorithms efficiently identify robust threads of varying lengths and time-spans.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
5
 
6
 
7
T. Dalamagas. NHS: A tool for the automatic construction of news hypertext, 1998.
 
8
T. Dalamagas and M. D. Dunlop. Automatic construction of news hypertext. In HIM, pages 265--278, 1997.
9
10
 
11
M. Garey and D. Johnson. Computers and Intractability. Freeman, 1979.
12
 
13
S. J. Green. Automatically generating hypertext in newspaper articles by computing semantic relatedness. In Joint Conf. on New Methods in Lang. and Comp. Nat. Lang. Learning, pages 101--110, 1998.
14
 
15
J. E. Hopcroft and R. M. Karp. An n5/2 algorithm for maximum matchings in bipartite graphs. SICOMP, 24(2):225--231, 1973.
16
17
 
18
 
19
J.-J. Pan. Path Partition and Its Variation in Graphs. PhD thesis, National Chiao Tung U., 2004.
 
20
A. F. Smeaton and P. J. Morrissey. Experiments on the automatic construction of hypertexts from texts. The New Review of Hypermedia and Multimedia, 1:23--39, 1995.
21
22
 
23
 
24
25


Collaborative Colleagues:
R. Guha: colleagues
Ravi Kumar: colleagues
D. Sivakumar: colleagues
Ravi Sundaram: colleagues