Time-dependent event hierarchy construction
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
San Jose, California, USA
SESSION: Research track papers
Pages: 300 - 309
Year of Publication: 2007
ISBN:978-1-59593-609-7
Authors
Gabriel Pui Cheong Fung  CUHK, HK
Jeffrey Xu Yu  CUHK, HK
Huan Liu  Arizona State University
Philip S. Yu  IBM T. J. Watson Research Center
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1281192.1281227

ABSTRACT

In this paper, an algorithm called Time Driven Documents-partition (TDD) is proposed to construct an event hierarchy in a text corpus based on a given query. Specifically, assume that a query contains only one feature, Election. Election is directly related to events such as the 2006 US Midterm Elections Campaign, the 2004 US Presidential Election Campaign and the 2004 Taiwan Presidential Election Campaign, and these events may be further divided into several smaller events (e.g., the 2006 US Midterm Elections Campaign can be broken down into events such as campaign for vote, election results and the resignation of Donald H. Rumsfeld). As such, an event hierarchy results. Our proposed algorithm, TDD, tackles the problem in three major steps: (1) identify the features that are related to the query according to both the timestamps and the contents of the documents; the features identified are regarded as bursty features; (2) extract the documents that are highly related to the bursty features based on time; (3) partition the extracted documents to form events and organize them in a hierarchical structure. To the best of our knowledge, little work has targeted constructing a feature-based event hierarchy for a text corpus. Practically, event hierarchies can help us locate target information in a text corpus efficiently. Again, assume that Election is used as a query. Without an event hierarchy, it is very difficult to identify what the major events related to it are, when these events happened, and which features and news articles are related to each of these events. We archived two years of news articles to evaluate the feasibility of TDD. The encouraging results indicate that TDD is practically sound and highly effective.
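
The three steps named in the abstract can be illustrated with a small Python sketch. This is not the TDD algorithm itself: the abstract does not give its burst criterion, relevance measure, or partitioning method, so the ones below are simple stand-ins chosen for illustration. The function names (bursty_features, extract_documents, build_hierarchy), the parameters min_ratio and window, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict


def bursty_features(docs, query, min_ratio=2.0):
    """Step 1 (toy criterion, an assumption): features that co-occur with the
    query and whose peak daily count is >= min_ratio times their mean daily count."""
    days = sorted({day for day, _ in docs})
    per_day = defaultdict(Counter)
    for day, feats in docs:
        if query in feats:
            per_day[day].update(feats - {query})
    totals = Counter()
    for counts in per_day.values():
        totals.update(counts)
    peaks = {}
    for feat, total in totals.items():
        peak_day = max(days, key=lambda d: per_day[d][feat])
        if per_day[peak_day][feat] >= min_ratio * total / len(days):
            peaks[feat] = peak_day
    return peaks


def extract_documents(docs, peaks, window=1):
    """Step 2 (toy criterion): indices of documents that contain a bursty
    feature and fall within `window` days of that feature's peak day."""
    return {
        feat: [i for i, (day, feats) in enumerate(docs)
               if feat in feats and abs(day - peak) <= window]
        for feat, peak in peaks.items()
    }


def build_hierarchy(docs, query, min_ratio=2.0, window=1):
    """Step 3 (toy partition): features peaking on the same day form one event;
    the events become children of the query in a two-level hierarchy."""
    peaks = bursty_features(docs, query, min_ratio)
    extracted = extract_documents(docs, peaks, window)
    events = defaultdict(lambda: {"features": [], "docs": set()})
    for feat, peak in peaks.items():
        events[peak]["features"].append(feat)
        events[peak]["docs"].update(extracted[feat])
    return {query: [
        {"peak_day": day,
         "features": sorted(ev["features"]),
         "docs": sorted(ev["docs"])}
        for day, ev in sorted(events.items())
    ]}


# Hypothetical toy corpus: (day index, set of features) per document.
docs = [
    (0, {"election", "campaign", "vote"}),
    (1, {"election", "campaign", "vote"}),
    (1, {"election", "campaign"}),
    (1, {"weather"}),
    (5, {"election", "results", "vote"}),
    (5, {"election", "results"}),
    (6, {"election", "results", "vote"}),
]
hierarchy = build_hierarchy(docs, "election")
```

On this toy corpus, "vote" appears evenly across days and is not treated as bursty, while "campaign" and "results" each peak on a different day and so end up as two separate events under the query. A deeper hierarchy would require applying the partition step recursively, which this sketch does not attempt.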



