ABSTRACT
In this paper, we propose an algorithm called Time Driven Documents-partition (TDD) that constructs an event hierarchy from a text corpus based on a given query. For example, assume that a query contains only one feature, Election. Election is directly related to events such as the 2006 US Midterm Elections Campaign, the 2004 US Presidential Election Campaign and the 2004 Taiwan Presidential Election Campaign, and each of these events may be further divided into several smaller events (e.g., the 2006 US Midterm Elections Campaign can be broken down into events such as the campaign for votes, the election results and the resignation of Donald H. Rumsfeld). The result is an event hierarchy. TDD tackles the problem in three major steps: (1) identify the features that are related to the query according to both the timestamps and the contents of the documents; the features identified are regarded as bursty features; (2) extract the documents that are highly related to the bursty features based on time; (3) partition the extracted documents to form events and organize them in a hierarchical structure. To the best of our knowledge, little work has targeted the construction of a feature-based event hierarchy for a text corpus. In practice, event hierarchies can help us efficiently locate target information in a text corpus. Again, assume that Election is used as a query. Without an event hierarchy, it is very difficult to identify the major events related to it, when these events happened, and the features and news articles related to each of them. We archived two years of news articles to evaluate the feasibility of TDD. The encouraging results indicate that TDD is practically sound and highly effective.
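The three steps can be made concrete with a small sketch. The abstract does not give TDD's formulas, so what follows is not the authors' method: the threshold rule (mean plus `z` standard deviations of a feature's daily document share), the grouping of consecutive days, and the function names `bursty_days`, `docs_in_bursts` and `group_consecutive` are all assumptions chosen for illustration. It only shows the general idea of using timestamps to find when a query feature bursts, pulling the documents from those periods, and splitting them into time-based groups.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def bursty_days(docs, feature, z=1.0):
    """Return the sorted days on which `feature` appears unusually often.

    docs: list of (date, set_of_features) pairs.
    Assumption (not from the paper): a day is "bursty" when the share of
    that day's documents containing the feature exceeds mean + z * std
    of the daily shares.
    """
    total = Counter()
    hits = Counter()
    for day, feats in docs:
        total[day] += 1
        if feature in feats:
            hits[day] += 1
    ratios = {day: hits[day] / total[day] for day in total}
    mu = mean(ratios.values())
    sigma = pstdev(ratios.values())
    return sorted(day for day, r in ratios.items() if r > mu + z * sigma)


def docs_in_bursts(docs, feature, days):
    """Return indices of documents that contain `feature` on a bursty day."""
    days = set(days)
    return [i for i, (day, feats) in enumerate(docs)
            if day in days and feature in feats]


def group_consecutive(days):
    """Group days into runs of consecutive dates (a stand-in for events)."""
    groups = []
    for day in sorted(days):
        if groups and (day - groups[-1][-1]).days == 1:
            groups[-1].append(day)
        else:
            groups.append([day])
    return groups
```

A real system would replace the threshold with a proper burst model and would partition documents by content as well as time; the sketch keeps only the time-driven skeleton described in the three steps.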