Thread detection in dynamic text message streams
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Seattle, Washington, USA
SESSION: Handling messages and finding experts
Pages: 35-42
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
Dou Shen, Hong Kong University of Science and Technology
Qiang Yang, Hong Kong University of Science and Technology
Jian-Tao Sun, Microsoft Research Asia, Beijing, P.R. China
Zheng Chen, Microsoft Research Asia, Beijing, P.R. China
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148180

ABSTRACT

Text message streams are a newly emerging type of Web data, produced in enormous quantities as Instant Messaging and Internet Relay Chat have grown in popularity. Detecting the threads contained in these streams benefits various applications, including information retrieval, expert recognition, and even crime prevention. Despite its importance, little research has been conducted on this problem so far, largely because of the nature of the data: messages are usually very short and incomplete. In this paper, we present a stringent definition of the thread detection task and our preliminary solution to it. We propose three variations of a single-pass clustering algorithm that exploit the temporal information in the streams. We also put forward an algorithm based on linguistic features that exploits discourse structure information. We conducted several experiments on a real dataset to compare our approaches with existing algorithms. The results show that all three variations outperform the basic single-pass algorithm. In terms of F1, our proposed algorithm based on linguistic features achieves relative improvements of 69.5% over the basic single-pass algorithm and 9.7% over the best of the three variations.
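The abstract describes the approach only at a high level. As a rough illustration, here is a generic single-pass clustering loop over a timestamped message stream. It includes one hypothetical way to bring in temporal information: an exponential time decay applied to similarity. This is not the paper's algorithm, since the abstract does not specify its three variations or its linguistic features. The names `single_pass`, `Thread`, `threshold` and `decay`, the bag-of-words representation, and the decay form are all assumptions made for this sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One detected thread (hypothetical structure for this sketch)."""
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    members: list = field(default_factory=list)         # message indices
    last_time: float = 0.0                              # timestamp of newest member


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(messages, threshold=0.3, decay=None):
    """Label each (timestamp, text) message with a thread index in one pass.

    threshold: assumed minimum similarity for joining an existing thread.
    decay: assumed optional time constant; if given, similarity is scaled by
           exp(-gap / decay), where gap is the time since the thread's
           newest message, so stale threads are harder to join.
    """
    threads = []  # Thread objects, in order of creation
    labels = []   # thread index assigned to each message
    for idx, (ts, text) in enumerate(messages):
        vec = Counter(text.lower().split())  # naive whitespace bag-of-words
        best_i, best_sim = -1, 0.0
        for i, th in enumerate(threads):
            sim = cosine(vec, th.centroid)
            if decay is not None:
                sim *= math.exp(-(ts - th.last_time) / decay)
            if sim > best_sim:
                best_i, best_sim = i, sim
        if best_i < 0 or best_sim < threshold:
            threads.append(Thread())  # no close-enough thread: start a new one
            best_i = len(threads) - 1
        th = threads[best_i]
        th.centroid.update(vec)  # cosine is scale-invariant, so sums act like means
        th.members.append(idx)
        th.last_time = ts
        labels.append(best_i)
    return labels


if __name__ == "__main__":
    msgs = [
        (0, "anyone tried the new python release"),
        (1, "what time is the meeting tomorrow"),
        (2, "the python release fixed my bug"),
        (3, "meeting moved to 3pm tomorrow"),
    ]
    print(single_pass(msgs, threshold=0.2))  # [0, 1, 0, 1]
```

On these toy messages, the two Python messages end up in one thread and the two meeting messages in another. Raising `threshold` makes the loop start new threads more eagerly. Setting `decay` makes a message less likely to join a thread that has been silent for a while, which is one simple way a stream's timestamps can inform the clustering.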



