Multi-task text segmentation and alignment based on weighted mutual information
Source
Conference on Information and Knowledge Management
Proceedings of the 15th ACM international conference on Information and knowledge management
Arlington, Virginia, USA
POSTER SESSION: Posters
Pages: 846-847
Year of Publication: 2006
ISBN: 1-59593-433-2
Authors
Bingjun Sun, The Pennsylvania State University, University Park, PA
Ding Zhou, The Pennsylvania State University, University Park, PA
Hongyuan Zha, The Pennsylvania State University, University Park, PA
John Yen, The Pennsylvania State University, University Park, PA
ABSTRACT
Text segmentation is important for text analysis, and text alignment determines the sub-topics shared among similar documents. Multi-task text segmentation and alignment extends single-task segmentation by using information from multiple source documents. In this paper we introduce a novel, domain-independent, unsupervised method for multi-task segmentation and alignment, based on the idea that the optimal segmentation and alignment maximizes weighted mutual information, that is, mutual information with term weights. The experimental results show that our approach works well.
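The abstract does not state the formula, so the following is only a minimal sketch of one common way to weight mutual information: each term's contribution to the mutual information between terms and segments is multiplied by a per-term weight. The function name `weighted_mutual_information`, the count-matrix layout, and the weights are assumptions made for illustration and may differ from the authors' formulation.

```python
import math


def weighted_mutual_information(counts, weights):
    """Score a candidate segmentation (illustrative sketch, not the paper's code).

    counts[t][s] -- occurrences of term t in segment s (assumed layout)
    weights[t]   -- non-negative weight of term t (assumed to be given)
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0

    n_segments = len(counts[0])
    # Marginal probabilities of terms and of segments.
    p_term = [sum(row) / total for row in counts]
    p_seg = [sum(row[s] for row in counts) / total for s in range(n_segments)]

    score = 0.0
    for t, row in enumerate(counts):
        for s, c in enumerate(row):
            if c == 0:
                continue  # a zero count contributes nothing (0 * log 0 := 0)
            p_joint = c / total
            score += weights[t] * p_joint * math.log(p_joint / (p_term[t] * p_seg[s]))
    return score


# Two candidate segmentations of the same toy text with two terms.
clean = [[4, 0], [0, 4]]   # each term falls in its own segment
mixed = [[2, 2], [2, 2]]   # terms spread evenly over both segments
uniform = [1.0, 1.0]

best = max([clean, mixed], key=lambda c: weighted_mutual_information(c, uniform))
```

Under this sketch, the segmentation whose segments separate the terms most cleanly scores higher, which is the sense in which an optimal segmentation would maximize the objective. How candidate segmentations are searched and how term weights are estimated are not covered here.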
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
4. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
6. T. Hofmann. Probabilistic latent semantic analysis. In Proc. UAI, 1999.
8. M. Utiyama and H. Isahara. A statistical model for domain-independent text segmentation. In Proc. ACL, pages 491--498, 1999.
CITED BY
Bingjun Sun, Prasenjit Mitra, C. Lee Giles, John Yen, Hongyuan Zha, Topic segmentation with shared topic detection and alignment of multiple documents, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands