ABSTRACT
We study the problem of linking event information across languages without the use of translation systems or dictionaries. The linking relies on interlingual information obtained from probabilistic topic models trained on comparable corpora written in two languages (in our case, English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation (LDA) model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task, evaluated on Google News.
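The abstract does not spell out how LDA is extended to two languages. The sketch below assumes one common construction for comparable corpora, in the spirit of bilingual or polylingual LDA, which may differ from the authors' exact model. Each pair of comparable English and Dutch documents is treated as one training unit that shares a single topic distribution, while each language keeps its own topic-word distribution. A monolingual document is then mapped into the shared topic space by folding it in with its own language's topic-word distributions held fixed, and documents are linked across languages by comparing topic vectors. The function names, the toy vocabulary, the hyperparameters, and the use of cosine similarity for linking are all hypothetical illustrations, not the paper's clustering procedure.

```python
import numpy as np


def train_bilingual_lda(pairs, V_en, V_nl, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for a bilingual LDA sketch (assumed model).

    pairs: list of (en_tokens, nl_tokens), tokens are integer word ids.
    Both sides of a pair share one topic count vector (theta_d), while each
    language has its own topic-word counts (phi_en, phi_nl).
    """
    rng = np.random.default_rng(seed)
    V = (V_en, V_nl)
    n_dk = np.zeros((len(pairs), K))                  # shared topic counts per pair
    n_kw = [np.zeros((K, V_en)), np.zeros((K, V_nl))]  # topic-word counts per language
    n_k = [np.zeros(K), np.zeros(K)]                  # topic totals per language
    z = []                                            # topic assignment of every token
    for d, docs in enumerate(pairs):
        z_d = []
        for lang, doc in enumerate(docs):
            zs = rng.integers(K, size=len(doc))       # random initial assignments
            for w, k in zip(doc, zs):
                n_dk[d, k] += 1
                n_kw[lang][k, w] += 1
                n_k[lang][k] += 1
            z_d.append(zs)
        z.append(z_d)

    for _ in range(iters):
        for d, docs in enumerate(pairs):
            for lang, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][lang][i]
                    # Remove the token's current assignment from all counts.
                    n_dk[d, k] -= 1
                    n_kw[lang][k, w] -= 1
                    n_k[lang][k] -= 1
                    # Shared document-topic term times language-specific word term.
                    p = (n_dk[d] + alpha) * (n_kw[lang][:, w] + beta) / (n_k[lang] + V[lang] * beta)
                    k = rng.choice(K, p=p / p.sum())
                    z[d][lang][i] = k
                    n_dk[d, k] += 1
                    n_kw[lang][k, w] += 1
                    n_k[lang][k] += 1

    # Smoothed topic-word distributions, one K x V matrix per language.
    return [(n_kw[l] + beta) / (n_k[l][:, None] + V[l] * beta) for l in (0, 1)]


def infer_theta(doc, phi_lang, alpha=0.1, iters=50, seed=0):
    """Fold in one monolingual document with its language's phi held fixed."""
    rng = np.random.default_rng(seed)
    K = phi_lang.shape[0]
    zs = rng.integers(K, size=len(doc))
    n_k = np.bincount(zs, minlength=K).astype(float)
    for _ in range(iters):
        for i, w in enumerate(doc):
            n_k[zs[i]] -= 1
            p = (n_k + alpha) * phi_lang[:, w]
            zs[i] = rng.choice(K, p=p / p.sum())
            n_k[zs[i]] += 1
    return (n_k + alpha) / (len(doc) + K * alpha)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical toy vocabularies, ids 0-3 in each language:
    # EN: election, vote, goal, match   NL: verkiezing, stem, doelpunt, wedstrijd
    pairs = [([0, 1, 0, 1], [0, 1, 1, 0]), ([2, 3, 2, 3], [2, 3, 3, 2])] * 3
    phi_en, phi_nl = train_bilingual_lda(pairs, V_en=4, V_nl=4, K=2)
    en_politics = infer_theta([0, 1, 1], phi_en)
    nl_politics = infer_theta([0, 0, 1], phi_nl)
    nl_sports = infer_theta([2, 3, 3], phi_nl)
    print(cosine(en_politics, nl_politics), cosine(en_politics, nl_sports))
```

Because the English and Dutch sides of a training pair draw their topics from the same counts, a topic index comes to mean the same thing in both languages. This is what lets an English article and a Dutch article be compared without any translation step.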