Cross-language linking of news stories on the web using interlingual topic modelling
Source
Conference on Information and Knowledge Management
Proceedings of the 2nd ACM Workshop on Social Web Search and Mining
Hong Kong, China
Session: Social mining
Pages: 57-64
Year of Publication: 2009
ISBN: 978-1-60558-806-3
Authors
Wim De Smet  Katholieke Universiteit Leuven, Leuven, Belgium
Marie-Francine Moens  Katholieke Universiteit Leuven, Leuven, Belgium
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 13,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1651437.1651447

ABSTRACT

We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingual information obtained through probabilistic topic models trained on comparable corpora written in two languages (in our case, English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task, where the evaluation is performed on Google News.
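
The abstract does not give the details of the bilingual model or of the clustering procedure. As a rough illustration only, the sketch below assumes that some bilingual topic model has already assigned every document, English or Dutch, a distribution over the same set of interlingual topics, and links each Dutch story to its closest English story by cosine similarity. The function names, document identifiers, topic vectors, and the choice of cosine similarity are all hypothetical and are not taken from the paper.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def link_documents(english, dutch):
    """Map each Dutch document id to the most similar English document id.

    Both arguments map a document id to its distribution over the same
    shared topics (assumed to come from a bilingual topic model).
    """
    return {
        nl_id: max(english, key=lambda en_id: cosine(english[en_id], nl_theta))
        for nl_id, nl_theta in dutch.items()
    }

# Hypothetical topic distributions over three shared topics.
english = {"en_election": [0.8, 0.1, 0.1], "en_football": [0.1, 0.8, 0.1]}
dutch = {"nl_verkiezing": [0.7, 0.2, 0.1], "nl_voetbal": [0.05, 0.9, 0.05]}

links = link_documents(english, dutch)
print(links)
```

Because the comparison happens entirely in the shared topic space, no dictionary or translation step appears anywhere in the sketch, which is the property the abstract emphasizes.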


