Information arbitrage across multi-lingual Wikipedia
Source: Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Classification and clustering
Pages 94-103  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Authors
Eytan Adar  University of Washington, Seattle, WA
Michael Skinner  Google, Seattle, WA
Daniel S. Weld  University of Washington, Seattle, WA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 168,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1498759.1498813

ABSTRACT

The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge both as a result of manual translation and independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), we present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. Our method uses self-supervised learning, and our experiments demonstrate the method's feasibility, even in the absence of dictionaries.



