Crosslingual location search
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
Session: Multilingual & crosslingual retrieval
Pages 211-218
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Tanuja Joshi  Microsoft Research India, Bangalore, India
Joseph Joy  Microsoft Research India, Bangalore, India
Tobias Kellner  Microsoft Research India, Bangalore, India
Udayan Khurana  Microsoft India R&D, Hyderabad, India
A Kumaran  Microsoft Research India, Bangalore, India
Vibhuti Sengar  Microsoft Research India, Bangalore, India
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 211,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1390334.1390372

ABSTRACT

Address geocoding, the process of finding the map location for a structured postal address, is a relatively well-studied problem. In this paper we consider the more general problem of crosslingual location search, where the queries are not limited to postal addresses, and the language and script used in the search query are different from those in which the underlying data is stored. To the best of our knowledge, our system is the first crosslingual location search system that is able to geocode complex addresses. We use a statistical machine transliteration system to convert location names from the script of the query to that of the stored data. However, we show that it is not sufficient to simply feed the resulting transliterations into a monolingual geocoding system, as the ambiguity inherent in the conversion drastically expands the location search space and significantly lowers the quality of results. The strength of our approach lies in its integrated, end-to-end nature: we use abstraction and fuzzy search (in the text domain) to achieve maximum coverage despite transliteration ambiguities, while applying spatial constraints (in the geographic domain) to focus only on viable interpretations of the query. Our experiments with structured and unstructured queries in a set of diverse languages and scripts (Arabic, English, Hindi and Japanese), searching for locations in different regions of the world, show full crosslingual location search accuracy at levels comparable to those of commercial monolingual systems. We achieve these levels of performance using techniques that may be applied to crosslingual searches in any language/script, and over arbitrary spatial data.
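The abstract's core argument is that fuzzy text matching widens coverage, that this widening multiplies candidate interpretations, and that spatial constraints prune the incoherent ones. A toy sketch can illustrate that interplay. It is not the authors' system. The transliteration candidates are supplied by hand as a stand-in for the statistical transliterator's output. Python's `difflib.SequenceMatcher` is swapped in as a generic fuzzy matcher. The spatial constraint is simplified to "the street point must lie inside the city's square bounding box". The gazetteer, the `Place` type, the 0.8 threshold and all helper names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product

# Hypothetical gazetteer record: a city is modelled as a square box of
# half-width `radius` degrees around (lat, lon); a street as a single point.
@dataclass(frozen=True)
class Place:
    name: str
    kind: str  # "city" or "street"
    lat: float
    lon: float
    radius: float = 0.0

# Toy data for illustration only (approximate coordinates).
PLACES = [
    Place("Bangalore", "city", 12.97, 77.59, 0.3),
    Place("Mangalore", "city", 12.91, 74.86, 0.2),
    Place("Brigade Road", "street", 12.97, 77.61),
    Place("MG Road", "street", 12.97, 77.60),
    Place("MG Road", "street", 12.87, 74.84),
]

def fuzzy_matches(candidates, places, kind, threshold=0.8):
    """Text domain: keep places of `kind` whose name is similar to ANY
    transliteration candidate (similarity = SequenceMatcher ratio)."""
    hits = []
    for place in places:
        if place.kind != kind:
            continue
        best = max(SequenceMatcher(None, c.lower(), place.name.lower()).ratio()
                   for c in candidates)
        if best >= threshold:
            hits.append(place)
    return hits

def inside(street, city):
    """Geographic domain: the street point must lie in the city's box."""
    return (abs(street.lat - city.lat) <= city.radius
            and abs(street.lon - city.lon) <= city.radius)

def search(street_cands, city_cands, places, threshold=0.8):
    """Return (all fuzzy-matched street/city pairs, spatially consistent pairs)."""
    streets = fuzzy_matches(street_cands, places, "street", threshold)
    cities = fuzzy_matches(city_cands, places, "city", threshold)
    naive = list(product(streets, cities))
    viable = [(s, c) for s, c in naive if inside(s, c)]
    return naive, viable

if __name__ == "__main__":
    # Hand-written stand-ins for the transliterations of a non-Latin query.
    naive, viable = search(["Brigade Road", "Brigad Rod"],
                           ["Bangalore", "Bengaluru"], PLACES)
    print(len(naive), [(s.name, c.name) for s, c in viable])
    # prints: 2 [('Brigade Road', 'Bangalore')]
```

In this example, fuzzy matching admits "Mangalore" as a plausible reading of the city (ratio of about 0.89 against "Bangalore"), which doubles the candidate pairs. The containment check then discards the pair whose street lies nowhere near that city. A production system would test against real geometries rather than a box around a point.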


18
