Cross-language query classification using web search for exogenous knowledge
Source: Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Classification and clustering
Pages: 74-83
Year of publication: 2009
ISBN: 978-1-60558-390-7
Authors
Xuerui Wang, University of Massachusetts, Amherst, MA
Andrei Broder, Yahoo! Research, Santa Clara, CA
Evgeniy Gabrilovich, Yahoo! Research, Santa Clara, CA
Vanja Josifovski, Yahoo! Research, Santa Clara, CA
Bo Pang, Yahoo! Research, Santa Clara, CA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498811

ABSTRACT

The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either unavailable or of questionable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative for at least one task: query classification, which is essential for understanding user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by using the Web search results in the query's original language as an additional source of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many other languages.
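
The general idea in the abstract (classify the machine-translated query, then add evidence from translated search results retrieved in the query's original language) can be illustrated with a small sketch. This is not the authors' implementation. `translate`, `search`, and `classify` are hypothetical callables standing in for an off-the-shelf machine translation system, a Web search engine queried in the original language, and an English taxonomy classifier. The combination rule is our own simplification: a weighted sum of the translated query's category scores and the averaged scores of its translated result snippets. The `toy_*` stubs and the example query strings are invented for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   translate: source-language text -> English text
#   search:    (query, k) -> up to k result snippets in the query's original language
#   classify:  English text -> {category: score} over an English taxonomy
Translator = Callable[[str], str]
Searcher = Callable[[str, int], List[str]]
Classifier = Callable[[str], Dict[str, float]]


def classify_query(
    query: str,
    translate: Translator,
    search: Searcher,
    classify: Classifier,
    k: int = 10,
    query_weight: float = 1.0,
    result_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank English taxonomy categories for a non-English query (illustrative sketch)."""
    scores: Counter = Counter()

    # Evidence 1: classify the machine-translated query itself.
    for category, score in classify(translate(query)).items():
        scores[category] += query_weight * score

    # Evidence 2: retrieve results in the original language, translate each
    # snippet, classify it, and average over snippets so that k does not
    # change the overall weight of this evidence source.
    snippets = search(query, k)
    for snippet in snippets:
        for category, score in classify(translate(snippet)).items():
            scores[category] += result_weight * score / len(snippets)

    # Highest-scoring categories first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for illustration only; a real system would call an MT service,
# a search engine, and a trained classifier.
TOY_TRANSLATIONS = {
    "苹果": "apple",  # ambiguous: fruit or electronics brand
    "苹果 手机 最新 价格": "apple phone latest price",
    "苹果 发布 新款 手机": "apple releases new phone",
}
TOY_SNIPPETS = ["苹果 手机 最新 价格", "苹果 发布 新款 手机"]
TOY_KEYWORDS = {"Food": {"apple", "fruit"}, "Electronics": {"apple", "phone"}}


def toy_translate(text: str) -> str:
    return TOY_TRANSLATIONS.get(text, text)


def toy_search(query: str, k: int) -> List[str]:
    return TOY_SNIPPETS[:k]


def toy_classify(text: str) -> Dict[str, float]:
    # Count keyword hits per category, then normalize to a distribution.
    tokens = text.split()
    hits = {cat: sum(tok in words for tok in tokens) for cat, words in TOY_KEYWORDS.items()}
    total = sum(hits.values())
    return {cat: n / total for cat, n in hits.items()} if total else {}


ranking = classify_query("苹果", toy_translate, toy_search, toy_classify, k=2)
print(ranking)
```

In this toy run the translated query alone ("apple") splits evenly between Food and Electronics. The translated snippets from the original-language results tip the ranking toward Electronics. Averaging snippet scores rather than summing them is one plausible design choice. It keeps `result_weight` meaningful regardless of how many results are fetched.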


