Understanding user's query intent with Wikipedia
Source
International World Wide Web Conference
Proceedings of the 18th international conference on World wide web
Madrid, Spain
SESSION: Search/session: query categorization
Pages 471-480
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Jian Hu, Microsoft Research Asia, Beijing, China
Gang Wang, Microsoft Research Asia, Beijing, China
Fred Lochovsky, The Hong Kong University of Science and Technology, Hong Kong, China
Jian-tao Sun, Microsoft Research Asia, Beijing, China
Zheng Chen, Microsoft Research Asia, Beijing, China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526709.1526773

ABSTRACT

Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user's intent mainly use machine learning techniques. However, meeting all these challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other methods in each intent domain.
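
The idea of representing each intent domain as a set of Wikipedia concepts and classifying a query by mapping it into that space can be illustrated with a toy sketch. This is not the authors' algorithm: the concept sets, the function names, and the exact phrase matching used here are all assumptions made for illustration, and the abstract does not say how the paper performs the mapping.

```python
from typing import Dict, Set

# Hypothetical toy data: each intent domain is a small set of
# Wikipedia-style concept titles (the paper builds these from Wikipedia
# articles and categories; these entries are invented for illustration).
INTENT_CONCEPTS: Dict[str, Set[str]] = {
    "travel": {"hotel", "airline", "travel agency"},
    "job": {"employment", "resume", "recruitment"},
}


def map_query_to_concepts(query: str, concepts: Set[str]) -> Set[str]:
    """Return the concepts whose title occurs as a whole word or phrase in the query."""
    # Normalize case and whitespace, then pad so only whole tokens match.
    padded = " " + " ".join(query.lower().split()) + " "
    return {c for c in concepts if f" {c} " in padded}


def classify_intent(
    query: str, intent_concepts: Dict[str, Set[str]] = INTENT_CONCEPTS
) -> Dict[str, int]:
    """Return each domain with at least one matched concept, with its match count."""
    scores = {
        domain: len(map_query_to_concepts(query, concepts))
        for domain, concepts in intent_concepts.items()
    }
    return {domain: score for domain, score in scores.items() if score > 0}


if __name__ == "__main__":
    print(classify_intent("cheap hotel near airline hub"))
    print(classify_intent("weather today"))
```

Exact title matching is the simplest possible mapping and would have poor coverage on real queries; it is used here only to make the representation idea concrete.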


