ABSTRACT
Understanding the intent behind a user's query can help a search engine automatically route the query to the corresponding vertical search engines and retrieve particularly relevant content, thereby greatly improving user satisfaction. The query intent classification problem poses three major challenges: (1) intent representation, (2) domain coverage, and (3) semantic interpretation. Current approaches to predicting user intent mainly rely on machine learning techniques. However, meeting all three challenges with statistical machine learning approaches is difficult and often requires considerable human effort. In this paper, we propose a general methodology for query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. Wikipedia concepts serve as the intent representation space, so each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into this Wikipedia representation space. Compared with previous approaches, our method achieves much better coverage when classifying queries in an intent domain, even when the number of seed intent examples is very small. Moreover, the method is general and can easily be applied to various intent domains. We demonstrate its effectiveness in three applications: travel, job, and person name. In each case, only a couple of seed intent queries are provided. We perform quantitative evaluations against two baseline methods, and the experimental results show that our method significantly outperforms the baselines in each intent domain.
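The abstract does not say how seed queries are expanded into a domain or how a query is mapped onto Wikipedia concepts. The minimal Python sketch below illustrates only the representation idea: an intent domain is a set of Wikipedia articles and categories, and a query is scored by how many of its mapped concepts fall inside that set. Everything else is an assumption made for illustration and is not the authors' algorithm. This includes the toy `ARTICLE_CATEGORIES` table, the one-hop category expansion in the hypothetical `expand_domain`, the greedy longest-title matching in the hypothetical `map_query`, and the overlap score in the hypothetical `intent_score`.

```python
# Hypothetical toy stand-in for Wikipedia: article title -> its categories.
# Titles and categories are illustration data, not real Wikipedia content.
ARTICLE_CATEGORIES: dict[str, set[str]] = {
    "hotel": {"Hospitality", "Tourism"},
    "flight": {"Air travel", "Tourism"},
    "paris": {"Capitals in Europe", "Tourism"},
    "resume": {"Employment"},
    "software engineer": {"Employment", "Computer occupations"},
}


def expand_domain(seed_articles, article_categories):
    """Assumed expansion: collect the seeds' categories, then add every
    article that shares at least one of those categories."""
    categories: set[str] = set()
    for article in seed_articles:
        categories |= article_categories.get(article, set())
    articles = {a for a, cats in article_categories.items() if cats & categories}
    articles |= set(seed_articles)
    return articles, categories


def map_query(query, article_categories):
    """Assumed mapping: greedily match the longest token span that equals an
    article title, scanning the query left to right."""
    tokens = query.lower().split()
    concepts = []
    i = 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in article_categories:
                concepts.append(phrase)
                i = j  # continue after the matched span
                break
        else:
            i += 1  # no title starts at token i; skip it
    return concepts


def intent_score(query, domain_articles, article_categories):
    """Fraction of the query's mapped concepts that belong to the domain."""
    concepts = map_query(query, article_categories)
    if not concepts:
        return 0.0
    hits = sum(1 for c in concepts if c in domain_articles)
    return hits / len(concepts)


if __name__ == "__main__":
    travel, _ = expand_domain(["hotel"], ARTICLE_CATEGORIES)
    for q in ["cheap flight to paris", "software engineer resume"]:
        print(q, intent_score(q, travel, ARTICLE_CATEGORIES))
```

Here a single seed article ("hotel") pulls "flight" and "paris" into the travel domain through a shared category. This mirrors, in miniature, the claim that a few seeds can yield broad coverage. Turning the score into a yes/no intent decision would need a threshold, which is another assumption not given in the abstract.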