|
ABSTRACT
Web query classification (QC) aims to classify Web users' queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy. A major innovation is that by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a new classifier for each new set of target categories, and therefore the bridging classifier needs to be trained only once. In addition, we introduce category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries. Category selection can improve both efficiency and effectiveness of the online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we made an improvement by 9.7% and 3.8% in terms of precision and F1 respectively compared with the best results of KDDCUP 2005.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Steven M. Beitzel , Eric C. Jensen , Ophir Frieder , David Grossman , David D. Lewis , Abdur Chowdhury , Aleksandr Kolcz, Automatic web query classification using labeled and unlabeled training data, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
[doi> 10.1145/1076034.1076138]
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
A. McCallum and K. Nigam. A comparison of event models for naive bayes text classication. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
|
| |
8
|
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to wordnet: an on-line lexical database. International Journal of Lexicography, 3(4):23--244, 1990.
|
| |
9
|
|
| |
10
|
J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press, 1999.
|
 |
11
|
Dou Shen , Rong Pan , Jian-Tao Sun , Jeffrey Junfeng Pan , Kangheng Wu , Jie Yin , Qiang Yang, Q2C@UST: our winning solution to query classification in KDDCUP 2005, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.100-110, December 2005
[doi> 10.1145/1117454.1117467]
|
| |
12
|
R. C. van. Information Retrieval. Butterworths, London, second edition edition, 1979.
|
 |
13
|
David Vogel , Steffen Bickel , Peter Haider , Rolf Schimpfky , Peter Siemen , Steve Bridges , Tobias Scheffer, Classifying search engine queries using the web as background knowledge, ACM SIGKDD Explorations Newsletter, v.7 n.2, p.117-122, December 2005
[doi> 10.1145/1117454.1117469]
|
 |
14
|
|
| |
15
|
|
CITED BY 23
|
|
|
|
|
Dou Shen , Rong Pan , Jian-Tao Sun , Jeffrey Junfeng Pan , Kangheng Wu , Jie Yin , Qiang Yang, Query enrichment for web-query classification, ACM Transactions on Information Systems (TOIS), v.24 n.3, p.320-352, July 2006
|
|
|
Xuefeng He , Jun Yan , Jinwen Ma , Ning Liu , Zheng Chen, Query topic detection for reformulation, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
Andrei Z. Broder , Marcus Fontoura , Evgeniy Gabrilovich , Amruta Joshi , Vanja Josifovski , Tong Zhang, Robust classification of rare queries using web knowledge, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
|
|
|
|
|
|
|
|
|
Xiubo Geng , Tie-Yan Liu , Tao Qin , Andrew Arnold , Hang Li , Heung-Yeung Shum, Query dependent ranking using K-nearest neighbor, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Andrei Z. Broder , Peter Ciccolo , Marcus Fontoura , Evgeniy Gabrilovich , Vanja Josifovski , Lance Riedel, Search advertising using web relevance feedback, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
Jian Hu , Gang Wang , Fred Lochovsky , Jian-tao Sun , Zheng Chen, Understanding user's query intent with wikipedia, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
|
|
|
|
|
|
|
|
|
Jiafeng Guo , Gu Xu , Xueqi Cheng , Hang Li, Named entity recognition in query, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
Evgeniy Gabrilovich , Andrei Broder , Marcus Fontoura , Amruta Joshi , Vanja Josifovski , Lance Riedel , Tong Zhang, Classifying search queries using the Web as a source of knowledge, ACM Transactions on the Web (TWEB), v.3 n.2, p.1-28, April 2009
|
|
|
|
|
|
Jaime Arguello , Fernando Diaz , Jamie Callan , Jean-Francois Crespo, Sources of evidence for vertical selection, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
Huanhuan Cao , Derek Hao Hu , Dou Shen , Daxin Jiang , Jian-Tao Sun , Enhong Chen , Qiang Yang, Context-aware query classification, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
|
|
|
Haijun Zhai , Jiafeng Guo , Qiong Wu , Xueqi Cheng , Huawei Sheng , Jin Zhang, Query Classification Based on Regularized Correlated Topic Model, Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, p.552-555, September 15-18, 2009
|
|