ABSTRACT
Web-search queries are typically short and ambiguous. Classifying these queries into a set of target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to the target categories. To build the necessary mapping functions, we use an ensemble of search engines to enrich the queries. Our technique was applied to the 2005 ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP), where we won the championship on all three evaluation metrics among a total of 33 teams worldwide: precision; the F1 measure, which combines precision and recall; and creativity, which was judged by the organizers. In this article, we show that, despite an abundance of ambiguous queries and a lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and its experimental evaluation. Our best F1 and precision scores are 42.4% and 44.4%, respectively, which are 9.6% and 24.3% higher than those of the runners-up.
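The two-phase idea described above can be illustrated with a small sketch. It assumes that each search engine in the ensemble is a function returning the intermediate categories (for example, directory labels) attached to its top results for a query, that a hand-built table maps each intermediate category to zero or more target categories, and that plain vote counting combines the engines. The names `classify`, `evaluate`, `engine_a`, `engine_b`, and `MAPPING` are hypothetical; the article's actual mapping functions and combination scheme are not reproduced here. The `evaluate` helper computes micro-averaged precision, recall, and F1 = 2PR/(P + R) over multi-label predictions; whether this matches the competition's exact scoring procedure is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interface: an "engine" maps a query to the intermediate
# categories (e.g., directory labels) attached to its top search results.
Engine = Callable[[str], List[str]]


def classify(query: str,
             engines: List[Engine],
             mapping: Dict[str, List[str]],
             top_k: int = 5) -> List[str]:
    """Two-phase sketch: enrich the query with every engine (phase 1),
    then map intermediate categories to target categories and vote (phase 2)."""
    votes = Counter()
    for engine in engines:
        # Phase 1: query enrichment, one engine of the ensemble at a time.
        for intermediate in engine(query):
            # Phase 2: each intermediate category votes for its mapped targets.
            for target in mapping.get(intermediate, []):
                votes[target] += 1
    # Keep the top_k targets; equal counts keep first-encountered order.
    return [target for target, _ in votes.most_common(top_k)]


def evaluate(predicted: Dict[str, List[str]],
             gold: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over multi-label predictions."""
    correct = sum(len(set(predicted.get(q, [])) & gold[q]) for q in gold)
    n_predicted = sum(len(set(predicted.get(q, []))) for q in gold)
    n_gold = sum(len(labels) for labels in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy, hypothetical data: two "engines" and an intermediate-to-target mapping.
def engine_a(query: str) -> List[str]:
    return ["Computers/Hardware", "Shopping/Electronics"]


def engine_b(query: str) -> List[str]:
    return ["Computers/Hardware"]


MAPPING = {
    "Computers/Hardware": ["Computers"],
    "Shopping/Electronics": ["Shopping", "Computers"],
}

labels = classify("cheap laptop", [engine_a, engine_b], MAPPING, top_k=2)
print(labels)  # ['Computers', 'Shopping']
print(evaluate({"cheap laptop": labels}, {"cheap laptop": {"Computers"}}))
```

Vote counting is used here only because it is the simplest way to combine an ensemble; a weighted scheme (for instance, per-engine confidence scores) would fit into the same loop without changing the two-phase structure.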
CITED BY 11
Andrei Z. Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, Tong Zhang, Robust classification of rare queries using web knowledge, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Dou Shen, Toby Walker, Zijian Zheng, Qiang Yang, Ying Li, Personal name classification in web queries, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
Dou Shen, Min Qin, Weizhu Chen, Qiang Yang, Zheng Chen, Mining web query hierarchies from clickthrough data, Proceedings of the 22nd national conference on Artificial intelligence, p.341-346, July 22-26, 2007, Vancouver, British Columbia, Canada
Evgeniy Gabrilovich, Andrei Broder, Marcus Fontoura, Amruta Joshi, Vanja Josifovski, Lance Riedel, Tong Zhang, Classifying search queries using the Web as a source of knowledge, ACM Transactions on the Web (TWEB), v.3 n.2, p.1-28, April 2009
Ning Liu, Jun Yan, Weiguo Fan, Qiang Yang, Zheng Chen, Identifying vertical search intention of query through social tagging propagation, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
REVIEW
Reviewer: George Pallis
Web query classification is a rather interesting area of research, since Web users' queries are typically short, noisy, and ambiguous. This paper presents a new approach to classifying Web search queries, made up of two phases. First, the classifi...