Query enrichment for web-query classification
Source: ACM Transactions on Information Systems (TOIS)
Volume 24, Issue 3 (July 2006)
Pages: 320-352
Year of Publication: 2006
ISSN: 1046-8188
Authors
Dou Shen  Hong Kong University of Science and Technology, Hong Kong, China
Rong Pan  Hong Kong University of Science and Technology, Hong Kong, China
Jian-Tao Sun  Microsoft Research Asia, Beijing, China
Jeffrey Junfeng Pan  Hong Kong University of Science and Technology, Hong Kong, China
Kangheng Wu  Hong Kong University of Science and Technology, Hong Kong, China
Jie Yin  Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang  Hong Kong University of Science and Technology, Hong Kong, China
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 33,   Downloads (12 Months): 217,   Citation Count: 11
DOI: http://doi.acm.org/10.1145/1165774.1165776

ABSTRACT

Web-search queries are typically short and ambiguous. Classifying these queries into target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics among a total of 33 teams worldwide: precision; the F1 measure, which combines precision and recall; and creativity, which is judged by the organizers. In this article, we show that, despite the abundance of ambiguous queries and the lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best results for F1 and precision are 42.4% and 44.4%, respectively, which are 9.6% and 24.3% higher than those of the runners-up.
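The two-phase idea in the abstract (query to intermediate objects, then intermediate objects to target categories) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy "engines" are plain dictionaries standing in for search-engine results, the category names are invented, and simple vote counting stands in for whatever mapping functions and ensemble combination the authors actually use.

```python
from collections import Counter

# Phase 1 (assumed toy data): each hypothetical "engine" maps a query to the
# intermediate category labels of its top results.
ENGINE_A = {"apple": ["dir/computers/hardware", "dir/food/fruit",
                      "dir/computers/hardware"]}
ENGINE_B = {"apple": ["dir/computers/hardware", "dir/companies"]}

# Phase 2 (assumed): a hand-written mapping from intermediate labels to target
# categories. Labels without an entry are ignored.
INTERMEDIATE_TO_TARGET = {
    "dir/computers/hardware": "Computers",
    "dir/food/fruit": "Living",
}


def enrich(query, engines):
    """Collect the intermediate labels every engine returns for the query."""
    labels = []
    for engine in engines:
        labels.extend(engine.get(query, []))
    return labels


def classify(query, engines, mapping, top_k=2):
    """Map enriched labels to target categories and keep the top_k by votes."""
    votes = Counter()
    for label in enrich(query, engines):
        target = mapping.get(label)
        if target is not None:
            votes[target] += 1
    return [target for target, _ in votes.most_common(top_k)]


if __name__ == "__main__":
    print(classify("apple", [ENGINE_A, ENGINE_B], INTERMEDIATE_TO_TARGET))
```

In this toy setup the short query "apple" carries no category signal by itself; the labels gathered from the two engines supply it, and a query that no engine recognizes yields an empty list.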


REVIEW

Reviewer: George Pallis

Web query classification is a rather interesting area of research, since Web users' queries are typically short, noisy, and ambiguous. This paper presents a new approach to classifying Web search queries, made up of two phases. First, the classifi...
