Building bridges for web query classification
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Fusion and spam
Pages: 131-138
Year of Publication: 2006
ISBN: 1-59593-369-7
Authors
Dou Shen, Hong Kong University of Science and Technology
Jian-Tao Sun, Microsoft Research Asia, Beijing, P.R. China
Qiang Yang, Hong Kong University of Science and Technology
Zheng Chen, Microsoft Research Asia, Beijing, P.R. China
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148196

ABSTRACT

Web query classification (QC) aims to classify Web users' queries, which are often short and ambiguous, into a set of target categories. QC has many applications, including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective was to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the intermediate taxonomy. A major innovation is that, by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a classifier for each new set of target categories; the bridging classifier needs to be trained only once. In addition, we introduce category selection as a new method for narrowing down the part of the intermediate taxonomy that is used to classify the queries. Category selection can improve both the efficiency and the effectiveness of the online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we improve precision by 9.7% and F1 by 3.8% over the best results of KDDCUP 2005.
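
The bridging idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the offline classifier already yields a score p(intermediate | query) for each intermediate category, and that a relatedness score p(target | intermediate) is available. The function names, category labels, and numbers below are hypothetical, and the `keep` argument is only a stand-in for category selection.

```python
from collections import defaultdict


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (all zeros stay zeros)."""
    total = sum(scores.values())
    return {k: (v / total if total else 0.0) for k, v in scores.items()}


def bridge(query_to_inter, inter_to_target, keep=None):
    """Score each target category as
    sum_j p(target | inter_j) * p(inter_j | query).

    keep: optional set of intermediate categories retained by a
    (hypothetical) category-selection step; all others are skipped.
    """
    target_scores = defaultdict(float)
    for inter, p_inter in query_to_inter.items():
        if keep is not None and inter not in keep:
            continue
        for target, p_target in inter_to_target.get(inter, {}).items():
            target_scores[target] += p_target * p_inter
    return normalize(target_scores)


# Toy inputs (made up for illustration, not taken from the paper).
query_to_inter = {
    "Computers/Hardware": 0.6,
    "Shopping/Electronics": 0.3,
    "Arts/Music": 0.1,
}
inter_to_target = {
    "Computers/Hardware": {"Computers": 0.9, "Shopping": 0.1},
    "Shopping/Electronics": {"Computers": 0.2, "Shopping": 0.8},
    "Arts/Music": {"Entertainment": 1.0},
}

scores = bridge(query_to_inter, inter_to_target)
ranked = sorted(scores.items(), key=lambda kv: -kv[1])

# Restricting the intermediate taxonomy drops unrelated target categories.
selected = bridge(
    query_to_inter,
    inter_to_target,
    keep={"Computers/Hardware", "Shopping/Electronics"},
)
```

In this sketch, supporting a new set of target categories only means supplying a new `inter_to_target` table; the query-to-intermediate scores are reused unchanged, which mirrors the "train once" property the abstract describes.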


