Automatic web query classification using labeled and unlabeled training data
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
POSTER SESSION: Posters
Pages: 581-582
Year of Publication: 2005
ISBN: 1-59593-034-5
Authors
Steven M. Beitzel  Illinois Institute of Technology
Eric C. Jensen  Illinois Institute of Technology
Ophir Frieder  Illinois Institute of Technology
David Grossman  Illinois Institute of Technology
David D. Lewis  America Online, Inc.
Abdur Chowdhury  America Online, Inc.
Aleksandr Kolcz  America Online, Inc.
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076138

ABSTRACT

Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization becomes critical if the system is to return results not just from a general web collection but from topic-specific databases as well. Maintaining sufficient categorization recall is very difficult as web queries are typically short, yielding few features per query. We examine three approaches to topical categorization of general web queries: matching against a list of manually labeled queries, supervised learning of classifiers, and mining of selectional preference rules from large unlabeled query logs. Each approach has its advantages in tackling the web query classification recall problem, and combining the three techniques allows us to classify a substantially larger proportion of queries than any of the individual techniques. We examine the performance of each approach on a real web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20%, with a 7% improvement in overall effectiveness.
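
The abstract names three techniques and says that combining them raises recall, but it does not say how they are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: it chains three toy stand-ins in a simple cascade and measures the fraction of queries that receive any label. All names, data, weights, and the cascade order (`LABELED`, `TERM_WEIGHTS`, `TAIL_RULES`, `classify`, `coverage`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

Classifier = Callable[[str], Optional[str]]

# Hypothetical manually labeled queries (stand-in for the exact-match list).
LABELED: Dict[str, str] = {"nfl scores": "Sports"}

# Hypothetical per-category term weights (stand-in for a trained classifier).
TERM_WEIGHTS: Dict[str, Dict[str, float]] = {
    "Sports": {"scores": 1.0, "league": 0.8},
    "Travel": {"flights": 1.0, "hotel": 0.9},
}

# Hypothetical rules of the form "<anything> <tail word>" -> category,
# a much simplified stand-in for mined selectional preference rules.
TAIL_RULES: Dict[str, str] = {"lyrics": "Music"}


def exact_match(query: str) -> Optional[str]:
    """Look the whitespace- and case-normalized query up in the labeled list."""
    return LABELED.get(" ".join(query.lower().split()))


def supervised(query: str, threshold: float = 0.5) -> Optional[str]:
    """Pick the highest-scoring category, or None if below the threshold."""
    tokens = query.lower().split()
    best_label, best_score = None, 0.0
    for label, weights in TERM_WEIGHTS.items():
        score = sum(weights.get(tok, 0.0) for tok in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def preference_rules(query: str) -> Optional[str]:
    """Apply a tail-word rule to queries with at least two tokens."""
    tokens = query.lower().split()
    return TAIL_RULES.get(tokens[-1]) if len(tokens) >= 2 else None


def classify(
    query: str,
    stages: Sequence[Classifier] = (exact_match, supervised, preference_rules),
) -> Optional[str]:
    """Return the first label produced by any stage (assumed cascade order)."""
    for stage in stages:
        label = stage(query)
        if label is not None:
            return label
    return None


def coverage(queries: Sequence[str]) -> float:
    """Fraction of queries that receive any label at all."""
    if not queries:
        return 0.0
    return sum(classify(q) is not None for q in queries) / len(queries)
```

In this sketch each stage covers queries the others miss, which is the intuition behind the recall gain reported in the abstract. Note that `coverage` only counts how many queries get a label; judging whether those labels are accurate would additionally require held-out labeled queries.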
