Predicting query difficulty on the web by learning visual clues
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
Session: Posters
Pages: 615 - 616  
Year of Publication: 2005
ISBN:1-59593-034-5
Authors
Eric C. Jensen  Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL
Steven M. Beitzel  Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL
David Grossman  Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL
Ophir Frieder  Illinois Institute of Technology Information Retrieval Laboratory, Chicago, IL
Abdur Chowdhury  America Online, Inc., Dulles, VA
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076155

ABSTRACT

We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (titles, snippets, etc.) to predict retrieval effectiveness for a query. By training a supervised machine learning algorithm with manually evaluated queries, visual clues indicative of relevance are discovered. We show that this approach has a moderate correlation of 0.57 with precision-at-10 scores from manual relevance judgments of the top ten documents retrieved by ten web search engines over 896 queries. Our findings indicate that difficulty predictors that have been successful in recall-oriented ad-hoc search, such as clarity metrics, are not nearly as correlated with engine performance in precision-oriented tasks such as this; they yield a maximum correlation of 0.3. Additionally, relying only on visual clues avoids the need for collection statistics that are required by these prior approaches. This enables our approach to be employed in environments where these statistics are unavailable or costly to retrieve, such as metasearch.
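The evaluation described in the abstract compares a predicted score per query against that query's precision at 10. The sketch below illustrates that comparison only, not the authors' implementation. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the function names and sample data are hypothetical.

```python
from math import sqrt


def precision_at_k(judgments, k=10):
    """Fraction of the top-k results judged relevant.

    `judgments` is a list of booleans in rank order. Ranks missing from a
    short result list count as non-relevant, so the divisor is always k.
    """
    return sum(1 for judged_relevant in judgments[:k] if judged_relevant) / k


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: manual relevance judgments for the top ten results of
# three queries, plus a made-up predictor score for each query.
judged = [
    [True] * 8 + [False] * 2,
    [True] * 3 + [False] * 7,
    [False] * 10,
]
predicted = [0.7, 0.4, 0.2]

actual = [precision_at_k(j) for j in judged]
print(pearson(predicted, actual))
```

A value near 1 would mean the predictor ranks queries much as their measured precision at 10 does. The abstract reports 0.57 for the visual-clue approach and at most 0.3 for the prior predictors it compares against.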



