PSkip: estimating relevance ranking quality from web search clickthrough data
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Industrial track papers
Pages 1355-1364  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Kuansan Wang  Microsoft Corporation, Redmond, WA, USA
Toby Walker  Microsoft Corporation, Redmond, WA, USA
Zijian Zheng  Microsoft Corporation, Redmond, WA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 58,   Downloads (12 Months): 195,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1557019.1557164

ABSTRACT

In this article, we report our efforts in mining the information encoded as clickthrough data in the server logs to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that aims to quantify the ranking quality by estimating the probability of users encountering non-relevant results that cost them the effort to read and skip. A search engine with a lower pSkip is regarded as having a better ranking quality. A key design goal of pSkip is to integrate the findings from two sets of user studies: those that utilize eye-tracking devices to track users' browsing patterns on the search result pages, and those that use specially instrumented browsers to actively solicit users' explicit judgments on their search activities. We present the derivation of the maximum likelihood estimation of pSkip and demonstrate its efficacy in describing the user study data. The mathematical properties of pSkip are further analyzed and compared with several objective metrics as well as the cumulated gain method that uses subjective judgments. Experimental data show that pSkip can measure aspects of the search quality that these existing metrics are not designed to address or fail to address, such as identifying the real search intents expressed in ambiguous queries. Although pSkip is effective and superior in many ways, we also report a series of experiments showing that it may be influenced by system issues that are not directly related to relevance ranking, suggesting that measurements complementary to pSkip are still needed in order to form a holistic and accurate characterization of the ranking quality.
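
The abstract does not state the estimator itself, so the following is only a rough illustration of how a skip probability might be estimated from click logs, not the authors' derivation. It assumes a simplified browsing model that is not taken from the abstract: users scan results top-down, every result at or above the last click is treated as examined, and an examined result without a click counts as a skip. Under that assumption, each examined result is a Bernoulli trial and the maximum likelihood estimate is the observed fraction of skips. The function name `pskip_estimate` and the session encoding are hypothetical.

```python
from typing import Iterable, Sequence


def pskip_estimate(sessions: Iterable[Sequence[bool]]) -> float:
    """Estimate a skip probability from click logs (illustrative sketch).

    Each session is a top-down list of booleans, one per ranked result,
    True where the result was clicked. Assumption (not from the abstract):
    a result counts as examined only if it sits at or above the last click,
    and an examined result that was not clicked counts as a skip.
    """
    skips = 0
    clicks = 0
    for clicked in sessions:
        # Index of the last click; sessions without any click are ignored
        # because this simplified model cannot tell what was examined.
        last = max((i for i, c in enumerate(clicked) if c), default=-1)
        for c in clicked[: last + 1]:
            if c:
                clicks += 1
            else:
                skips += 1
    examined = skips + clicks
    if examined == 0:
        raise ValueError("no examined results in the log")
    # Bernoulli maximum likelihood estimate: observed skips / observed trials.
    return skips / examined


if __name__ == "__main__":
    log = [
        [False, True, False],   # one skip, then a click
        [True, False, False],   # click on the top result, no skips
        [False, False, False],  # abandoned session, ignored by this sketch
    ]
    print(pskip_estimate(log))
```

In this sketch a lower value means fewer results were read and passed over before a click, which matches the abstract's statement that a lower pSkip indicates better ranking quality. How abandoned sessions and results below the last click are treated is a modeling choice that the abstract does not settle.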


