Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs
Source
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: IR: web search 2
Pages 699-708
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Rosie Jones, Yahoo! Research, Burbank, CA, USA
Kristina Lisa Klinkner, Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458176

ABSTRACT

Most analysis of web search relevance and performance takes a single query as the unit of search engine interaction. When studies attempt to group queries together by task or session, a timeout is typically used to identify the boundary. However, users query search engines in order to accomplish tasks at a variety of granularities, issuing multiple queries as they attempt to accomplish tasks. In this work we study real sessions manually labeled into hierarchical tasks, and show that timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%. We report on properties of this search task hierarchy, as seen in a random sample of user interactions from a major web search engine's log, annotated by human editors, learning that 17% of tasks are interleaved, and 20% are hierarchically organized. No previous work has analyzed or addressed automatic identification of interleaved and hierarchically organized search tasks. We propose and evaluate a method for the automated segmentation of users' query streams into hierarchical units. Our classifiers can improve on timeout segmentation, as well as other previously published approaches, bringing the accuracy up to 92% for identifying fine-grained task boundaries, and 89-97% for identifying pairs of queries from the same task when tasks are interleaved hierarchically. This is the first work to identify, measure and automatically segment sequences of user queries into their hierarchical structure. The ability to perform this kind of segmentation paves the way for evaluating search engines in terms of user task completion.
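
To make the timeout baseline discussed in the abstract concrete, here is a minimal Python sketch of timeout-based segmentation. It is an illustration only, not the authors' implementation: the function name `segment_by_timeout`, the 30-minute threshold, and the sample queries are all assumptions chosen for the example. It shows why a timeout of any length struggles with interleaved tasks: the split depends only on the gaps between consecutive queries, not on what the queries are about.

```python
from datetime import datetime, timedelta


def segment_by_timeout(events, timeout=timedelta(minutes=30)):
    """Split time-ordered (timestamp, query) pairs into segments.

    A new segment starts whenever the gap since the previous query
    exceeds `timeout`. The 30-minute default is an assumed value
    for illustration, not a figure taken from the abstract.
    """
    segments = []
    for ts, query in events:
        # Compare against the timestamp of the last query in the last segment.
        if segments and ts - segments[-1][-1][0] <= timeout:
            segments[-1].append((ts, query))
        else:
            segments.append([(ts, query)])
    return segments


# Hypothetical query stream: a travel task interleaved with a sports lookup,
# followed much later by an unrelated query.
events = [
    (datetime(2008, 1, 1, 10, 0), "flights to paris"),
    (datetime(2008, 1, 1, 10, 2), "nba scores"),
    (datetime(2008, 1, 1, 10, 4), "paris hotels"),
    (datetime(2008, 1, 1, 11, 30), "python tutorial"),
]

segments = segment_by_timeout(events)
for i, segment in enumerate(segments):
    print(i, [query for _, query in segment])
```

In this sketch the first three queries fall into one segment because they are only minutes apart, even though "nba scores" belongs to a different task than the two travel queries. Shortening the timeout would not separate them without also splitting the travel task itself.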

