Towards context-aware search by learning a very large variable length hidden Markov model from search logs
Source
International World Wide Web Conference archive
Proceedings of the 18th International Conference on World Wide Web
Madrid, Spain
SESSION: Data mining / Session: Learning
Pages 191-200
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Huanhuan Cao, University of Science and Technology of China, Hefei, China
Daxin Jiang, Microsoft Research Asia, Beijing, China
Jian Pei, Simon Fraser University, Vancouver, Canada
Enhong Chen, University of Science and Technology of China, Hefei, China
Hang Li, Microsoft Research Asia, Beijing, China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526709.1526736

ABSTRACT

Capturing the context of a user's query from the previous queries and clicks in the same session may help in understanding the user's information need. A context-aware approach to document re-ranking, query suggestion, and URL recommendation may improve users' search experience substantially. In this paper, we propose a general approach to context-aware search. To capture contexts of queries, we learn a variable length Hidden Markov Model (vlHMM) from search sessions extracted from log data. Although the mathematical model is intuitive, learning a large vlHMM with millions of states from hundreds of millions of search sessions poses a grand challenge. We develop a strategy for parameter initialization in vlHMM learning which can greatly reduce the number of parameters to be estimated in practice. We also devise a method for distributed vlHMM learning under the map-reduce model. We test our approach on a real data set consisting of 1.8 billion queries, 2.6 billion clicks, and 840 million search sessions, and evaluate the effectiveness of the vlHMM learned from the real data on three search applications: document re-ranking, query suggestion, and URL recommendation. The experimental results show that our approach is both effective and efficient.
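
To make the idea of variable-length context concrete, the following is a minimal Python sketch and not the paper's vlHMM. It assumes a strong simplification: the hidden states are replaced by the observed queries themselves, so the model reduces to a variable-length Markov chain that backs off from longer to shorter contexts. The function names `train` and `suggest`, the `max_context` parameter, and the toy sessions are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train(sessions, max_context=2):
    """Count next-query frequencies for every context of length 1..max_context.

    Simplification (assumption): contexts are observed queries, not hidden states.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, max_context + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                context = tuple(session[i - k:i])
                counts[context][session[i]] += 1
    return counts


def suggest(counts, history, max_context=2):
    """Suggest the most frequent next query for the longest context seen in training."""
    for k in range(min(max_context, len(history)), 0, -1):
        context = tuple(history[-k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # no context of any length was seen in training


# Toy sessions (made up for illustration, not taken from the paper's data).
sessions = [
    ["jaguar", "jaguar speed", "big cats"],
    ["jaguar", "jaguar speed", "big cats"],
    ["jaguar", "jaguar speed"],
    ["car dealers", "jaguar", "jaguar price"],
    ["car dealers", "jaguar", "jaguar price"],
]
model = train(sessions)

print(suggest(model, ["jaguar"]))                 # jaguar speed
print(suggest(model, ["car dealers", "jaguar"]))  # jaguar price
```

In this toy data the same current query, "jaguar", leads to different suggestions depending on the preceding query, which is the kind of effect the abstract attributes to using session context. The sketch does not cover clicks, hidden search intents, parameter initialization, or distributed training.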


