Learning to cluster web search results
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: Clustering
Pages: 210 - 217  
Year of Publication: 2004
ISBN: 1-58113-881-4
Authors
Hua-Jun Zeng  Microsoft Research Asia, Beijing, P.R. China
Qi-Cai He  Peking University, Beijing, P.R. China
Zheng Chen  Microsoft Research Asia, Beijing, P.R. China
Wei-Ying Ma  Microsoft Research Asia, Beijing, P.R. China
Jinwen Ma  Peking University, Beijing, P.R. China
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1008992.1009030

ABSTRACT

Organizing Web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't generate clusters with highly readable names. In this paper, we reformalize the clustering problem as a salient phrase ranking problem. Given a query and the ranked list of documents (typically a list of titles and snippets) returned by a certain Web search engine, our method first extracts and ranks salient phrases as candidate cluster names, based on a regression model learned from human labeled training data. The documents are assigned to relevant salient phrases to form candidate clusters, and the final clusters are generated by merging these candidate clusters. Experimental results verify our method's feasibility and effectiveness.
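
The pipeline the abstract describes (extract candidate phrases, rank them, assign documents, merge candidate clusters) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature set, the fixed `WEIGHTS` (standing in for a regression model learned from labeled data), the n-gram extraction, the `overlap` threshold, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical weights standing in for a regression model that, in the
# paper, is learned from human-labeled training data.
WEIGHTS = {"doc_freq": 1.0, "length": 0.5}


def candidate_phrases(text, max_len=3):
    """Return all word n-grams (n <= max_len) of a title or snippet."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)}


def rank_phrases(snippets, query, top_k=5):
    """Score candidate phrases and return the top_k with their documents."""
    query_words = set(query.lower().split())
    docs_by_phrase = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        for phrase in candidate_phrases(text):
            # Skip phrases made up only of query words: they cannot
            # distinguish one cluster from another.
            if set(phrase.split()) <= query_words:
                continue
            docs_by_phrase[phrase].add(doc_id)

    def score(phrase):
        # Linear score over two illustrative features: document
        # frequency and phrase length in words.
        return (WEIGHTS["doc_freq"] * len(docs_by_phrase[phrase])
                + WEIGHTS["length"] * len(phrase.split()))

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(docs_by_phrase, key=lambda p: (-score(p), p))
    return [(p, docs_by_phrase[p]) for p in ranked[:top_k]]


def merge_clusters(candidates, overlap=0.75):
    """Merge candidate clusters whose document sets largely coincide."""
    merged = []
    for name, docs in candidates:
        for cluster in merged:
            shared = len(docs & cluster["docs"])
            if shared / min(len(docs), len(cluster["docs"])) >= overlap:
                cluster["docs"] |= docs  # keep the higher-ranked name
                break
        else:
            merged.append({"name": name, "docs": set(docs)})
    return merged


def cluster_results(snippets, query, top_k=5):
    """Run the whole sketch: rank phrases, then merge candidate clusters."""
    return merge_clusters(rank_phrases(snippets, query, top_k))
```

Calling `cluster_results(snippets, "jaguar")` on a short list of snippet strings returns a list of dictionaries, each holding a cluster name and the set of snippet indices assigned to it. A learned model would replace the hand-set weights, and a real system would use richer phrase features than the two shown here.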


