A web-based kernel function for measuring the similarity of short text snippets
Source International World Wide Web Conference archive
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
SESSION: Web mining with search engines
Pages: 377-386
Year of Publication: 2006
ISBN:1-59593-323-9
Authors
Mehran Sahami  Google Inc, Mountain View, CA
Timothy D. Heilman  Google Inc, Mountain View, CA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 244,   Citation Count: 35
DOI: http://doi.acm.org/10.1145/1135777.1135834

ABSTRACT

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
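
The idea in the abstract can be illustrated with a minimal sketch. It is not the kernel defined in the paper, whose exact form is given in the full text. It assumes a hypothetical stand-in for a search engine (`TOY_RESULTS`, a hard-coded dictionary of made-up result texts) and uses raw term counts with cosine similarity. All function names here are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for a web search engine: maps a snippet to a few
# made-up result texts. These strings are illustrative, not real search results.
TOY_RESULTS = {
    "ai": [
        "artificial intelligence research and machine learning",
        "artificial intelligence systems that learn from data",
    ],
    "artificial intelligence": [
        "artificial intelligence research and machine learning",
        "machine learning is a branch of artificial intelligence",
    ],
    "pizza recipe": [
        "bake the pizza dough with tomato sauce then add cheese",
    ],
}


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(snippet, search=TOY_RESULTS.get):
    """Sum the term vectors of the texts the (assumed) search returns."""
    total = Counter()
    for doc in search(snippet) or []:
        total.update(term_vector(doc))
    return total


def context_similarity(a, b):
    """Compare two snippets through their search-result context vectors."""
    return cosine(context_vector(a), context_vector(b))


if __name__ == "__main__":
    a, b = "ai", "artificial intelligence"
    print("plain cosine:  ", cosine(term_vector(a), term_vector(b)))
    print("context cosine:", context_similarity(a, b))
```

On this toy data, the plain cosine between "ai" and "artificial intelligence" is zero because the snippets share no terms, while the context-based score is positive because their (made-up) result texts overlap. A real system would replace `TOY_RESULTS.get` with an actual search call and would likely weight terms rather than use raw counts.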


