ABSTRACT
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
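The abstract says only that web search results supply extra context for each short text. The Python sketch below shows one way such a search-based similarity could be computed. It rests on an assumption that the abstract does not state: each snippet is expanded into the unit-length centroid of the TF-IDF vectors of its top search results, and similarity is the inner product of these context vectors. `FAKE_SEARCH_RESULTS`, `search_snippets`, `context_vector`, and `kernel` are hypothetical names. The search engine is replaced by a hard-coded table, and IDF is estimated from that toy corpus only.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for a web search engine: maps a query to the text of
# its top-ranked result snippets. A real system would call a search API here.
FAKE_SEARCH_RESULTS = {
    "AI": [
        "Artificial intelligence is the study of intelligent agents",
        "AI research covers machine learning and reasoning",
    ],
    "artificial intelligence": [
        "Artificial intelligence (AI) is intelligence demonstrated by machines",
        "Machine learning is a subfield of artificial intelligence research",
    ],
    "apple pie": [
        "A classic apple pie recipe with a flaky crust",
        "Bake the apple pie until the crust is golden",
    ],
}


def search_snippets(query, n=2):
    """Return up to n result snippets for `query` (hypothetical stub)."""
    return FAKE_SEARCH_RESULTS.get(query, [])[:n]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


# Document frequencies over the toy corpus of all result snippets; a real
# system would estimate IDF from a much larger collection.
ALL_DOCS = [tokenize(s) for results in FAKE_SEARCH_RESULTS.values() for s in results]
N_DOCS = len(ALL_DOCS)
DOC_FREQ = Counter(term for doc in ALL_DOCS for term in set(doc))


def tfidf(tokens):
    """Raw term frequency times log(N / df) for each term seen in the corpus."""
    counts = Counter(tokens)
    return {t: c * math.log(N_DOCS / DOC_FREQ[t]) for t, c in counts.items() if t in DOC_FREQ}


def l2_normalize(vec):
    """Scale a sparse vector to unit length; an all-zero vector maps to {}."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def context_vector(query, n=2):
    """Unit-length centroid of the unit-length TF-IDF vectors of the
    query's top-n result snippets."""
    centroid = Counter()
    for snippet in search_snippets(query, n):
        for t, w in l2_normalize(tfidf(tokenize(snippet))).items():
            centroid[t] += w
    # Dividing by the number of snippets is unnecessary: normalizing removes it.
    return l2_normalize(centroid)


def kernel(x, y, n=2):
    """Similarity of two short texts as the inner product of their context vectors."""
    qx, qy = context_vector(x, n), context_vector(y, n)
    return sum(w * qy.get(t, 0.0) for t, w in qx.items())


if __name__ == "__main__":
    for a, b in [("AI", "artificial intelligence"), ("AI", "apple pie")]:
        print(f"K({a!r}, {b!r}) = {kernel(a, b):.3f}")
```

"AI" and "artificial intelligence" share no terms, so plain cosine similarity between the two queries is zero. Their context vectors still overlap through terms such as "machine" and "learning" in the search results. Because the score is an inner product of explicit feature vectors, it is a valid (positive semidefinite) kernel. Normalization also gives every text with non-empty results a self-similarity of 1.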
CITED BY 35
Po Tun Wu, Yi Hsuan Yang, Kuan Ting Chen, Winston H. Hsu, Tien Hsu Li, Chun Jen Lee, Keyword-based concept search on consumer photos by web-based kernel function, Proceedings of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
Denilson Alves Pereira, Berthier Ribeiro-Neto, Nivio Ziviani, Alberto H. F. Laender, Using web information for creating publication venue authority files, Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries, June 16-20, 2008, Pittsburgh, PA, USA
Claire Cardie, Cynthia Farina, Adil Aijaz, Matt Rawding, Stephen Purpura, A study in rule-specific issue categorization for e-rulemaking, Proceedings of the 2008 international conference on Digital government research, May 18-21, 2008, Montreal, Canada
Jiang-Ming Yang, Rui Cai, Feng Jing, Shuo Wang, Lei Zhang, Wei-Ying Ma, Search-based query suggestion, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Andrei Broder, Massimiliano Ciaramita, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Vanessa Murdock, Vassilis Plachouras, To swing or not to swing: learning when (not) to advertise, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Gang Luo, Chunqiang Tang, Hao Yang, Xing Wei, MedSearch: a specialized search engine for medical information retrieval, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Andrei Z. Broder, Peter Ciccolo, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Search advertising using web relevance feedback, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen, Hang Li, Context-aware query suggestion by mining click-through and session data, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Lance Riedel, Jeffrey Yuan, Online expansion of rare queries for sponsored search, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain