ABSTRACT
We introduce the notion of query substitution, that is, generating a new query to replace a user's original search query. Our technique uses modifications based on the substitutions web searchers typically make to their own queries. As a result, the new query is strongly related to the original, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. It also contrasts with query relaxation through Boolean or TF-IDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution and show that our method performs well at generating new queries related to the original queries. We then build a model for selecting among candidates, using a number of features of each query-candidate pair and fitting the model to human judgments of the relevance of query suggestions; this further improves the quality of the generated candidates. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.
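The abstract describes two stages: generating substitution candidates from the rewrites searchers commonly make, then selecting among candidates with a model over features of the query-candidate pair. The following is a minimal sketch of both stages under stated assumptions, not the paper's implementation. The substitution table and its probabilities, the exhaustive segmentation of the query into phrases, the three features (`features`), and the weights (`WEIGHTS`, `BIAS`) are all hypothetical and invented for illustration.

```python
import math
from itertools import product

# HYPOTHETICAL substitution table: phrase -> [(replacement, probability)].
# Entries and probabilities are invented; a real table would be mined from
# the rewrites searchers make to their own queries.
SUBSTITUTIONS = {
    "used": [("second hand", 0.5)],
    "car prices": [("car values", 0.4), ("auto prices", 0.2)],
    "used car prices": [("used cars", 0.3)],
}


def segmentations(terms):
    """Yield every split of a term list into contiguous phrases."""
    if not terms:
        yield []
        return
    for i in range(1, len(terms) + 1):
        head = " ".join(terms[:i])
        for rest in segmentations(terms[i:]):
            yield [head] + rest


def candidates(query):
    """Map each candidate to (best_score, number_of_substituted_phrases).

    Every phrase is either kept (probability 1.0) or replaced by a table
    entry; at least one phrase must change for a candidate to count.
    """
    best = {}
    for segs in segmentations(query.split()):
        options = [[(s, 1.0, 0)] + [(r, p, 1) for r, p in SUBSTITUTIONS.get(s, [])]
                   for s in segs]
        for choice in product(*options):
            n_sub = sum(c[2] for c in choice)
            text = " ".join(c[0] for c in choice)
            if n_sub == 0 or text == query:
                continue
            # Score a candidate by the product of its per-phrase probabilities.
            score = math.prod(c[1] for c in choice)
            if score > best.get(text, (0.0, 0))[0]:
                best[text] = (score, n_sub)
    return best


def features(query, candidate, score, n_sub):
    """HYPOTHETICAL features of a query-candidate pair."""
    q, c = set(query.split()), set(candidate.split())
    jaccard = len(q & c) / len(q | c)  # word overlap with the original query
    return [math.log(score), jaccard, n_sub]


# Placeholder weights; the abstract describes fitting the selection model to
# human relevance judgments, which this sketch does not attempt.
WEIGHTS, BIAS = [1.0, 2.0, -0.5], 0.0


def rank(query):
    """Return (probability, candidate) pairs, most probable first."""
    scored = []
    for cand, (score, n_sub) in candidates(query).items():
        z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(query, cand, score, n_sub)))
        scored.append((1.0 / (1.0 + math.exp(-z)), cand))  # logistic link
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for prob, cand in rank("used car prices"):
        print(f"{prob:.3f}  {cand}")
```

With this toy table, `rank("used car prices")` scores six candidates, such as "second hand car prices" and "used cars". Multiplying per-phrase probabilities favors candidates that change fewer, more common phrases, while the overlap feature keeps candidates close to all of the original terms. In a real system the weights would be fitted to human judgments (for instance, by logistic regression) rather than set by hand.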
CITED BY 61

Rosie Jones, Ravi Kumar, Bo Pang, Andrew Tomkins, Vanity fair: privacy in querylog bundles, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Ben Carterette, Rosie Jones, Wiley Greiner, Cory Barr, N semantic classes are harder than two, Proceedings of the COLING/ACL on Main conference poster sessions, p.49-56, July 17-18, 2006, Sydney, Australia

Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins, On anonymizing query logs via token-based hashing, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada

Yunbo Cao, Huizhong Duan, Chin-Yew Lin, Yong Yu, Hsiao-Wuen Hon, Recommending questions using the mdl-based tree cut model, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China

Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis, Sebastiano Vigna, The query-flow graph: model and applications, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Filip Radlinski, Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Optimizing relevance and revenue in ad search: a query substitution approach, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore

Hao Ma, Haixuan Yang, Irwin King, Michael R. Lyu, Learning latent semantic relations from clickthrough data for query suggestion, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Azarakhsh Malekian, Chi-Chao Chang, Ravi Kumar, Grant Wang, Optimizing query rewrites for keyword-based advertising, Proceedings of the 9th ACM conference on Electronic commerce, July 08-12, 2008, Chicago, IL, USA

Andrei Z. Broder, Peter Ciccolo, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Search advertising using web relevance feedback, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen, Hang Li, Context-aware query suggestion by mining click-through and session data, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA

Yi Hsuan Yang, Po Tun Wu, Ching Wei Lee, Kuan Hung Lin, Winston H. Hsu, Homer H. Chen, ContextSeer: context search and recommendation at query time for shared consumer photos, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada

Doug Downey, Susan Dumais, Dan Liebling, Eric Horvitz, Understanding the relationship between searchers' queries and information goals, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA

Gal Chechik, Eugene Ie, Martin Rehn, Samy Bengio, Dick Lyon, Large-scale content-based audio retrieval from text queries, Proceeding of the 1st ACM international conference on Multimedia information retrieval, October 30-31, 2008, Vancouver, British Columbia, Canada

Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, Sebastiano Vigna, Query suggestions using query-flow graphs, Proceedings of the 2009 workshop on Web Search Click Data, p.56-63, February 9, 2009, Barcelona, Spain

Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Lance Riedel, Jeffrey Yuan, Online expansion of rare queries for sponsored search, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain

Huanhuan Cao, Daxin Jiang, Jian Pei, Enhong Chen, Hang Li, Towards context-aware search by learning a very large variable length hidden markov model from search logs, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain

Doug Downey, Susan Dumais, Eric Horvitz, Models of searching and browsing: languages, studies, and applications, Proceedings of the 20th international joint conference on Artificial intelligence, p.2740-2747, January 06-12, 2007, Hyderabad, India