Analysis of a very large web search engine query log
Source
ACM SIGIR Forum
Volume 33, Issue 1 (Fall 1999)
Pages: 6-12
Year of Publication: 1999
ISSN: 0163-5840
Authors
Craig Silverstein
Google Inc., 2400 Bayshore, Mountain View, CA

Hannes Marais
Compaq Systems Research, 130 Lytton Ave, Palo Alto, CA

Monika Henzinger
Google Inc., 2400 Bayshore, Mountain View, CA

Michael Moricz
Doublebill.Com, Inc., 1800 Bridge Parkway, Redwood City, CA 94065
ABSTRACT
In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
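The kind of measurement the abstract describes can be illustrated on a toy scale. The sketch below is not the authors' method: the five queries are invented, and pointwise mutual information (PMI) is used as a stand-in for the paper's correlation measure, which the abstract does not specify. It computes the mean query length and scores term pairs by how much more often they co-occur in a query than independence would predict.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy log: these five queries are invented for illustration.
QUERIES = [
    "new york hotels",
    "new york",
    "cheap hotels",
    "new york weather",
    "weather",
]


def mean_query_length(queries):
    """Average number of whitespace-separated terms per query."""
    return sum(len(q.split()) for q in queries) / len(queries)


def term_pair_pmi(queries):
    """PMI of every term pair that co-occurs in at least one query.

    Probabilities are estimated as the fraction of queries containing
    the term (or both terms of the pair). PMI is an assumed stand-in,
    not the measure used in the paper.
    """
    n = len(queries)
    term_counts = Counter()
    pair_counts = Counter()
    for q in queries:
        terms = sorted(set(q.lower().split()))  # count each term once per query
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        (a, b): log2((c / n) / ((term_counts[a] / n) * (term_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }


if __name__ == "__main__":
    print(f"mean query length: {mean_query_length(QUERIES):.2f}")
    ranked = sorted(term_pair_pmi(QUERIES).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Raw PMI tends to favor rare pairs, so on a real log a minimum co-occurrence count would likely be applied before ranking; that threshold is omitted here to keep the sketch short.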
CITED BY 154
Peter Bruza, Robert McArthur, Simon Dennis, Interactive Internet search: keyword, directory and query reformulation mechanisms compared, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.280-287, July 24-28, 2000, Athens, Greece
Patricia Correia Saraiva, Edleno Silva de Moura, Nivio Ziviani, Wagner Meira, Rodrigo Fonseca, Berthier Ribeiro-Neto, Rank-preserving two-level caching for scalable search engines, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.51-58, September 2001, New Orleans, Louisiana, United States
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Using titles and category names from editor-driven taxonomies for automatic evaluation, Proceedings of the twelfth international conference on Information and knowledge management, November 03-08, 2003, New Orleans, LA, USA
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Ophir Frieder, Hourly analysis of a very large topically categorized web query log, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom
Nick Craswell, Francis Crimmins, David Hawking, Alistair Moffat, Performance and cost tradeoffs in Web search, Proceedings of the fifteenth Australasian database conference, p.161-169, January 01, 2004, Dunedin, New Zealand
Scott Nicholson, Tito Sierra, U. Yeliz Eseryel, Ji-Hong Park, Philip Barkow, Erika J. Pozo, Jane Ward, How much of it is real? Analysis of paid placement in Web search engine results, Journal of the American Society for Information Science and Technology, v.57 n.4, p.448-461, February 2006
Daniel Gruhl, Daniel N. Meredith, Jan H. Pieper, Alex Cozzi, Stephen Dill, The web beyond popularity: a really simple system for web scale RSS, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
Ricardo Baeza-Yates, Aristides Gionis, Flavio P. Junqueira, Vanessa Murdock, Vassilis Plachouras, Fabrizio Silvestri, Design trade-offs for search engine caching, ACM Transactions on the Web (TWEB), v.2 n.4, p.1-28, October 2008
Qingqing Gan, Josh Attenberg, Alexander Markowetz, Torsten Suel, Analysis of geographic queries in a search engine log, Proceedings of the first international workshop on Location and the web, p.49-56, April 22, 2008, Beijing, China
Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, Qiang Yang, Query enrichment for web-query classification, ACM Transactions on Information Systems (TOIS), v.24 n.3, p.320-352, July 2006
Hemant Joshi, Shinya Ito, Santhosh Kanala, Sangeetha Hebbar, Coskun Bayrak, Concept set extraction with user session context, Proceedings of the 45th annual southeast regional conference, March 23-24, 2007, Winston-Salem, North Carolina
Dan Cosley, Dan Frankowski, Loren Terveen, John Riedl, SuggestBot: using intelligent task routing to help people find work in wikipedia, Proceedings of the 12th international conference on Intelligent user interfaces, January 28-31, 2007, Honolulu, Hawaii, USA
Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins, On anonymizing query logs via token-based hashing, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, Geri Gay, Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search, ACM Transactions on Information Systems (TOIS), v.25 n.2, p.7-es, April 2007
Hsieh-Hua Yang, Jui-Chen Yu, Hung-Jen Yang, Hsiao-Chih Lin, Developing a model of technology behavior intension on strategic web resource, Proceedings of the 10th WSEAS International Conference on APPLIED MATHEMATICS, p.71-76, November 01-03, 2006, Dallas, Texas
Dou Shen, Toby Walker, Zijian Zheng, Qiang Yang, Ying Li, Personal name classification in web queries, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
Edleno Silva de Moura, Celia Francisca dos Santos, Bruno Dos santos de Araujo, Altigran Soares da Silva, Pavel Calado, Mario A. Nascimento, Locality-Based pruning methods for web search, ACM Transactions on Information Systems (TOIS), v.26 n.2, p.1-28, March 2008
Bernard J. Jansen, Danielle L. Booth, Amanda Spink, Determining the informational, navigational, and transactional intent of Web queries, Information Processing and Management: an International Journal, v.44 n.3, p.1251-1266, May 2008
Hao Ma, Haixuan Yang, Irwin King, Michael R. Lyu, Learning latent semantic relations from clickthrough data for query suggestion, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Maria-Elena Hernandez, Sean M. Falconer, Margaret-Anne Storey, Simona Carini, Ida Sim, Synchronized tag clouds for exploring semi-structured clinical trial data, Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, October 27-30, 2008, Ontario, Canada
Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fausto Rabitti, A metric cache for similarity search, Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval, October 30, 2008, Napa Valley, California, USA
Yiqun Liu, Rongwei Cen, Min Zhang, Shaoping Ma, Liyun Ru, Identifying web spam with user behavior analysis, Proceedings of the 4th international workshop on Adversarial information retrieval on the web, April 22, 2008, Beijing, China
Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fausto Rabitti, Caching content-based queries for robust and efficient image retrieval, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
Maryam Kamvar, Melanie Kellar, Rajan Patel, Ya Xu, Computers and iphones and mobile phones, oh my!: a logs-based comparison of search users on different devices, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
Joel Brandt, Philip J. Guo, Joel Lewenstein, Mira Dontcheva, Scott R. Klemmer, Two studies of opportunistic programming: interleaving web foraging, learning, and writing code, Proceedings of the 27th international conference on Human factors in computing systems, April 04-09, 2009, Boston, MA, USA