ABSTRACT
Web pages (and resources in general) can be characterized according to their geographical locality. For example, a web page with general information about wildflowers could be considered a global page, likely to be of interest to a geographically broad audience. In contrast, a web page with listings of houses for sale in a specific city could be regarded as a local page, likely to be of interest only to an audience in a relatively narrow region. Similarly, some search engine queries (implicitly) target global pages, while other queries are after local pages. For example, the best results for the query [wildflowers] are probably global pages about wildflowers, such as the one discussed above. However, local pages that are relevant to, say, San Francisco are likely to be good matches for a query [houses for sale] issued by a San Francisco resident or by somebody moving to that city. Unfortunately, search engines do not analyze the geographical locality of queries and users, and hence often produce sub-optimal results. Thus, the query [wildflowers] might return pages that discuss wildflowers in specific U.S. states (rather than general information about wildflowers), while the query [houses for sale] might return pages with real estate listings for locations other than the one of interest to the person who issued the query. Deciding whether an unseen query should produce mostly local or global pages, without placing this burden on the search engine users, is an important and challenging problem, because queries are often ambiguous or underspecify the information they are after. In this paper, we address this problem by first defining how to categorize queries according to their (often implicit) geographical locality. We then introduce several alternatives for automatically and efficiently categorizing queries in our scheme, using a variety of state-of-the-art machine learning tools. We report a thorough evaluation of our classifiers using a large sample of queries from a real web search engine, and conclude by discussing how our query categorization approach can help improve query result quality.