ACM Home Page
Please provide us with feedback. Feedback
Categorizing web queries according to geographical locality
Full text PdfPdf (546 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the twelfth international conference on Information and knowledge management table of contents
New Orleans, LA, USA
SESSION: Information retrieval session 6: categorization table of contents
Pages: 325 - 333  
Year of Publication: 2003
ISBN:1-58113-723-0
Authors
Luis Gravano  Columbia University
Vasileios Hatzivassiloglou  Columbia University
Richard Lichtenstein  Harvard University
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 106,   Citation Count: 26
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/956863.956925
What is a DOI?

ABSTRACT

Web pages (and resources, in general) can be characterized according to their geographical locality. For example, a web page with general information about wildflowers could be considered a global page, likely to be of interest to a geographically broad audience. In contrast, a web page with listings on houses for sale in a specific city could be regarded as a local page, likely to be of interest only to an audience in a relatively narrow region. Similarly, some search engine queries (implicitly) target global pages, while other queries are after local pages. For example, the best results for query [wildflowers] are probably global pages about wildflowers such as the one discussed above. However, local pages that are relevant to, say, San Francisco are likely to be good matches for a query [houses for sale] that was issued by a San Francisco resident or by somebody moving to that city. Unfortunately, search engines do not analyze the geographical locality of queries and users, and hence often produce sub-optimal results. Thus query [wildflowers] might return pages that discuss wildflowers in specific U.S. states (and not general information about wildflowers), while query [houses for sale] might return pages with real estate listings for locations other than that of interest to the person who issued the query. Deciding whether an unseen query should produce mostly local or global pages---without placing this burden on the search engine users---is an important and challenging problem, because queries are often ambiguous or underspecify the information they are after. In this paper, we address this problem by first defining how to categorize queries according to their (often implicit) geographical locality. We then introduce several alternatives for automatically and efficiently categorizing queries in our scheme, using a variety of state-of-the-art machine learning tools. We report a thorough evaluation of our classifiers using a large sample of queries from a real web search engine, and conclude by discussing how our query categorization approach can help improve query result quality.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. M. Bates and D. G. Watts. Nonlinear Regression Analysis and its Applications. Wiley, New York, 1988.
2
 
3
A. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30 (7):1145--1159, 1998.
 
4
 
5
C. Buckley, J. Allan, G. Salton, and A. Singhal. Automatic query expansion using SMART: TREC 3. In Proceedings of the Third Text REtrieval Conference (TREC-3), pages 69--80, April 1995. NIST Special Publication 500-225.
 
6
O. Buyukkokten, J. Cho, H. Gracía-Molina, L. Gravano, and N. Shivakumar. Exploiting geographical location information of web pages. In Proceedings of the ACM SIGMOD Workshop on the Web and Databases (WebDB'99), June 1999.
 
7
 
8
W. W. Cohen. Learning trees and rules with set-valued functions. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1996.
 
9
10
 
11
 
12
 
13
14
 
15
M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk. Reducing misclassification costs. In Proceedings of the Eleventh International Conference on Machine Learning, Sept. 1997.
16
 
17
 
18
 
19
T. J. Santner and D. E. Duffy. The Statistical Analysis of Discrete Data. Springer-Verlag, New York, 1989.
 
20
 
21
G. M. Weiss and F. Provost. The effect of class distribution on classifier learning: An empirical study. Technical Report ML-TR-44, Computer Science Department, Rutgers University, Aug. 2001.

CITED BY  26

Collaborative Colleagues:
Luis Gravano: colleagues
Vasileios Hatzivassiloglou: colleagues
Richard Lichtenstein: colleagues