Web-a-Where: geotagging web content
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: Disambiguation
Pages: 273-280
Year of Publication: 2004
ISBN: 1-58113-881-4
Authors
Einat Amitay  IBM Haifa Research Lab, Haifa, Israel
Nadav Har'El  IBM Haifa Research Lab, Haifa, Israel
Ron Sivan  IBM Haifa Research Lab, Haifa, Israel
Aya Soffer  IBM Haifa Research Lab, Haifa, Israel
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 25,   Downloads (12 Months): 239,   Citation Count: 52
DOI: http://doi.acm.org/10.1145/1008992.1009040

ABSTRACT

We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus: a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses.

Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario.

An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.
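The two ambiguity types and the notion of a page focus can be illustrated with a small sketch. This is not the Web-a-Where algorithm, whose heuristics the abstract does not spell out; it is a toy under stated assumptions. The gazetteer entries, the NON_GEO_PRONE set, and the function names find_mentions, resolve, and page_focus are all hypothetical. Geo/geo ambiguity is settled by preferring the candidate whose country is supported by unambiguous names on the same page, geo/non-geo ambiguity by dropping a risky name that has no such support, and the focus is simply the most frequent country among the resulting tags.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: name -> candidate (place, region, country) tuples.
# The first candidate acts as the default sense when context gives no evidence.
GAZETTEER = {
    "London": [("London", "England", "United Kingdom"),
               ("London", "Ontario", "Canada")],
    "Toronto": [("Toronto", "Ontario", "Canada")],
    "Berlin": [("Berlin", "Berlin", "Germany")],
    "Turkey": [("Turkey", "", "Turkey")],
}

# Hypothetical list of names that often have a non-geographic meaning.
NON_GEO_PRONE = {"Berlin", "Turkey"}


def find_mentions(text):
    """Return capitalised single-word tokens that appear in the gazetteer."""
    return [w for w in re.findall(r"[A-Z][a-z]+", text) if w in GAZETTEER]


def resolve(mentions):
    """Pick one candidate per mention, or drop the mention as non-geographic."""
    # Countries supported by names that are neither geo/geo nor geo/non-geo ambiguous.
    context = Counter(
        GAZETTEER[m][0][2]
        for m in mentions
        if len(GAZETTEER[m]) == 1 and m not in NON_GEO_PRONE
    )
    tags = []
    for m in mentions:
        candidates = GAZETTEER[m]
        # Geo/non-geo: keep a risky name only if the page context supports it.
        if m in NON_GEO_PRONE and not any(c[2] in context for c in candidates):
            continue
        # Geo/geo: prefer the candidate whose country has the most support;
        # ties fall back to the first (default) candidate.
        tags.append(max(candidates, key=lambda c: context[c[2]]))
    return tags


def page_focus(tags):
    """Return the most frequent country among the tags, or None if untagged."""
    if not tags:
        return None
    return Counter(c[2] for c in tags).most_common(1)[0][0]


text = "Toronto and London are both in Ontario. Irving Berlin wrote songs."
tags = resolve(find_mentions(text))
focus = page_focus(tags)
```

On this sample text, London resolves to the Ontario candidate because Toronto supports Canada, Berlin is dropped for lack of supporting context, and the focus is Canada. A real tagger would need multi-word names, a far larger gazetteer, and finer-grained focus levels than country.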





REVIEW

Reviewer: Wei Tang

Location-assisted search has been gaining momentum recently. For example, Google has introduced a new service called "Search by Location." (Other search engines offer similar services, for example, Gigablast.com and local-news.net.) However, there ...
