ABSTRACT
In this paper, we describe a methodology for estimating the geographic coverage of the web without the need for secondary knowledge or complex geo-tagging. We randomly select toponyms from the Ordnance Survey 50K gazetteer, use them to create search queries, and gather document counts from various web sources for Great Britain. The same gazetteer is then used to geo-code the results and enable mapping. To validate our approach, and to demonstrate the effects of geo/non-geo and geo/geo ambiguity, we mapped the selected toponyms to Geograph, a community project that contains user-generated geo-tagged photographs of the UK. Although success varies with resolution, the proposed approach is likely to be reliable enough for applications exploring the geographic coverage of the web in cases where references to settlements are likely to be common. In our case, we applied the method to produce maps of web coverage for a range of sources at a resolution of 30 km.
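The pipeline summarised above (sample toponyms, collect a document count per toponym, geo-code with the same gazetteer, aggregate to a grid) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy gazetteer rows and their coordinates, and the `fake_count` stand-in for a real search query are all hypothetical, and the sketch makes no attempt to handle the geo/non-geo or geo/geo ambiguity discussed in the abstract.

```python
import random
from collections import defaultdict

CELL_SIZE_M = 30_000  # 30 km grid cells, matching the resolution in the abstract


def sample_toponyms(gazetteer, k, seed=0):
    """Randomly select k gazetteer rows of the form (name, easting, northing)."""
    rng = random.Random(seed)
    return rng.sample(gazetteer, k)


def grid_cell(easting, northing, cell_size=CELL_SIZE_M):
    """Map metre coordinates to the (column, row) index of a square grid cell."""
    return (int(easting // cell_size), int(northing // cell_size))


def coverage_by_cell(entries, count_documents, cell_size=CELL_SIZE_M):
    """Sum per-toponym document counts into grid cells.

    count_documents is any callable taking a toponym and returning an int;
    in practice it would wrap a query to a web source (assumption).
    """
    totals = defaultdict(int)
    for name, easting, northing in entries:
        totals[grid_cell(easting, northing, cell_size)] += count_documents(name)
    return dict(totals)


# Illustrative rows only; coordinates are placeholders, not gazetteer data.
GAZETTEER = [
    ("Aberfeldy", 285500, 749500),
    ("Keswick", 326500, 523500),
    ("Totnes", 280500, 60500),
    ("Cromer", 621500, 342500),
]


def fake_count(name):
    """Hypothetical stand-in for a search-engine hit count."""
    return len(name) * 10


if __name__ == "__main__":
    selected = sample_toponyms(GAZETTEER, 3)
    print(coverage_by_cell(selected, fake_count))
```

Because the gazetteer supplies both the query terms and their coordinates, no separate geo-tagging step appears in the sketch; the grid index is derived directly from each sampled row.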