ABSTRACT
Information, and Web pages in particular, may be organized, indexed, searched, and navigated using various metadata aspects, such as keywords, categories (themes), and space. While categories and keywords are open to interpretation, space is an unambiguous aspect by which to structure information. The basic problem of attaching spatial references to content is solved by geocoding, a task that relates identifiers in texts to geographic coordinates. This work presents a methodology for the semiautomatic geocoding of persistent Web pages, in which collaborative human intervention improves on automatic geocoding results. Although the work focuses on the Greek language and related Web pages, the developed techniques are universally applicable. The specific contributions of this work are (i) automatic geocoding algorithms for phone numbers, addresses, and place name identifiers and (ii) a Web browser extension providing a map-based interface for manually geocoding pages and updating the automatically generated results. Because the geocoding of a Web page is stored as annotations in a central repository, the overall mechanism is especially suited to persistent Web pages such as Wikipedia. To illustrate the applicability and usefulness of the approach, specific geocoding examples of Greek Web pages are presented.