A comparison of methods for the automatic identification of locations in Wikipedia
Source
Workshop On Geographic Information Retrieval
Proceedings of the 4th ACM workshop on Geographical information retrieval
Lisbon, Portugal
SESSION: Mining geographic information and GIR applications
Pages 89-92
Year of Publication: 2007
ISBN: 978-1-59593-828-2
Authors
Davide Buscaldi, Universidad Politécnica de Valencia, Valencia, Spain
Paolo Rosso, Universidad Politécnica de Valencia, Valencia, Spain
ABSTRACT
In this paper we compare two methods for the automatic identification of geographical articles in encyclopedic resources such as Wikipedia: a WordNet-based method that uses a set of keywords related to geographical places, and a multinomial Naïve Bayes classifier trained on a randomly selected subset of the English Wikipedia. This task can be viewed as part of the broader task of Named Entity classification, a well-known problem in Natural Language Processing. The experiments were carried out on both the full text of the articles and only the definition of the entity described in each article. The results show that the information contained in the page templates and the category labels is more useful than the text of the articles.
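To make the two kinds of method concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hand-written keyword set (`GEO_KEYWORDS`, hypothetical; the paper derives its keywords from WordNet), a simple lowercase tokenizer, Laplace smoothing for the multinomial Naïve Bayes model, and a four-sentence toy training set standing in for the Wikipedia sample. All names and data are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; the paper builds its list from WordNet instead.
GEO_KEYWORDS = {"city", "river", "country", "capital", "mountain", "region"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_is_geo(text, keywords=GEO_KEYWORDS):
    """Keyword method: geographic if any keyword occurs in the text."""
    return any(tok in keywords for tok in tokenize(text))


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))  # term counts per class
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for t in tokens:
                score += math.log((self.counts[c][t] + self.alpha) / total)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data (invented for illustration only).
train_docs = [
    "Valencia is a city in Spain on the Turia river",
    "Lisbon is the capital city of Portugal",
    "Naive Bayes is a probabilistic classifier",
    "WordNet is a lexical database of English",
]
train_labels = ["geo", "geo", "other", "other"]

model = MultinomialNB().fit(train_docs, train_labels)
```

Calling `model.predict(text)` or `keyword_is_geo(text)` on either the full article text or only its opening definition would mirror the two input settings described in the abstract. The template and category features that the abstract reports as more useful are not modeled here.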