Entity resolution in geospatial data integration
Geographic Information Systems
Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems
Arlington, Virginia, USA
SESSION: Data integration
Pages: 83 - 90
Year of Publication: 2006
ISBN: 1-59593-529-0
ABSTRACT
Due to the growing availability of geospatial data from a wide variety of sources, there is a pressing need for robust, accurate and automatic merging and matching techniques. Geospatial Entity Resolution is the process of determining, from a collection of database sources referring to geospatial locations, a single consolidated collection of 'true' locations. At the heart of this process is the problem of determining when two location references match, i.e., when they refer to the same underlying location. In this paper, we introduce a novel method for resolving location entities in geospatial data. A typical geospatial database contains heterogeneous features such as location name, spatial coordinates, location type and demographic information. We investigate the use of all of these features in algorithms for geospatial entity resolution. Entity resolution is further complicated by the fact that different sources may use different vocabularies to describe location types, so a semantic mapping between them is required. We propose a novel approach which learns how to combine the different features to perform accurate resolutions. We present experimental results showing that methods combining spatial and non-spatial features (e.g., location name and location type) outperform methods based on spatial or name information alone.
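As a rough illustration of combining spatial and non-spatial features for a match decision, the Python sketch below scores a pair of location records with a logistic function over three features: distance, name similarity, and location type after a vocabulary mapping. It is not the paper's method. The record fields, the `type_map` dictionary, the distance scale, and the weights are all hypothetical placeholders; in a learned approach the weights would be fit from labeled matching and non-matching pairs rather than set by hand.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pair_features(a, b, type_map, scale_km=1.0):
    """Return [spatial, name, type] similarity features, each in [0, 1].

    scale_km is an assumed decay scale: spatial similarity falls to
    about 0.37 when the two points are scale_km apart.
    """
    dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    spatial = math.exp(-dist / scale_km)
    name = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    # Map each source's type label onto a shared vocabulary before comparing.
    type_a = type_map.get(a["type"], a["type"])
    type_b = type_map.get(b["type"], b["type"])
    type_match = 1.0 if type_a == type_b else 0.0
    return [spatial, name, type_match]


def match_probability(features, weights, bias):
    """Logistic combination of the features into a match probability."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-picked parameters; a learned model would estimate these.
WEIGHTS = [2.0, 3.0, 1.0]
BIAS = -3.0

# Made-up example records from two imagined sources.
TYPE_MAP = {"burial ground": "cemetery"}
rec_a = {"name": "Arlington National Cemetery", "lat": 38.8783, "lon": -77.0687, "type": "cemetery"}
rec_b = {"name": "Arlington Natl. Cemetery", "lat": 38.8785, "lon": -77.0690, "type": "burial ground"}
rec_c = {"name": "Reagan National Airport", "lat": 38.8512, "lon": -77.0402, "type": "airport"}

p_ab = match_probability(pair_features(rec_a, rec_b, TYPE_MAP), WEIGHTS, BIAS)
p_ac = match_probability(pair_features(rec_a, rec_c, TYPE_MAP), WEIGHTS, BIAS)
print(f"a vs b: {p_ab:.2f}, a vs c: {p_ac:.2f}")
```

With these placeholder parameters, the nearby, similarly named pair scores above 0.5 and the distant, differently typed pair scores below it. Note that the name feature alone would not separate the two pairs as cleanly, since both names share the word "National".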