ABSTRACT
This tutorial provides a comprehensive and cohesive overview of the key research results in the area of record linkage methodologies and algorithms for identifying approximate duplicate records, and available tools for this purpose. It encompasses techniques introduced in several communities including databases, information retrieval, statistics and machine learning. It aims to identify similarities and differences across the techniques as well as their merits and limitations.
CITED BY 17
Moisés G. Carvalho, Alberto H. F. Laender, Marcos André Gonçalves, Altigran S. da Silva, Replica identification using genetic programming, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
Pedro DeRose, Warren Shen, Fei Chen, AnHai Doan, Raghu Ramakrishnan, Building structured web community portals: a top-down, compositional, and incremental approach, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
Philippe Cudré-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, Hermann De Meer, idMesh: graph-based disambiguation of linked data, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain
Moisés G. de Carvalho, Alberto H. F. Laender, Marcos André Gonçalves, Thiago C. Porto, The impact of parameter setup on a genetic programming approach to record deduplication, Proceedings of the 23rd Brazilian symposium on Databases, October 13-17, 2008, Campinas, Sao Paulo, Brazil