Similarity-aware indexing for real-time entity resolution
Source
Conference on Information and Knowledge Management
Proceedings of the 18th ACM conference on Information and knowledge management
Hong Kong, China
Poster session 3: IR track
Pages: 1565-1568
Year of Publication: 2009
ISBN: 978-1-60558-512-3
Authors
Peter Christen, Australian National University, Canberra, Australia
Ross Gayler, Veda Advantage, Melbourne, Australia
David Hawking, Funnelback Pty Ltd, Dickson, Australia
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1645953.1646173

ABSTRACT

Entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. Traditionally, entity resolution has been applied in batch mode and on static databases. However, many organisations increasingly face the challenge of matching, in real time, a stream of query records against large databases of entity records, such that the best matching records are retrieved for each query. Example applications include online law enforcement and national security databases, public health surveillance and emergency response systems, financial verification systems, online retail stores, eGovernment services, and digital libraries.

This paper presents a novel approach to real-time entity resolution based on an inverted index. At build time, similarities between attribute values are computed and stored to support the fast matching of records at query time. The presented approach differs from other approaches to approximate query matching in that it allows any similarity comparison function and any 'blocking' (encoding) function, both possibly domain specific, to be incorporated.
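The idea of storing similarities at build time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the data structures, so the class name `SimilarityAwareIndex`, the three dictionaries, the single-attribute records, the first-letter blocking function and the bigram Dice comparison function are all assumptions chosen for illustration. The sketch only shows how a pluggable encoding function and a pluggable comparison function could feed an index whose similarities are looked up, rather than recomputed, at query time.

```python
from collections import defaultdict


def bigrams(value):
    """Return the set of character bigrams of a string."""
    return {value[i:i + 2] for i in range(len(value) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over character bigrams (assumed comparison function)."""
    if a == b:
        return 1.0
    ga, gb = bigrams(a), bigrams(b)
    if not ga or not gb:
        return 0.0
    return 2.0 * len(ga & gb) / (len(ga) + len(gb))


def first_letter(value):
    """Toy blocking (encoding) function: the first character (assumption)."""
    return value[:1]


class SimilarityAwareIndex:
    """Hypothetical single-attribute index with similarities stored at build time."""

    def __init__(self, encode=first_letter, compare=dice_similarity):
        self.encode = encode              # any blocking function can be plugged in
        self.compare = compare            # any similarity function can be plugged in
        self.records = defaultdict(set)   # attribute value -> record identifiers
        self.blocks = defaultdict(set)    # block code -> attribute values in block
        self.sims = defaultdict(dict)     # value -> {other value in block: similarity}

    def add(self, record_id, value):
        """Build time: compare a new value once with the values in its block."""
        if value not in self.records:
            code = self.encode(value)
            for other in self.blocks[code]:
                s = self.compare(value, other)
                self.sims[value][other] = s
                self.sims[other][value] = s
            self.blocks[code].add(value)
        self.records[value].add(record_id)

    def query(self, value, top=5):
        """Query time: return up to `top` (record_id, similarity) pairs."""
        if value in self.records:
            # Known value: similarities are looked up, not recomputed.
            candidates = dict(self.sims.get(value, {}))
            candidates[value] = 1.0
        else:
            # Unseen value: compare only against values in the same block.
            code = self.encode(value)
            candidates = {other: self.compare(value, other)
                          for other in self.blocks.get(code, ())}
        scores = {}
        for other, s in candidates.items():
            for rid in self.records[other]:
                scores[rid] = max(scores.get(rid, 0.0), s)
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


index = SimilarityAwareIndex()
for rid, name in enumerate(["smith", "smyth", "smithe", "jones"]):
    index.add(rid, name)
print(index.query("smith"))   # known value: stored similarities are reused
print(index.query("smit"))    # unseen value: compared within its block only
```

In this sketch, a query for a value that is already indexed costs only dictionary lookups, which is the trade-off the abstract describes: extra work and storage at build time in exchange for faster matching at query time. Swapping in a phonetic encoding or a domain-specific comparison function only requires passing different `encode` and `compare` arguments.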

Experimental results on a real-world database indicate that the total size of all data structures of this novel index approach grows sub-linearly with the size of the database, and that it allows matching of query records in sub-second time, more than two orders of magnitude faster than a traditional entity resolution index approach. The interested reader is referred to the longer version of this paper [5].


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
[2] R. Baxter, P. Christen, and T. Churches. A comparison of fast blocking methods for record linkage. In ACM SIGKDD'03 Workshop on Data Cleaning, Record Linkage and Object Consolidation, Washington DC, 2003.
[5] P. Christen, R. Gayler, and D. Hawking. Similarity-aware indexing for real-time entity resolution. Technical Report TR-CS-09-01, School of Computer Science, The Australian National University, Canberra, Australia, 2009.

