Relational clustering for multi-type entity resolution
Full text: PDF (299 KB)
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 4th international workshop on Multi-relational mining
Chicago, Illinois
Pages: 3 - 12  
Year of Publication: 2005
ISBN:1-59593-212-7
Authors
Indrajit Bhattacharya  University of Maryland, College Park, MD
Lise Getoor  University of Maryland, College Park, MD
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 67,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1090193.1090195

ABSTRACT

In many applications, there are a variety of ways of referring to the same underlying entity. Given a collection of references to entities, we would like to determine the set of true underlying entities and map the references to these entities. The references may be to entities of different types and more than one type of entity may need to be resolved at the same time. We propose similarity measures for clustering references taking into account the different relations that are observed among the typed references. We pose typed entity resolution in relational data as a clustering problem and present experimental results on real data showing improvements over attribute-based models when relations are leveraged.
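
The idea of clustering typed references with a similarity that mixes attribute and relational evidence can be illustrated with a small sketch. This is not the paper's algorithm or its similarity measures: the token-overlap attribute score, the Jaccard overlap of neighbouring clusters, the weight `alpha`, the `threshold`, and the toy data are all assumptions chosen for illustration. References are only compared within the same type, and each merge changes the relational neighbourhoods seen by later comparisons.

```python
from itertools import combinations


def attr_sim(a, b):
    """Token-set Jaccard on names; a stand-in for any attribute measure."""
    ta = set(a.lower().replace(".", "").split())
    tb = set(b.lower().replace(".", "").split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def resolve(refs, links, alpha=0.4, threshold=0.55):
    """Greedy agglomerative clustering of typed references (illustrative).

    refs:  {ref_id: (entity_type, name)}
    links: list of sets of ref_ids observed together (e.g. one paper)
    alpha: weight of relational evidence (hypothetical parameter)
    """
    cluster_of = {r: r for r in refs}   # reference -> cluster label
    members = {r: {r} for r in refs}    # cluster label -> references

    def neighbours(c):
        # Labels of clusters that co-occur with any reference in cluster c.
        out = set()
        for link in links:
            if link & members[c]:
                out |= {cluster_of[r] for r in link}
        out.discard(c)
        return out

    def sim(c1, c2):
        # Attribute part: best-matching pair of names across the clusters.
        a = max(attr_sim(refs[x][1], refs[y][1])
                for x in members[c1] for y in members[c2])
        # Relational part: Jaccard overlap of neighbouring clusters.
        n1, n2 = neighbours(c1), neighbours(c2)
        n1.discard(c2)
        n2.discard(c1)
        union = n1 | n2
        rel = len(n1 & n2) / len(union) if union else 0.0
        return (1 - alpha) * a + alpha * rel

    while True:
        # Only clusters of the same entity type are merge candidates.
        candidates = [(sim(c1, c2), c1, c2)
                      for c1, c2 in combinations(sorted(members), 2)
                      if refs[c1][0] == refs[c2][0]]
        if not candidates:
            break
        best, c1, c2 = max(candidates)
        if best < threshold:
            break
        for r in members[c2]:
            cluster_of[r] = c1
        members[c1] |= members.pop(c2)
    return sorted(sorted(m) for m in members.values())


# Hypothetical toy data: three papers, each linking two author
# references and one venue reference.
refs = {
    "a1": ("author", "J. Smith"),   "b1": ("author", "A. Jones"),
    "a2": ("author", "John Smith"), "b2": ("author", "A. Jones"),
    "a3": ("author", "Jane Smith"), "c1": ("author", "Q. Wu"),
    "v1": ("venue", "KDD"), "v2": ("venue", "KDD"), "v3": ("venue", "VLDB"),
}
links = [{"a1", "b1", "v1"}, {"a2", "b2", "v2"}, {"a3", "c1", "v3"}]

print(resolve(refs, links))
# [['a1', 'a2'], ['a3'], ['b1', 'b2'], ['c1'], ['v1', 'v2'], ['v3']]
```

In this toy data the three "Smith" references are equally similar by name alone, so with `alpha=0.0` none of them merge. With relational evidence, "J. Smith" and "John Smith" are merged once their co-author and venue references have been resolved to the same clusters, while "Jane Smith" stays separate.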


