Exploiting context analysis for combining multiple entity resolution systems
Source
International Conference on Management of Data
Proceedings of the 35th SIGMOD international conference on Management of data
Providence, Rhode Island, USA
SESSION: Research session 6: entity resolution
Pages 207-218
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Zhaoqi Chen  Microsoft Corporation, Redmond, USA
Dmitri V. Kalashnikov  University of California, Irvine, Irvine, CA, USA
Sharad Mehrotra  University of California, Irvine, Irvine, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559845.1559869

ABSTRACT

Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions in a dataset co-refer, that is, refer to the same real-world entity. Because of its practical significance for data mining and data analysis tasks, many different ER approaches have been developed. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution, with the goal of increasing the quality of ER. The framework leverages the observation that no single ER method consistently outperforms the others in terms of quality; instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, both based on supervised learning. The two approaches learn a mapping from the clustering decisions of the base-level ER systems, together with the local context, to a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality than current state-of-the-art solutions.
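
The general idea of learning a combined decision from base-level decisions plus context can be illustrated with a small Python sketch. This is not the paper's algorithm: the abstract does not specify the two combining approaches, so the table-lookup learner, the majority-vote fallback, the discrete context labels ("dense", "sparse"), and all function names below are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def train_combiner(examples):
    """Learn a lookup table from (base decisions..., context) to a label.

    examples: list of (features, label) where features is a tuple of
    0/1 decisions from each base ER system followed by a context value.
    This toy learner is an assumption, not the method from the paper.
    """
    votes = defaultdict(Counter)
    for features, label in examples:
        votes[features][label] += 1
    # For every feature pattern seen in training, keep its majority label.
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}

def predict(model, features, n_systems):
    """Return 1 if the pair is predicted to co-refer, else 0."""
    if features in model:
        return model[features]
    # Unseen pattern: fall back to a strict majority vote of base systems.
    return int(sum(features[:n_systems]) * 2 > n_systems)

def cluster(items, positive_pairs):
    """Merge items linked by positive pairs (transitive closure, union-find)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in positive_pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())

def resolve(model, items, pair_features, n_systems):
    """Apply the learned combiner to every pair, then cluster the items."""
    positive = [pair for pair, feats in pair_features.items()
                if predict(model, feats, n_systems) == 1]
    return cluster(items, positive)

# Hypothetical training data: system 1 is right in "dense" contexts,
# system 2 is right in "sparse" contexts.
training = [
    ((1, 0, "dense"), 1), ((0, 1, "dense"), 0),
    ((1, 0, "sparse"), 0), ((0, 1, "sparse"), 1),
    ((1, 1, "dense"), 1), ((0, 0, "sparse"), 0),
]
model = train_combiner(training)

items = ["r1", "r2", "r3", "r4"]
pair_features = {
    ("r1", "r2"): (1, 0, "dense"),
    ("r2", "r3"): (0, 1, "dense"),
    ("r3", "r4"): (0, 1, "sparse"),
    ("r1", "r3"): (0, 0, "dense"),
}
print(resolve(model, items, pair_features, n_systems=2))
# prints [['r1', 'r2'], ['r3', 'r4']]
```

In this toy setup neither base system is trusted everywhere: the combiner follows the first system in "dense" contexts and the second in "sparse" ones, which mirrors the observation that different ER solutions perform better in different contexts.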

