ABSTRACT
Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions in a dataset co-refer, that is, refer to the same real-world entity. Because of its practical significance for data mining and data analysis tasks, many different ER approaches have been developed. This paper proposes a new ER Ensemble framework. The task of an ER Ensemble is to combine the results of multiple base-level ER systems into a single solution, with the goal of increasing the quality of ER. The framework leverages the observation that no single ER method consistently outperforms the others in terms of quality; instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, both based on supervised learning. Each learns a mapping from the clustering decisions of the base-level ER systems, together with the local context, to a combined clustering decision. The paper studies the framework empirically by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality than current state-of-the-art solutions.
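The general idea of a learned combiner can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the two combining approaches, so everything below is an assumption made for illustration. In particular, the binary `ctx` flag standing in for "local context", the crossed feature encoding in `features`, the small logistic-regression learner, and the union-find merging step are all hypothetical choices. The sketch trains a classifier on pairwise merge decisions from two hypothetical base systems A and B, then merges the record pairs the classifier accepts.

```python
import math

def features(a, b, ctx):
    """Cross each base decision (0/1) with a hypothetical binary context flag."""
    return [a * ctx, a * (1 - ctx), b * ctx, b * (1 - ctx)]

def score(w, bias, x):
    """Logistic score: estimated probability that a pair co-refers."""
    z = bias + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=5000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w, bias = [0.0] * len(samples[0]), 0.0
    m = len(samples)
    for _ in range(epochs):
        errs = [score(w, bias, x) - y for x, y in zip(samples, labels)]
        w = [wi - lr * sum(e * x[i] for e, x in zip(errs, samples)) / m
             for i, wi in enumerate(w)]
        bias -= lr * sum(errs) / m
    return w, bias

def cluster(n, pair_inputs, w, bias):
    """Merge every pair the combiner accepts, using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), (a, b, ctx) in pair_inputs.items():
        if score(w, bias, features(a, b, ctx)) > 0.5:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy training set (assumed): system A is right when ctx == 1, B when ctx == 0.
combos = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
X = [features(*t) for t in combos]
y = [t[0] if t[2] else t[1] for t in combos]
w, bias = train(X, y)

# Four records; each pair carries (A's decision, B's decision, ctx).
pairs = {
    (0, 1): (1, 0, 1), (2, 3): (0, 1, 0), (0, 2): (0, 1, 1),
    (1, 3): (1, 0, 0), (0, 3): (0, 0, 0), (1, 2): (0, 0, 1),
}
clusters = cluster(4, pairs, w, bias)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy setup neither base system alone recovers the two clusters: following A merges records 0, 1, and 3, while following B merges records 0, 2, and 3. The combiner recovers both clusters only because the features let it weight each system differently per context, which is the intuition the abstract describes.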