Establishing value mappings using statistical models and user feedback
Source: Conference on Information and Knowledge Management
Proceedings of the 14th ACM international conference on Information and knowledge management
Bremen, Germany
SESSION: Paper session KM-1 (knowledge management): knowledge systems
Pages: 68 - 75  
Year of Publication: 2005
ISBN:1-59593-140-6
Authors
Jaewoo Kang, North Carolina State University, Raleigh, NC
Tae Sik Han, North Carolina State University, Raleigh, NC
Dongwon Lee, Pennsylvania State University, University Park, PA
Prasenjit Mitra, Pennsylvania State University, University Park, PA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1099554.1099569

ABSTRACT

In this paper, we present a "value mapping" algorithm that does not rely on syntactic similarity or semantic interpretation of the values. The algorithm first constructs a statistical model (e.g., co-occurrence frequency or entropy vector) that captures the unique characteristics of values and their co-occurrence. It then finds the matching values by computing the distances between the models while refining the models using user feedback through iterations. Our experimental results suggest that our approach successfully establishes value mappings even in the presence of opaque data values and thus can be a useful addition to the existing data integration techniques.
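
The general idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a toy setting with two tables of (value, context) pairs, summarizes each value by its relative frequency and the entropy of the context values it co-occurs with, and picks the one-to-one mapping with the smallest total squared distance by brute force. The names `value_profiles`, `match_values`, and `confirmed` are hypothetical, and the `confirmed` argument is only a simple stand-in for user feedback.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import log2


def value_profiles(rows, target=0, context=1):
    """Map each value in column `target` to a small statistical profile:
    (relative frequency, entropy of the co-occurring `context` values)."""
    co = defaultdict(Counter)
    for row in rows:
        co[row[target]][row[context]] += 1
    total = len(rows)
    profiles = {}
    for value, counts in co.items():
        n = sum(counts.values())
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        profiles[value] = (n / total, entropy)
    return profiles


def match_values(src, dst, confirmed=None):
    """Return a one-to-one mapping from src values to dst values that
    minimizes the total squared distance between profiles.

    `confirmed` holds pairs already accepted by a user; they are kept fixed.
    Brute force over permutations, so only suitable for a handful of values.
    Assumes dst has at least as many unconfirmed values as src.
    """
    confirmed = dict(confirmed or {})
    free_src = [v for v in src if v not in confirmed]
    free_dst = [v for v in dst if v not in confirmed.values()]
    best, best_cost = {}, float("inf")
    for perm in permutations(free_dst, len(free_src)):
        cost = sum(
            sum((a - b) ** 2 for a, b in zip(src[s], dst[d]))
            for s, d in zip(free_src, perm)
        )
        if cost < best_cost:
            best, best_cost = dict(zip(free_src, perm)), cost
    return {**confirmed, **best}


# Toy data: the same distribution encoded with different, opaque labels.
table_a = [("x", "p")] * 4 + [("y", "p"), ("y", "q")] + [("z", "q")] * 2
table_b = [("3", "u")] * 4 + [("1", "u"), ("1", "v")] + [("2", "v")] * 2

profiles_a = value_profiles(table_a)
profiles_b = value_profiles(table_b)
mapping = match_values(profiles_a, profiles_b)
print(mapping)
```

On this toy data the sketch pairs x with 3, y with 1, and z with 2, even though the labels share no characters, because the profiles depend only on how often values occur and how they co-occur. Passing `confirmed={"x": "3"}` fixes that pair and searches only over the remaining values, which is one simple way feedback could narrow the search.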



