ABSTRACT
Most previous solutions to the schema matching problem rely in some fashion on identifying "similar" column names in the schemas to be matched, or on recognizing common domains in the data stored in the schemas. While each of these approaches is valuable in many cases, neither is infallible, and there exist instances of the schema matching problem to which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are "opaque" or very difficult to interpret. In this paper we propose a two-step technique that works even in the presence of opaque column names and data values. In the first step, we measure the pair-wise attribute correlations in the tables to be matched and construct a dependency graph using mutual information as a measure of the dependency between attributes. In the second step, we find matching node pairs in the dependency graphs by running a graph matching algorithm. We validate our approach with an experimental study, the results of which suggest that such an approach can be a useful addition to a set of (semi) automatic schema matching techniques.
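The two steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract names neither the graph matching algorithm nor the distance metric, so the exhaustive search over permutations and the squared-difference cost below are assumptions chosen for brevity, as is placing each attribute's entropy (its mutual information with itself) on the diagonal of the graph. The function names and the toy tables are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length columns."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def dependency_graph(columns):
    """Step 1: weighted adjacency matrix of pair-wise mutual information.

    Assumption: the diagonal holds MI(X; X), i.e. the entropy of X.
    """
    k = len(columns)
    return [
        [mutual_information(columns[i], columns[j]) for j in range(k)]
        for i in range(k)
    ]


def match_graphs(g1, g2):
    """Step 2 (assumed variant): exhaustive search for the node mapping that
    minimizes the summed squared difference of edge weights.

    Returns a tuple perm where node i of g1 maps to node perm[i] of g2.
    Only practical for a handful of attributes.
    """
    k = len(g1)

    def cost(perm):
        return sum(
            (g1[i][j] - g2[perm[i]][perm[j]]) ** 2
            for i in range(k)
            for j in range(k)
        )

    return min(permutations(range(k)), key=cost)


# Toy table A: three columns with entropies of 1, 2 and 3 bits.
table_a = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 2, 2, 3, 3],
    [0, 1, 2, 3, 4, 5, 6, 7],
]


def opaque(column, prefix):
    """Re-encode values one-to-one so they carry no readable meaning."""
    return [f"{prefix}{value}" for value in column]


# Toy table B: the same data with opaque values and reordered columns.
table_b = [
    opaque(table_a[2], "q"),
    opaque(table_a[0], "z"),
    opaque(table_a[1], "k"),
]

matching = match_graphs(dependency_graph(table_a), dependency_graph(table_b))
print(matching)  # (1, 2, 0)
```

Because mutual information depends only on how values co-occur and not on what the values are, the one-to-one re-encoding in `table_b` leaves the dependency graph unchanged up to a relabeling of nodes, which is why the match can be recovered without consulting column names or value domains.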