ABSTRACT
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence, many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate). We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
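To make the "matching as search" idea concrete, here is a toy sketch of two searchers over small in-memory tables: one enumerates candidate concat matches such as concat(city, state), the other tries a few fixed arithmetic templates such as room-rate * (1 + tax-rate). This is not iMAP's implementation. The function names, the template set, the exact-value overlap score, and the assumption that numeric source and target rows are aligned (as they might be when overlap data is available) are all illustrative assumptions.

```python
from itertools import permutations

# Hypothetical arithmetic templates a numeric searcher might try (illustrative only).
TEMPLATES = {
    "{a} + {b}": lambda x, y: x + y,
    "{a} * {b}": lambda x, y: x * y,
    "{a} * (1 + {b})": lambda x, y: x * (1 + y),
}


def overlap_score(values, target_values):
    """Fraction of distinct target values that the candidate reproduces exactly."""
    target = set(target_values)
    if not target:
        return 0.0
    return len(target & set(values)) / len(target)


def search_concat(source, target_values, max_cols=2, sep=" ", top_k=3):
    """Rank concat(...) candidates built from ordered tuples of source columns."""
    scored = []
    for k in range(1, max_cols + 1):
        for cols in permutations(source, k):
            # Join the chosen columns row by row, e.g. "Urbana" + " " + "IL".
            rows = zip(*(source[c] for c in cols))
            values = [sep.join(str(v) for v in row) for row in rows]
            scored.append((overlap_score(values, target_values), cols))
    # Highest score first; ties broken by column names for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top_k]


def search_numeric(source, target_values, tol=1e-6, top_k=3):
    """Rank template instantiations; ASSUMES source and target rows are aligned."""
    scored = []
    for a, b in permutations(source, 2):
        for template, fn in TEMPLATES.items():
            preds = [fn(x, y) for x, y in zip(source[a], source[b])]
            # Count rows where the formula reproduces the target within tolerance.
            hits = sum(abs(p - t) <= tol for p, t in zip(preds, target_values))
            scored.append((hits / len(target_values), template.format(a=a, b=b)))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top_k]


if __name__ == "__main__":
    addresses = {"city": ["Urbana", "Seattle"], "state": ["IL", "WA"], "zip": ["61801", "98105"]}
    print(search_concat(addresses, ["Urbana IL", "Seattle WA"])[0])
    # → (1.0, ('city', 'state'))
    rooms = {"room-rate": [100.0, 80.0], "tax-rate": [0.1, 0.05]}
    print(search_numeric(rooms, [110.0, 84.0])[0])
    # → (1.0, 'room-rate * (1 + tax-rate)')
```

Scoring every candidate exhaustively only works for tiny spaces like this one; with more columns or deeper expressions the number of candidates grows quickly, which is why a practical searcher would need to prune the space rather than enumerate it.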
CITED BY 51

Robert McCann, Bedoor AlShebli, Quoc Le, Hoa Nguyen, Long Vu, AnHai Doan, Mapping maintenance for data integration systems, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway

Jie Tang, Juanzi Li, Bangyong Liang, Xiaotong Huang, Yi Li, Kehong Wang, Using Bayesian decision for ontology mapping, Web Semantics: Science, Services and Agents on the World Wide Web, v.4 n.4, p.243-262, December, 2006

Tiffany Dohzen, Mujde Pamuk, Seok-Won Seong, Joachim Hammer, Michael Stonebraker, Data integration through transform reuse in the Morpheus project, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA

Seema Degwekar, Jeff DePree, Howard Beck, Carla S. Thomas, Stanley Y. W. Su, Event-triggered data and knowledge sharing among collaborating government organizations, Proceedings of the 8th annual international conference on Digital government research: bridging disciplines & domains, May 20-23, 2007, Philadelphia, Pennsylvania

A. Bonifati, G. Mecca, A. Pappalardo, S. Raunich, G. Summa, Schema mapping verification: the spicy way, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, March 25-29, 2008, Nantes, France