iMAP: discovering complex semantic matches between database schemas
Source: International Conference on Management of Data
Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
Paris, France
Session: Research sessions: data integration
Pages: 383-394
Year of Publication: 2004
ISBN: 1-58113-859-8
Authors
Robin Dhamankar, University of Illinois, Urbana-Champaign, IL
Yoonkyong Lee, University of Illinois, Urbana-Champaign, IL
AnHai Doan, University of Illinois, Urbana-Champaign, IL
Alon Halevy, University of Washington, Seattle, WA
Pedro Domingos, University of Washington, Seattle, WA
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21, Downloads (12 Months): 131, Citation Count: 51
DOI: http://doi.acm.org/10.1145/1007568.1007612

ABSTRACT

Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate).

We describe the iMAP system, which semi-automatically discovers both 1-1 and complex matches. iMAP reformulates schema matching as a search in an often very large or infinite match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches. To further improve matching accuracy, iMAP exploits a variety of domain knowledge, including past complex matches, domain integrity constraints, and overlap data. Finally, iMAP introduces a novel feature that generates explanations of predicted matches, to provide insights into the matching process and suggest actions to converge on correct matches quickly. We apply iMAP to several real-world domains to match relational tables, and show that it discovers both 1-1 and complex matches with high accuracy.
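To make the idea of "matching as search with specialized searchers" concrete, here is a toy sketch. It is not iMAP's actual search procedure: the column names, data, the fixed set of arithmetic templates, and the exact-match scoring are all hypothetical simplifications. It assumes overlap data in the simplest form, i.e. that source and target rows describe the same entities in the same order, so a candidate expression can be scored by how many target values it reproduces. One searcher proposes concat(a, b) over text columns; another tries a few arithmetic templates over numeric columns.

```python
from itertools import permutations

# Hypothetical toy data (not from the paper). We assume row i of the
# source and row i of the target describe the same entity ("overlap data").
source = {
    "city": ["Urbana", "Seattle", "Paris"],
    "state": ["IL", "WA", "FR"],
    "room_rate": [100.0, 150.0, 200.0],
    "tax_rate": [0.10, 0.08, 0.20],
}
target_address = ["Urbana, IL", "Seattle, WA", "Paris, FR"]
target_room_price = [110.0, 162.0, 240.0]


def score(candidate, target, tol=1e-6):
    """Fraction of rows where the candidate value equals the target value."""
    hits = 0
    for c, t in zip(candidate, target):
        if isinstance(t, float):
            hits += abs(c - t) <= tol  # tolerate floating-point rounding
        else:
            hits += c == t
    return hits / len(target)


def concat_searcher(src, target, sep=", "):
    """Text searcher: try concat(a, b) for every ordered pair of text columns."""
    text_cols = [k for k, v in src.items() if all(isinstance(x, str) for x in v)]
    results = []
    for a, b in permutations(text_cols, 2):
        values = [f"{x}{sep}{y}" for x, y in zip(src[a], src[b])]
        results.append((score(values, target), f"concat({a}, {b})"))
    return sorted(results, reverse=True)  # best-scoring candidate first


def numeric_searcher(src, target):
    """Numeric searcher: try a few hypothetical templates over numeric column pairs."""
    num_cols = [k for k, v in src.items() if all(isinstance(x, float) for x in v)]
    templates = {
        "{a} + {b}": lambda x, y: x + y,
        "{a} * {b}": lambda x, y: x * y,
        "{a} * (1 + {b})": lambda x, y: x * (1 + y),
    }
    results = []
    for a, b in permutations(num_cols, 2):
        for name, fn in templates.items():
            values = [fn(x, y) for x, y in zip(src[a], src[b])]
            results.append((score(values, target), name.format(a=a, b=b)))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    print(concat_searcher(source, target_address)[0])
    print(numeric_searcher(source, target_room_price)[0])
```

Running it prints `(1.0, 'concat(city, state)')` and `(1.0, 'room_rate * (1 + tax_rate)')`. Exhaustive enumeration is only feasible here because the candidate space is tiny; as the abstract notes, the real match space is often very large or infinite, so practical searchers must restrict and prune the candidates they consider.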


