ABSTRACT
Database integration aims to provide a uniform and consistent view, called a global schema, over a set of autonomous and heterogeneous data sources, so that data residing in different sources can be accessed as if they were in a single schema. The integration of data sources can be performed in two steps: a matching step and a data transformation step. Schema matching, the focus of this paper, is a fundamental operation in the manipulation of schema information: it takes two schemas as input and identifies the elements that correspond semantically to each other. Manually specifying schema matches is a tedious, time-consuming, error-prone, and therefore expensive process, which is a growing problem given the rapidly increasing number of data sources to integrate. As systems become able to handle more complex databases and applications, such as biomedical databases, their schemas become larger, further increasing the number of matches to be performed. Several solutions to the schema matching problem have been proposed. However, these solutions are still limited because (i) they do not exploit most of the available information related to schemas, (ii) they rely strictly on the assumption that the schemas to be matched are from the same application domain, and (iii) they match schemas either by comparing the strings of the elements' names or by checking whether those names are synonyms. This paper addresses the above limitations by proposing a model for matching the schemas of heterogeneous relational biomedical databases that further improves the results of the integration.
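To make limitation (iii) concrete, the following is a minimal sketch of a purely name-based matcher of the kind the abstract criticises. It is not the model proposed in the paper. The synonym table, the column names, the function names `name_similarity` and `match_schemas`, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real matcher might consult a thesaurus instead.
SYNONYMS = {
    frozenset({"patient", "subject"}),
    frozenset({"gene", "locus"}),
}


def name_similarity(a: str, b: str) -> float:
    """Score two element names using only their strings and a synonym lookup."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    # Fall back to a character-level similarity ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def match_schemas(source, target, threshold=0.8):
    """Pair each source element with its best-scoring target element.

    Only pairs whose score reaches the (assumed) threshold are kept.
    """
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, score))
    return matches


if __name__ == "__main__":
    source_columns = ["patient_id", "dob"]
    target_columns = ["patientid", "birth_date"]
    for s, t, score in match_schemas(source_columns, target_columns):
        print(f"{s} -> {t} ({score:.2f})")
```

Under these assumptions, `patient_id` is paired with `patientid` because the strings are nearly identical, while `dob` and `birth_date` are left unmatched even though they plausibly denote the same attribute. This is the kind of gap that arises when a matcher ignores the other information available about a schema.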