|
ABSTRACT
A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Schema mappings play a key role in numerous areas of database systems, including database design, information integration, and model management. A fundamental problem in this context is composing schema mappings: given two successive schema mappings, derive a schema mapping between the source schema of the first and the target schema of the second that has the same effect as applying successively the two schema mappings.In this paper, we give a rigorous semantics to the composition of schema mappings and investigate the definability and computational complexity of the composition of two schema mappings. We first study the important case of schema mappings in which the specification is given by a finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that the composition of a finite set of full source-to-target tgds with a finite set of tgds is always definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target tgds with a finite set of full source-to-target tgds may not be definable by any set (finite or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce a class of existential second-order formulas with function symbols, which we call second-order tgds, and make a case that they are the "right" language for composing schema mappings. To this effect, we show that the composition of finite sets of source-to-target tgds is always definable by a second-order tgd. Moreover, the composition of second-order tgds is also definable by a second-order tgd. Our second-order tgds allow equalities, even though the "obvious" way to define them does not require equalities. Allowing equalities in second-order tgds turns out to be of the essence, because we show that second-order tgds without equalities are not sufficiently expressive to define even the composition of finite sets of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data exchange. In particular. the chase procedure can be extended to second-order tgds so that it produces polynomial-time computable universal solutions in data exchange settings specified by second-order tgds.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
C. Beeri and M. Y. Vardi. Formal Systems for Tuple and Equality Generating Dependencies. SIAM J. on Computing, 13(1):76--98, 1984.
|
| |
5
|
P. A. Bernstein. Applying Model Management to Classical Meta-Data Problems. In Conference on Innovative Data Systems Research (CIDR), pages 209--220, 2003.
|
| |
6
|
A. K. Chandra and D. Harel. Structure and Complexity of Relational Queries. Journal of Computer and System Sciences, 25(1):99--128, 1982.
|
| |
7
|
|
| |
8
|
R. Fagin. Generalized First-Order Spectra and Polynomial-Time Recognizable Sets. In R. M. Karp, editor, Complexity of Computation, SIAM-AMS Proceedings, Vol. 7, pages 43--73, 1974.
|
 |
9
|
|
| |
10
|
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
N. Immerman. Descriptive Complexity. Springer, 1999.
|
 |
15
|
|
| |
16
|
|
| |
17
|
J. Madhavan and A. Y. Halevy. Composing Mappings Among Data Sources. In International Conference on Very Large Data Bases (VLDB), pages 572--583, 2003.
|
| |
18
|
|
| |
19
|
L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernandez, and R. Fagin. Translating Web Data. In International Conference on Very Large Data Bases (VLDB), pages 598--609, 2002.
|
| |
20
|
|
CITED BY 31
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Leopoldo Bertossi , Jan Chomicki , Parke Godfrey , Phokion G. Kolaitis , Alex Thomo , Calisto Zuzarte, Exchange, integration, and consistency of data: report on the ARISE/NISR workshop, ACM SIGMOD Record, v.34 n.3, September 2005
|
|
|
Angela Bonifati , Elaine Qing Chang , Aks V. S. Lakshmanan , Terence Ho , Rachel Pottinger, HePToX: marrying XML and heterogeneity in your P2P databases, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
Laura M. Haas , Mauricio A. Hernández , Howard Ho , Lucian Popa , Mary Roth, Clio grows up: from research prototype to industrial tool, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
M. Roth , M. A. Hernandez , P. Coulthard , L. Yan , L. Popa , H. C.-T. Ho , C. C. Salter, XML Mapping technology: making connections in an XML-centric world, IBM Systems Journal, v.45 n.2, p.389-409, January 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ariel Fuxman , Mauricio A. Hernandez , Howard Ho , Renee J. Miller , Paolo Papotti , Lucian Popa, Nested mappings: schema mapping reloaded, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|