A declarative approach to optimize bulk loading into databases
Source: ACM Transactions on Database Systems (TODS)
Volume 29, Issue 2 (June 2004)
Pages: 233-281
Year of Publication: 2004
ISSN: 0362-5915
Authors
Sihem Amer-Yahia  AT&T Labs--Research, USA
Sophie Cluet  INRIA, France
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1005566.1005567

ABSTRACT

Applications, such as warehouse maintenance, need to load large data volumes regularly. The efficiency of loading depends on the resources that are available at the source and at the target systems. Our work aims to understand the performance criteria that are involved in bulk loading data into a database and to devise tailored optimization strategies.

Unlike commercial systems and previous research on the same topic, our approach follows the fundamental database principle of physical-logical independence. A loading program is represented as a sequence of algebraic expressions. This abstraction enables the use of appropriate algebraic rewritings to optimize a loading program and of a cost model that takes into consideration efficiency criteria such as the processing times at the source and target systems and the bandwidth between them. A slow-loading program may be preferable if it does not slow down other applications by consuming too much memory. Thus, we view the problem of optimizing a loading program as finding a compromise between several efficiency criteria.

The ability to represent loading programs in an algebra and performance criteria in a cost model has two very desirable properties: reusability and efficiency. Database programmers do not have to write loading programs by hand. In addition, tuning loading programs becomes easier since programmers have better control over the performance criteria specified in the cost model. The algebra captures data transformations that would otherwise have been hardcoded in loading programs. Consequently, richer optimizations can be explored. Finally, our optimization techniques are not specific to one particular system. They can be used for loading data from and to any structured store (e.g., relational, structured files).

We implemented our ideas in a complete environment for migrating ODBC-compliant databases into the O2 object-oriented database system. This prototype provides a declarative view language to specify loading; an interface to specify directives, such as the desired physical organization of the database and constraints on criteria such as resource and bandwidth consumption; an algebraic optimizer; a code generator; and an execution environment to control failures and guarantee incremental loading. Our experiments show that a tailored optimization is necessary when loading large data volumes into a database.
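
The idea of choosing a loading program as a compromise between several criteria can be illustrated with a minimal sketch. It is not the paper's cost model: the `PlanCost` fields, the `choose_plan` function, the plan names, and all numbers are hypothetical, and the sketch simply assumes that source processing, transfer, and target processing happen one after another, with memory treated as a hard constraint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PlanCost:
    """Estimated costs of one candidate loading plan (all fields hypothetical)."""
    name: str
    source_time_s: float   # processing time at the source system
    target_time_s: float   # processing time at the target system
    transfer_mb: float     # data shipped between the two systems
    peak_memory_mb: float  # memory the plan needs while running


def choose_plan(plans: Sequence[PlanCost],
                bandwidth_mb_per_s: float,
                memory_limit_mb: float) -> Optional[PlanCost]:
    """Return the fastest plan that respects the memory limit, or None."""
    def total_time(p: PlanCost) -> float:
        # Assumption: the three phases run sequentially, so their times add up.
        return (p.source_time_s + p.target_time_s
                + p.transfer_mb / bandwidth_mb_per_s)

    # Discard plans that would consume more memory than allowed.
    feasible = [p for p in plans if p.peak_memory_mb <= memory_limit_mb]
    return min(feasible, key=total_time, default=None)


# Two made-up candidates for the same load.
plans = [
    PlanCost("join-at-source", 40.0, 20.0, 100.0, 900.0),
    PlanCost("join-at-target", 10.0, 60.0, 300.0, 200.0),
]
```

With a bandwidth of 10 MB/s, the first plan is faster overall, but a memory limit of 512 MB rules it out, so the slower plan is selected. This mirrors the abstract's remark that a slower loading program may be preferable when it consumes less memory.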

