ABSTRACT
Applications such as warehouse maintenance need to load large data volumes regularly. The efficiency of loading depends on the resources available at the source and target systems. Our work aims to understand the performance criteria involved in bulk loading data into a database and to devise tailored optimization strategies.

Unlike commercial systems and previous research on the same topic, our approach follows the fundamental database principle of physical-logical independence. A loading program is represented as a sequence of algebraic expressions. This abstraction enables the use of appropriate algebraic rewritings to optimize a loading program and of a cost model that takes into consideration efficiency criteria such as the processing times at the source and target systems and the bandwidth between them. A slow-loading program may be preferable if it does not slow down other applications by consuming too much memory. Thus, we view the problem of optimizing a loading program as finding a compromise between several efficiency criteria.

The ability to represent loading programs in an algebra and performance criteria in a cost model has two very desirable properties: reusability and efficiency. Database programmers do not have to write loading programs by hand. In addition, tuning loading programs becomes easier since programmers have better control over the performance criteria specified in the cost model. The algebra captures data transformations that would otherwise have been hardcoded in loading programs. Consequently, richer optimizations can be explored. Finally, our optimization techniques are not specific to one particular system. They can be used for loading data from and to any structured store (e.g., relational, structured files).

We implemented our ideas in a complete environment for migrating ODBC-compliant databases into the O2 object-oriented database system. This prototype provides a declarative view language to specify loading; an interface to specify directives, such as the desired physical database organization, and constraints on several criteria, such as resource and bandwidth consumption; an algebraic optimizer; a code generator; and an execution environment to control failures and guarantee incremental loading. Our experiments show that a tailored optimization is necessary when loading large data volumes into a database.