ABSTRACT
Data analysis applications in areas as diverse as remote sensing and telepathology require operating on and processing very large datasets. For such applications to execute efficiently, careful attention must be paid to the storage, retrieval, and manipulation of the datasets. This paper addresses the optimizations performed by a high performance database system that processes groups of data analysis requests, which we call queries, for these applications. The system performs end-to-end processing of the requests, formulated as PostgreSQL declarative queries. The queries are converted into imperative descriptions, multiple imperative descriptions are merged into a single execution plan, the plan is optimized to decrease execution time via common compiler optimization techniques, and, finally, the plan is optimized to decrease memory consumption. Experiments show that the last two steps effectively reduce execution time while conserving memory as the database processes a group of queries.