Estimating and bounding aggregations in databases with referential integrity errors
Source: Data Warehousing and OLAP
Proceeding of the ACM 11th international workshop on Data warehousing and OLAP
Napa Valley, California, USA
SESSION: Multidimensional design and ETL
Pages 49-56
Year of Publication: 2008
ISBN: 978-1-60558-250-4
ABSTRACT
Database integration builds a single view of data from tables that come from multiple databases. Each database has different tables, columns with similar content across databases, and different referential integrity constraints. Thus, a query in an integrated database is likely to involve tables and columns with referential integrity errors. In a data warehouse environment, ETL processes take care of referential integrity errors, but in many scenarios they do so by including 'dummy' records in the dimension tables, to which the fact-table rows with referential errors are then related. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated in a group marked as undefined, effectively discarding potentially valuable information. With that motivation in mind, we extend aggregate functions computed over tables with referential integrity errors in OLAP databases to return complete answer sets, in the sense that no tuple is excluded. We associate with each valid reference the probability that an invalid reference is actually that correct reference. The main idea of our work is that, in certain contexts, tuples with invalid references can still be used by taking this probability into account. In this way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints.
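The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual definition of the extended aggregate functions: the function name `estimated_sum`, the row layout, and the way the probabilities are supplied are all assumptions made here for illustration. The sketch computes a grouped SUM in which the measure carried by rows with invalid references is spread over the valid references in proportion to the given probabilities, so that no tuple is excluded.

```python
from collections import defaultdict


def estimated_sum(fact_rows, valid_keys, probabilities):
    """Grouped SUM that redistributes rows with invalid references.

    fact_rows:     iterable of (foreign_key, measure) pairs (hypothetical layout)
    valid_keys:    set of keys present in the referenced table
    probabilities: dict mapping each valid key to the assumed probability that
                   an invalid reference is actually that key (should sum to 1)
    """
    totals = defaultdict(float)
    invalid_total = 0.0

    for key, measure in fact_rows:
        if key in valid_keys:
            totals[key] += measure      # ordinary aggregation
        else:
            invalid_total += measure    # would normally land in an "undefined" group

    # Spread the invalid mass over the valid references instead of discarding it.
    for key in valid_keys:
        totals[key] += probabilities.get(key, 0.0) * invalid_total

    return dict(totals)


rows = [("A", 10.0), ("B", 20.0), (None, 6.0), ("Z", 4.0)]
result = estimated_sum(rows, {"A", "B"}, {"A": 0.25, "B": 0.75})
print(result)
```

In this example the two rows with invalid references (`None` and `"Z"`) contribute 10.0 in total, of which a quarter is attributed to `"A"` and three quarters to `"B"`. The grand total over all groups equals the total over all fact rows, which is one way to read "complete answer sets".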