Estimating and bounding aggregations in databases with referential integrity errors
Source
Data Warehousing and OLAP
Proceedings of the ACM 11th International Workshop on Data Warehousing and OLAP
Napa Valley, California, USA
SESSION: Multidimensional design and ETL
Pages 49-56
Year of Publication: 2008
ISBN: 978-1-60558-250-4
Authors
Javier García-García  Universidad Nacional Autónoma de México, Mexico City, Mexico
Carlos Ordonez  University of Houston, Houston, TX, USA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458432.1458442

ABSTRACT

Database integration combines tables from multiple databases into a single view of all the data. Each database has different tables, columns with similar content across databases, and different referential integrity constraints. Thus, a query in an integrated database is likely to involve tables and columns with referential integrity errors. In a data warehouse environment, ETL processes handle referential integrity errors, but in many scenarios they do so by inserting 'dummy' records in the dimension tables so that fact-table rows with referential errors still have a record to reference. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated in a group marked as undefined, effectively discarding potentially valuable information. With that motivation in mind, we extend aggregate functions computed over tables with referential integrity errors in OLAP databases to return complete answer sets, in the sense that no tuple is excluded. We associate with each valid reference the probability that an invalid reference is actually that correct reference. The main idea of our work is that, in certain contexts, tuples with invalid references can be used by taking this probability into account. This way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints.
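
The idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the function name `estimate_group_sums`, the row layout, and in particular the choice to estimate each probability from the relative frequency of valid references are assumptions made for illustration only. For each valid key the sketch returns a lower bound (valid tuples only), an estimate (invalid measures distributed by probability), and an upper bound (all invalid tuples assigned to that key), where the bounds assume non-negative measures.

```python
from collections import Counter, defaultdict


def estimate_group_sums(fact_rows, valid_keys):
    """Estimate and bound SUM(measure) per dimension key.

    fact_rows  : iterable of (foreign_key, measure) pairs
    valid_keys : set of keys that exist in the dimension table
    Assumption : the probability that an invalid reference belongs to
                 key k is the share of valid references pointing at k.
    """
    valid_sum = defaultdict(float)   # sum of measures with a valid reference
    ref_count = Counter()            # number of valid references per key
    invalid_total = 0.0              # sum of measures with an invalid reference

    for fk, measure in fact_rows:
        if fk in valid_keys:
            valid_sum[fk] += measure
            ref_count[fk] += 1
        else:
            invalid_total += measure

    n_valid = sum(ref_count.values())
    result = {}
    for key in valid_keys:
        # Fall back to a uniform probability when no valid reference exists.
        p = ref_count[key] / n_valid if n_valid else 1.0 / len(valid_keys)
        base = valid_sum.get(key, 0.0)
        result[key] = {
            "lower": base,                         # no invalid tuple belongs here
            "estimate": base + p * invalid_total,  # expected share of invalid tuples
            "upper": base + invalid_total,         # every invalid tuple belongs here
        }
    return result


if __name__ == "__main__":
    rows = [("A", 100.0), ("A", 50.0), ("A", 30.0), ("B", 50.0),
            (None, 40.0), ("Z", 20.0)]   # None and "Z" are invalid references
    for key, stats in sorted(estimate_group_sums(rows, {"A", "B"}).items()):
        print(key, stats)
```

Because the probabilities sum to one, the per-group estimates add up to the total over all tuples, so no tuple is excluded from the answer set. A different probability model can be substituted by replacing the line that computes `p`.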

