ACM Home Page
Please provide us with feedback. Feedback
On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF
Full text PdfPdf (225 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Chicago, IL, USA
SESSION: Database design table of contents
Pages: 114 - 123  
Year of Publication: 2006
ISBN:1-59593-318-2
Authors
Solmaz Kolahi  University of Toronto
Leonid Libkin  University of Toronto and University of Edinburgh
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 87,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1142351.1142369
What is a DOI?

ABSTRACT

A recently introduced information-theoretic approach to analyzing redundancies in database design was used to justify normal forms like BCNF that completely eliminate redundancies. The main notion is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. In practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all redundancies but always guarantees dependency preservation.In this paper we use the information-theoretic approach to prove that 3NF is the best normal form if one needs to achieve dependency preservation. For each dependency-preserving normal form, we define the price of dependency preservation as an information-theoretic measure of redundancy that gets introduced to compensate for dependency preservation. This is a number in the [0,1] range: the smaller it is, the less redundancy a normal form guarantees. We prove that for every dependency-preserving normal form, the price of dependency preservation is at least 1/2, and it is precisely 1/2 for 3NF. Hence, 3NF has the least amount of redundancy among all dependency-preserving normal forms. We also show that, information-theoretically, unnormalized schemas have at least twice the amount of redundancy than schemas in 3NF.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
4
 
5
C. Beeri, P. Bernstein, and N. Goodman. A sophisticate's introduction to database normalization theory. In VLDB'78, pages 113--124.
6
 
7
 
8
 
9
10
 
11
12
13
 
14
 
15
 
16
 
17
S. Kolahi. Dependency-preserving normalization of relational and XML data. In DBPL'05.
 
18
 
19
 
20
 
21
22
 
23
Oracle's General Database Design FAQ. http://www.orafaq.com/faqdesgn.htm.
24
25


Collaborative Colleagues:
Solmaz Kolahi: colleagues
Leonid Libkin: colleagues