ACM Home Page
Please provide us with feedback. Feedback
On generating near-optimal tableaux for conditional functional dependencies
Full text PdfPdf (728 KB)
Source
Proceedings of the VLDB Endowment archive
Volume 1 ,  Issue 1  (August 2008) table of contents
SESSION: Theory table of contents
Pages 376-390  
Year of Publication: 2008
ISSN:2150-8097
Authors
Lukasz Golab  AT&T Labs - Research
Howard Karloff  AT&T Labs - Research
Flip Korn  AT&T Labs - Research
Divesh Srivastava  AT&T Labs - Research
Bei Yu  Nat'l University of Singapore
Publisher
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 61,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1453856.1453900
What is a DOI?

ABSTRACT

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs.

This paper is the first to formally characterize a "good" pattern tableau, based on naturally desirable properties of support, confidence and parsimony. We show that the problem of generating an optimal tableau for a given FD is NP-complete but can be approximated in polynomial time via a greedy algorithm. For large data sets, we propose an "on-demand" algorithm providing the same approximation bound, that outperforms the basic greedy algorithm in running time by an order of magnitude. For ordered attributes, we propose the range tableau as a generalization of a pattern tableau, which can achieve even more parsimony. The effectiveness and efficiency of our techniques are experimentally demonstrated on real data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, 2007.
 
5
 
6
7
8
 
9
10
 
11
 
12
 
13
 
14
Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.
15
 
16
 
17
R. King and J. Legendre. Discovery of functional and approximate functional dependencies in relational databases. Journal of Applied Mathematics and Decision Sciences, 7(1):49--59, 2003.
 
18
19
 
20
21
 
22
Y. Xu and R. Motwani. Random sampling based algorithms for efficient semi-key discovery. Stanford University Technical Report.


Collaborative Colleagues:
Lukasz Golab: colleagues
Howard Karloff: colleagues
Flip Korn: colleagues
Divesh Srivastava: colleagues
Bei Yu: colleagues