ACM Home Page
Please provide us with feedback. Feedback
Privacy, accuracy, and consistency too: a holistic solution to contingency table release
Full text PdfPdf (281 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Beijing, China
SESSION: Privacy, probabilistic databases table of contents
Pages: 273 - 282  
Year of Publication: 2007
ISBN:978-1-59593-685-1
Authors
Boaz Barak  Princeton University
Kamalika Chaudhuri  UC Berkeley
Cynthia Dwork  Microsoft Research
Satyen Kale  Princeton University
Frank McSherry  Microsoft Research
Kunal Talwar  Microsoft Research
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 116,   Citation Count: 7
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1265530.1265569
What is a DOI?

ABSTRACT

The contingency table is a work horse of official statistics, the format of reported data for the US Census, Bureau of Labor Statistics, and the Internal Revenue Service. In many settings such as these privacy is not only ethically mandated, but frequently legally as well. Consequently there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at leas one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously.

Our approach can be viewed as a special case of a more general approach for producing synthetic data: Any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone-and hence without weakening privacy--we will find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther than the tables of the raw data, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself.

The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Special Issue on Statistical Disclosure Control, volume 14(4) of Journal of Official Statistics. 1998.
2
3
4
 
5
M. Bacharach. Matrix rounding problems. Management Science, 9:732--742, 1966.
6
 
7
J. Castro. Quadratic interior-point methods in statistical disclosure control. Computational Management Science, 2:pages 107--121, 2005.
 
8
J. Castro. Minimum-distance controlled perturbation methods for large-scale tabular data protection. Euorpean Journal of Operantional Research, 171:39--52, 2006.
 
9
S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee. Toward privacy in public databases. In J. Kilian, editor, TCC, volume 3378 of Lecture Notes in Computer Science, pages 363--385. Springer, 2005.
 
10
S. Chawla, C. Dwork, F. McSherry, and K. Talwar. On privacy-preserving histograms. In Proceedings of the 21th Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), Arlington, Virginia, 2005. AUAI Press.
 
11
L. Cox, J. Kelly, and R. Patil. Balancing quality and confidentiality in multivariate tabular data. Privacy in Statistical Databases, 3080:87--98, 2004.
 
12
T. Dalenius. Towards a methodology for statistical disclosure control. Statistisk. tidskrift, 3:213--225, 1977.
 
13
R. A. Dandekar and L. Cox. Synthetic tabular data: An alternative to complementary cell suppression, 2002. manuscript, energy Information Administration, US Department of Energy.
14
 
15
A. Dobra and S. Fienberg. Bounding entries in multi-way contingency tables given a set of marginal totals, 2002. Proceedings of Conference on Foundation of Statistical Inference and its Applicaitons.
 
16
 
17
G. Duncan. Confidentiality and statistical disclosure limitation. In N. Smelser and P. Baltes, editors, International Encyclopedia of the Social and Behavioral Sciences. Elsevier, 2001.
 
18
C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1--12. Springer, 2006.
 
19
C. Dwork, D. Lee, and F. McSherry. Privacy preserving histogram case study, 2007. Manuscript.
 
20
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, TCC, volume 3876 of Lecture Notes in Computer Science, pages 265--284. Springer, 2006.
21
 
22
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In M. K. Franklin, editor, CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528--544. Springer, 2004.
23
 
24
I. Fellegi. On the question of statistical confidentiality. Journal of the American Statistical Association, pages 7--18, 1972.
 
25
M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization (Algorithms and Combinatorics). Springer, December 1994.
 
26
T. Milo, editor. Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9--12, 2003, San Diego, CA, USA. ACM, 2003.
 
27
J. Kelly, A. Assad, and B. Golden. The controlled rounding problem: Relaxations and complexity issues. OR Spektrum, 12:129--138, 1990.
 
28
T. W. Körner. Fourier Analysis. Cambridge University Press, Cambridge, UK, 1988.
 
29
J. A. D. Loera and S. Onn. All rational polytopes are transportation polytopes and all polytopal integer sets are contingency tables. Proceedings of the 10th Ann. Math. Prog. Soc. Symp. Integ. Prog. Combin. Optim., LNCS, 3064:338--351, 2004.
 
30
 
31
J. A. D. Loera and S. Onn. Markov bases of three-way tables are arbitrarily complicated. J. Symb. Comput., 41(2):173--181, 2006.
 
32
 
33
 
34
D. Rubin. Discussion: Statistical disclosure limitation. Journal of Official Statistics, 9:461--469, 1993.
 
35
P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression.
 
36
 
37


Collaborative Colleagues:
Boaz Barak: colleagues
Kamalika Chaudhuri: colleagues
Cynthia Dwork: colleagues
Satyen Kale: colleagues
Frank McSherry: colleagues
Kunal Talwar: colleagues