ACM Home Page
Please provide us with feedback. Feedback
Smooth sensitivity and sampling in private data analysis
Full text PdfPdf (305 KB)
Source
Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-ninth annual ACM symposium on Theory of computing table of contents
San Diego, California, USA
SESSION: Session 2B table of contents
Pages: 75 - 84  
Year of Publication: 2007
ISBN:978-1-59593-631-8
Authors
Kobbi Nissim  Ben-Gurion University of the Negev, Beer-Sheva, Israel
Sofya Raskhodnikova  Pennsylvania State University, University Park, PA
Adam Smith  Pennsylvania State University, University Park, PA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 153,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1250790.1250803
What is a DOI?

ABSTRACT

We introduce a new, generic framework for private data analysis.The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains.Our framework allows one to release functions f of the data withinstance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also bythe database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smoothsensitivity of f on the database x --- a measure of variabilityof f in the neighborhood of the instance x. The new frameworkgreatly expands the applicability of output perturbation, a technique for protecting individuals' privacy by adding a smallamount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-basednoise in the context of data privacy.

Our framework raises many interesting algorithmic questions. Namely,to apply the framework one must compute or approximate the smoothsensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost ofthe minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on manydatabases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known orwhen f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
S. Chawla, C. Dwork, F. McSherry, A. Smith, and H. Wee. Toward privacy in public databases. In Theory of Cryptography Conference (TCC), pages 363--385, 2005.
 
5
S. Chawla, C. Dwork, F. McSherry, and K. Talwar. On the utility of privacy-preserving histograms. In 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005.
6
7
 
8
C. Dwork. Differential privacy. In ICALP, pages 1--12, 2006.
 
9
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503, 2006.
 
10
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265--284, 2006.
 
11
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, pages 528--544, 2004.
12
 
13
J. Gehrke. Models and methods for privacy-preserving data publishing and analysis (tutorial slides). In Twelfth Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2006), 2006.
 
14
 
15
 
16
A. Slavkovic. Statistical Disclosure Limitation Beyond the Margins: Characterization of Joint Distributions for Contingency Tables. Ph.D. Thesis, Department of Statistics, Carnegie Mellon University, 2004.
17
 
18
 
19
 
20
L.N. Wasserstein. Markov processes over denumerable products of spaces describing large systems of automata. Probl. Inform. Transmission, 5:47--52, 1969.

CITED BY  8

Collaborative Colleagues:
Kobbi Nissim: colleagues
Sofya Raskhodnikova: colleagues
Adam Smith: colleagues