|
ABSTRACT
Privacy is an increasingly important aspect of data publishing. Reasoning about privacy, however, is fraught with pitfalls. One of the most significant is the auxiliary information (also called external knowledge, background knowledge, or side information) that an adversary gleans from other channels such as the web, public records, or domain knowledge. This paper explores how one can reason about privacy in the face of rich, realistic sources of auxiliary information. Specifically, we investigate the effectiveness of current anonymization schemes in preserving privacy when multiple organizations independently release anonymized data about overlapping populations. 1. We investigate composition attacks, in which an adversary uses independent anonymized releases to breach privacy. We explain why recently proposed models of limited auxiliary information fail to capture composition attacks. Our experiments demonstrate that even a simple instance of a composition attack can breach privacy in practice, for a large class of currently proposed techniques. The class includes k-anonymity and several recent variants. 2. On a more positive note, certain randomization-based notions of privacy (such as differential privacy) provably resist composition attacks and, in fact, the use of arbitrary side information.This resistance enables "stand-alone" design of anonymization schemes, without the need for explicitly keeping track of other releases. We provide a precise formulation of this property, and prove that an important class of relaxations of differential privacy also satisfy the property. This significantly enlarges the class of protocols known to enable modular design.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. Official Statistics, special issue on disclosure limitation methods, Vol. 14, No. 4, 1998.
|
 |
2
|
|
| |
3
|
|
 |
4
|
Boaz Barak , Kamalika Chaudhuri , Cynthia Dwork , Satyen Kale , Frank McSherry , Kunal Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
[doi> 10.1145/1265530.1265569]
|
| |
5
|
M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, Aug. 2006.
|
 |
6
|
|
 |
7
|
|
| |
8
|
J.-W. Byun, Y. Sohn, E. Bertino, and N. Li. Secure anonymization for incremental datasets. In Secure Data Management, pages 48--63. Springer, 2006.
|
| |
9
|
K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, pages 198--213. Springer, 2006.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
C. Dwork. Differential privacy. In ICALP, pages 1--12. Springer, 2006.
|
| |
14
|
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006.
|
| |
15
|
C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486--503. Springer, 2006.
|
| |
16
|
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, pages 528--544. Springer, 2004.
|
 |
17
|
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775080]
|
 |
18
|
|
 |
19
|
|
| |
20
|
S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. CoRR, arXiv:0803.0032v1 {cs.DB}, 2008.
|
| |
21
|
S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270--299, 1984.
|
| |
22
|
S. P. Kasiviswanathan and A. Smith. A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.39461 {cs.CR}, 2008.
|
| |
23
|
|
 |
24
|
|
| |
25
|
N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, pages 106--115. IEEE Computer Society, 2007.
|
| |
26
|
|
| |
27
|
A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: From theory to practice on the map. In ICDE, pages 277--286. IEEE Computer Society, 2008.
|
 |
28
|
|
| |
29
|
D. J. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Y. Halpern. Worst-case background knowledge for privacy-preserving data publishing. In ICDE, pages 126--135. IEEE Computer Society, 2007.
|
| |
30
|
|
 |
31
|
|
| |
32
|
|
| |
33
|
|
| |
34
|
UCI machine learning repository, http://www.ics.uci.edu/ mlearn/databases/.
|
| |
35
|
A. van den Hout and P. van der Heijden. Randomized response, statistical disclosure control and misclassification: A review. International Statistical Review, 70:269--288, 2002.
|
 |
36
|
|
| |
37
|
S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63--69, 1965.
|
| |
38
|
|
| |
39
|
|
 |
40
|
|
 |
41
|
|
| |
42
|
|
CITED BY 3
|
|
Vibhor Rastogi , Michael Hay , Gerome Miklau , Dan Suciu, Relationship privacy: output perturbation for queries with joins, Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 29-July 01, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
|
|