|
ABSTRACT
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and deFinetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated. The difference between the attack presented here and others that have been proposedin the past is that we do not need extensive background knowledge. An attacker only needs to know the nonsensitive attributes of one individual in the data, and can carry out this attack just by building a machine learning model over the sanitized data. The reason this attack is successful is that it exploits a subtle flaw in the way prior work computed the probability of disclosure of a sensitive attribute. We demonstrate this theoretically, empirically, and with intuitive examples. We also discuss how this generalizes to many other privacy schemes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Gagan Aggarwal , Tomás Feder , Krishnaram Kenthapadi , Samir Khuller , Rina Panigrahy , Dilys Thomas , An Zhu, Achieving anonymity via clustering, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142374]
|
| |
3
|
A. Asuncion and D.J. Newman. UCI machine learning repository, 2007.
|
| |
4
|
F. Bacchus, A. J. Grove, J. Y. Halpern, and D. Koller. From statistics to beliefs. In AAAI, 1992.
|
 |
5
|
Lars Backstrom , Cynthia Dwork , Jon Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242598]
|
| |
6
|
Michael Barbaro and Tom Zeller. A face is exposed for AOL searcher no. 4417749. New York Times, August 9 2006.
|
| |
7
|
|
| |
8
|
Yvonne M. Bishop, Stephen E. Fienberg, and Paul W. Holland. Discrete Multivariate Analysis: Theory and Practice. Springer, 2007.
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
Nando de Freitas and Hendrik Kück. Learning about individuals from group statistics. In UAI, pages 332--339, 2005.
|
| |
13
|
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1--38, 1977.
|
| |
14
|
Persi Diaconis and David Freedman. De finetti's generalizations of exchangeability. In Richard C. Jeffrey, editor, Studies in Inductive Logic and Probability, Volume II, pages 233--249. University of California Press, 1980.
|
| |
15
|
|
 |
16
|
|
| |
17
|
Cynthia Dwork. Differential privacy. In ICALP, 2006.
|
| |
18
|
|
 |
19
|
|
| |
20
|
Chengfang Fang , Ee-Chien Chang, Information Leakage in Optimal Anonymized and Diversified Data, Information Hiding: 10th International Workshop, IH 2008, Santa Barbara, CA, USA, May 19-21, 2008, Revised Selected Papers, Springer-Verlag, Berlin, Heidelberg, 2008
[doi> 10.1007/978-3-540-88961-8_3]
|
| |
21
|
Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and auxiliary information in data privacy. In KDD, 2008.
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
George H. John and Pat Langley. Estimating continuous distributions in bayesian classifiers. In UAI, 1995.
|
| |
26
|
|
 |
27
|
|
| |
28
|
|
 |
29
|
Ravi Kumar , Jasmine Novak , Bo Pang , Andrew Tomkins, On anonymizing query logs via token-based hashing, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242657]
|
 |
30
|
|
 |
31
|
|
| |
32
|
|
| |
33
|
N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, 2007.
|
| |
34
|
|
 |
35
|
|
| |
36
|
David Martin, Daniel Kifer, Ashwin Machanavajjhala, Johannes Gehrke, and Joseph Halpern. Worst case background knowledge for privacy preserving data publishing. In ICDE, 2007.
|
 |
37
|
|
 |
38
|
|
| |
39
|
Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the net ix prize dataset. http://arxiv.org/abs/cs/0610105, 2006.
|
| |
40
|
|
 |
41
|
Novi Quadrianto , Alex J. Smola , Tiberio S. Caetano , Quoc V. Le, Estimating labels from label proportions, Proceedings of the 25th international conference on Machine learning, p.776-783, July 05-09, 2008, Helsinki, Finland
[doi> 10.1145/1390156.1390254]
|
| |
42
|
|
| |
43
|
Jason Rennie, Lawrence Shih, Jaime Teevan, and David R. Karger. Tackling the poor assumptions of naive bayes text classifiers. In ICML, 2003.
|
| |
44
|
|
| |
45
|
Mark J. Schervish. Theory of Statistics. Springer, 1995.
|
| |
46
|
|
| |
47
|
L. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 1979.
|
| |
48
|
|
| |
49
|
|
| |
50
|
|
|