The cost of privacy: destruction of data-mining utility in anonymized data publishing
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 70-78  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Justin Brickell  The University of Texas at Austin, Austin, TX, USA
Vitaly Shmatikov  The University of Texas at Austin, Austin, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1401890.1401904

ABSTRACT

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi-identifier" tuple appear in at least k records, while l-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as accuracy of data-mining algorithms executed on the same sanitized records.

For our experimental evaluation, we use the same datasets from the UCI machine learning repository as were used in previous research on generalization and suppression. Our results demonstrate that even modest privacy gains require almost complete destruction of the data-mining utility. In most cases, trivial sanitization provides equivalent utility and better privacy than k-anonymity, l-diversity, and similar methods based on generalization and suppression.
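
The syntactic definitions named in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the authors' code: the toy table, the function names, and the use of log(l) as the entropy threshold (a common formulation of entropy l-diversity) are all assumptions. Re-sorting each column in the trivial sanitization step is likewise one assumed way to break the row-level link between quasi-identifiers and sensitive values.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, already-generalized toy table. The first two fields
# (ZIP code, birth decade) are quasi-identifiers; the last is sensitive.
RECORDS = [
    ("787**", "197*", "flu"),
    ("787**", "197*", "cancer"),
    ("787**", "197*", "flu"),
    ("786**", "198*", "flu"),
    ("786**", "198*", "asthma"),
]


def group_by_quasi_identifier(records):
    """Map each quasi-identifier tuple to the sensitive values it co-occurs with."""
    groups = defaultdict(list)
    for *quasi_identifier, sensitive in records:
        groups[tuple(quasi_identifier)].append(sensitive)
    return groups


def is_k_anonymous(records, k):
    """True if every quasi-identifier tuple appears in at least k records."""
    groups = group_by_quasi_identifier(records)
    return all(len(values) >= k for values in groups.values())


def is_entropy_l_diverse(records, l):
    """True if each group's sensitive-value entropy is at least log(l).

    The log(l) threshold is an assumption; the abstract only says "high entropy".
    """
    threshold = math.log(l)
    for values in group_by_quasi_identifier(records).values():
        n = len(values)
        entropy = -sum((c / n) * math.log(c / n) for c in Counter(values).values())
        if entropy < threshold - 1e-12:  # small tolerance for float rounding
            return False
    return True


def trivial_sanitize(records):
    """Publish quasi-identifiers and sensitive values as two unlinked tables.

    Each table is sorted independently so that row position no longer
    connects a quasi-identifier tuple to a sensitive value (an assumed
    way of separating them).
    """
    quasi_identifiers = sorted(tuple(qi) for *qi, _ in records)
    sensitive_values = sorted(s for *_, s in records)
    return quasi_identifiers, sensitive_values


if __name__ == "__main__":
    print("2-anonymous:", is_k_anonymous(RECORDS, 2))
    print("3-anonymous:", is_k_anonymous(RECORDS, 3))
    print("entropy 2-diverse:", is_entropy_l_diverse(RECORDS, 2))
    print("trivially sanitized:", trivial_sanitize(RECORDS))
```

In this toy table the smallest quasi-identifier group has two records, so the data is 2-anonymous but not 3-anonymous. The first group has two "flu" values out of three, which puts its entropy below log(2), so the table is not entropy 2-diverse under the assumed threshold.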

