|
ABSTRACT
Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) l-diversification is solved by techniques developed for the simpler k-anonymization problem, causing unnecessary information loss. (ii) The anonymization process is inefficient in terms of computational and I/O cost. (iii) Previous research focused exclusively on the privacy-constrained problem and ignored the equally important accuracy-constrained (or dual) anonymization problem. In this article, we propose a framework for efficient anonymization of microdata that addresses these deficiencies. First, we focus on one-dimensional (i.e., single-attribute) quasi-identifiers, and study the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained (i.e., direct) and the accuracy-constrained (i.e., dual) anonymization problems. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multidimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the existing approaches in terms of execution time and information loss.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Gagan Aggarwal , Tomás Feder , Krishnaram Kenthapadi , Samir Khuller , Rina Panigrahy , Dilys Thomas , An Zhu, Achieving anonymity via clustering, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142374]
|
| |
2
|
Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2005. Approximation algorithms for k-anonymity. J. Privacy Technol. Article number: 20051120001.
|
| |
3
|
|
| |
4
|
Byun, J.-W., Kamra, A., Bertino, E., and Li, N. 2007. Efficient k -anonymization using clustering techniques. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), 188--200.
|
| |
5
|
Froomkin, A. 2000. The death of privacy. Stanford Law Rev. 52, 5, 1461--1543.
|
| |
6
|
|
 |
7
|
Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.205-216, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
Li, N., Li, T., and Venkatasubramanian, S. 2007. t-Closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of International Conference on Data Engineering (ICDE), 106--115.
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
Wong, R., Fu, A., Pei, J., Wang, K., Wan, S., and Lo., C. 2006. Multidimensional k-anonymization by linear clustering using space-filling curves. Tech. rep. 2006-27, Simon Fraser University. March.
|
| |
23
|
|
 |
24
|
|
 |
25
|
|
 |
26
|
Jian Xu , Wei Wang , Jian Pei , Xiaoyuan Wang , Baile Shi , Ada Wai-Chee Fu, Utility-based anonymization using local recoding, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150504]
|
| |
27
|
Zhang, Q., Koudas, N., Srivastava, D., and Yu, T. 2007. Aggregate query answering on anonymized tables. In Proceedings of International Conference on Data Engineering (ICDE), 116--125.
|
 |
28
|
|
|