ABSTRACT
Protecting individual privacy is an important problem in microdata distribution and publishing. Anonymization algorithms typically aim to satisfy certain privacy definitions with minimal impact on the quality of the resulting data. While much of the previous literature has measured quality through simple one-size-fits-all measures, we argue that quality is best judged with respect to the workload for which the data will ultimately be used. This article provides a suite of anonymization algorithms that incorporate a target class of workloads, consisting of one or more data mining tasks as well as selection predicates. An extensive empirical evaluation indicates that this approach is often more effective than previous techniques. In addition, we consider the problem of scalability. The article describes two extensions that allow us to scale the anonymization algorithms to datasets much larger than main memory. The first extension is based on ideas from scalable decision trees, and the second is based on sampling. A thorough performance evaluation indicates that these techniques are viable in practice.
REVIEW
"Aris Gkoulalas-Divanis : Reviewer"
The release of microdata to third parties raises important questions regarding the privacy of the individuals whose information is recorded in the dataset. To meet these privacy concerns, many anonymization algorithms for microdata have been propo