ABSTRACT
Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). Data can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving the anonymity by the use of generalizations and suppressions on the potentially identifying portions of the data. We extend earlier works in this area along various dimensions. First, satisfying privacy constraints is considered in conjunction with the usage for the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications like building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations for the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
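To make generalization and suppression concrete, the following is a minimal sketch and not the method of the paper. It assumes one common form of privacy constraint: every combination of generalized values must occur at least `k` times in the released data. The field layout, the `generalize` scheme (truncating the zip code, bucketing birth year by decade), and the sample records are all hypothetical, and the sketch uses a single fixed generalization rather than a genetic algorithm search.

```python
from collections import Counter


def generalize(record):
    """Coarsen potentially identifying fields (hypothetical fixed scheme)."""
    zip_code, birth_year, gender = record
    decade = birth_year // 10 * 10
    # Keep the first three zip digits and replace the birth year by its decade.
    return (zip_code[:3] + "**", f"{decade}-{decade + 9}", gender)


def anonymize(records, k):
    """Generalize every record, then suppress records whose generalized
    combination of values occurs fewer than k times."""
    generalized = [generalize(r) for r in records]
    counts = Counter(generalized)
    kept = [g for g in generalized if counts[g] >= k]
    suppressed = len(generalized) - len(kept)
    return kept, suppressed


# Hypothetical input: (zip code, birth year, gender)
records = [
    ("10583", 1961, "F"),
    ("10587", 1964, "F"),
    ("10512", 1975, "M"),
    ("10519", 1978, "M"),
    ("94305", 1982, "F"),
]
kept, suppressed = anonymize(records, k=2)
```

In this sketch the last record has no companion after generalization, so it is suppressed, while the other four fall into two groups of two. A usage-aware approach such as the one the abstract describes would instead choose among many candidate generalizations according to how well the transformed data serve the intended classification or regression task.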
CITED BY 64
Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, Ada Wai-Chee Fu, Utility-based anonymization using local recoding, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, Ada Wai-Chee Fu, Utility-based anonymization for privacy preservation with less information loss, ACM SIGKDD Explorations Newsletter, v.8 n.2, p.21-30, December 2006
Sergej Zerr, Elena Demidova, Daniel Olmedilla, Wolfgang Nejdl, Marianne Winslett, Soumyadeb Mitra, Zerber: r-confidential indexing for distributed documents, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, March 25-29, 2008, Nantes, France
Rinku Dewri, Darrell Whitley, Indrajit Ray, Indrakshi Ray, A multi-objective approach to data sharing with privacy constraints and preference based objectives, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, July 08-12, 2009, Montreal, Québec, Canada
K. Selçuk Candan, Huiping Cao, Yan Qi, Maria Luisa Sapino, Table summarization with the help of domain lattices, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Bin Zhou, Yi Han, Jian Pei, Bin Jiang, Yufei Tao, Yan Jia, Continuous privacy preserving publishing of data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia