Transforming data to satisfy privacy constraints
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Intrusion and privacy
Pages: 279 - 288  
Year of Publication: 2002
ISBN:1-58113-567-X
Author
Vijay S. Iyengar, Thomas J. Watson Research Center, Yorktown Heights, NY
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/775047.775089

ABSTRACT

Data on individuals and entities are being collected widely. These data can contain information that explicitly identifies the individual (e.g., social security number). They can also contain other kinds of personal information (e.g., date of birth, zip code, gender) that are potentially identifying when linked with other available data sets. Data are often shared for business or legal reasons. This paper addresses the important issue of preserving the anonymity of the individuals or entities during the data dissemination process. We explore preserving anonymity through generalizations and suppressions applied to the potentially identifying portions of the data. We extend earlier work in this area along several dimensions. First, satisfying privacy constraints is considered in conjunction with the intended usage of the data being disseminated. This allows us to optimize the process of preserving privacy for the specified usage. In particular, we investigate the privacy transformation in the context of data mining applications such as building classification and regression models. Second, our work improves on previous approaches by allowing more flexible generalizations of the data. Lastly, this is combined with a more thorough exploration of the solution space using the genetic algorithm framework. These extensions allow us to transform the data so that they are more useful for their intended purpose while satisfying the privacy constraints.
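As an illustration of the generalization and suppression operations mentioned in the abstract, the following is a minimal sketch, not the paper's method. It assumes a group-size privacy constraint (every released combination of potentially identifying values must be shared by at least k records), which the abstract itself does not define. The field names, the fixed age interval width, the zip prefix length, and the helper names (generalize_age, generalize_zip, anonymize) are all hypothetical. The genetic algorithm search over generalizations is not shown; the generalization levels here are simply fixed by hand.

```python
from collections import Counter


def generalize_age(age, width):
    """Generalize an exact age to a half-open interval label of the given width."""
    low = (age // width) * width
    return f"[{low}-{low + width})"


def generalize_zip(zip_code, keep):
    """Generalize a zip code by keeping its first `keep` digits and masking the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(records, k, age_width=10, zip_keep=3):
    """Generalize the potentially identifying fields, then suppress small groups.

    Assumed constraint (hypothetical, for illustration): each released
    (age, zip) combination must occur in at least k records.
    Returns the released rows and the number of suppressed records.
    """
    generalized = [
        (generalize_age(r["age"], age_width),
         generalize_zip(r["zip"], zip_keep),
         r["label"])
        for r in records
    ]
    # Count how many records share each generalized (age, zip) combination.
    counts = Counter((age, zip_code) for age, zip_code, _ in generalized)
    # Suppress (drop) records whose combination is shared by fewer than k records.
    released = [row for row in generalized if counts[(row[0], row[1])] >= k]
    suppressed = len(generalized) - len(released)
    return released, suppressed


# Made-up example records; "label" stands in for a target used in classification.
records = [
    {"age": 34, "zip": "10598", "label": "yes"},
    {"age": 37, "zip": "10591", "label": "no"},
    {"age": 31, "zip": "10595", "label": "yes"},
    {"age": 52, "zip": "10601", "label": "no"},
    {"age": 58, "zip": "10603", "label": "no"},
    {"age": 71, "zip": "94301", "label": "yes"},
]

released, suppressed = anonymize(records, k=2)
for row in released:
    print(row)
print("suppressed records:", suppressed)
```

In this sketch the single record aged 71 has no partner after generalization and is suppressed, while the other five are released in generalized form. Coarser generalizations would suppress fewer records at the cost of less precise values; that trade-off, weighed against the intended usage of the data, is what the abstract proposes to optimize.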

