ACM Home Page
Please provide us with feedback. Feedback
Limiting privacy breaches in privacy preserving data mining
Full text PdfPdf (227 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
San Diego, California
Pages: 211 - 222  
Year of Publication: 2003
ISBN:1-58113-670-6
Authors
Alexandre Evfimievski  Cornell University
Johannes Gehrke  Cornell University
Ramakrishnan Srikant  IBM Almaden Research Center
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35,   Downloads (12 Months): 221,   Citation Count: 73
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/773153.773174
What is a DOI?

ABSTRACT

There has been increasing interest in the problem of building accurate data mining models over aggregate data, while protecting privacy at the level of individual records. One approach for this problem is to randomize the values in individual records, and only disclose the randomized values. The model is then built over the randomized data, after first compensating for the randomization (at the aggregate level). This approach is potentially vulnerable to privacy breaches: based on the distribution of the data, one may be able to learn with high confidence that some of the randomized records satisfy a specified property, even though privacy is preserved on average.In this paper, we present a new formulation of privacy breaches, together with a methodology, "amplification", for limiting them. Unlike earlier approaches, amplification makes it is possible to guarantee limits on privacy breaches without any knowledge of the distribution of the original data. We instantiate this methodology for the problem of mining association rules, and modify the algorithm from [9] to limit privacy breaches without knowledge of the data distribution. Next, we address the problem that the amount of randomization required to avoid privacy breaches (when mining association rules) results in very long transactions. By using pseudorandom generators and carefully choosing seeds such that the desired items from the original transaction are present in the randomized transaction, we can send just the seed instead of the transaction, resulting in a dramatic drop in communication and storage cost. Finally, we define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
4
 
5
C. Clifton and D. Marks. Security and privacy implications of data mining. In ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, pages 15--19, May 1996.
 
6
The Economist. The End of Privacy, May 1999.
 
7
 
8
European Union. Directive on Privacy Protection, October 1998.
9
10
 
11
A. S. Hedayat, N. J. A. Sloane, and J. Stufken. Orthogonal Arrays: Theory and Applications. Springer Verlag, August 1999. 440 pp.
 
12
M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, June 2002.
 
13
 
14
F. J. C. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1978. 762 pp.
 
15
Office of the Information and Privacy Commissioner, Ontario. Data Mining: Staking a Claim on Your Privacy, January 1998.
 
16
 
17
S. J. Rizvi and J. R. Haritsa. Maintaining data privacy in association rule mining. In Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 2002.
 
18
S. J. Rizvi and J. R. Haritsa. Privacy-preserving association rule mining. In Proc. of the 28th Int'l Conference on Very Large Databases, August 2002.
 
19
C. E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28--4:656--715, 1949.
 
20
K. Thearling. Data mining and privacy: A conflict in making.DS*, March 1998.
21

CITED BY  74

Collaborative Colleagues:
Alexandre Evfimievski: colleagues
Johannes Gehrke: colleagues
Ramakrishnan Srikant: colleagues