ACM Home Page
Please provide us with feedback. Feedback
On the design and quantification of privacy preserving data mining algorithms
Full text PdfPdf (343 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Santa Barbara, California, United States
Pages: 247 - 255  
Year of Publication: 2001
ISBN:1-58113-361-8
Authors
Dakshi Agrawal  IBM T. J. Watson Research Center, Yorktown Heights, NY
Charu C. Aggarwal  IBM T. J. Watson Research Center, Yorktown Heights, NY
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21,   Downloads (12 Months): 173,   Citation Count: 110
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/375551.375602
What is a DOI?

ABSTRACT

The increasing ability to track and collect large amounts of data with the use of current hardware technology has lead to an interest in the development of data mining algorithms which preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction which is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for quantification and measurement of privacy-preserving data mining algorithms. Thus, this paper provides the foundations for measurement of the effectiveness of privacy preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts, 1995.
 
3
C. Clifton and D. Marks. Security and privacy implications of data mining. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 15-19, May 1996.
 
4
H. Cramer. Mathematical Models of Statistics. Princeton University press, 1946.
5
 
6
7
8
 
9
 
10
The Economist. The end of privacy, May 1999.
 
11
The World Wide Web Consortium. The platform for privacy preference (P3P). Available from http://www.w3.org/P3P/P3FAQ.html.
 
12
K. Thearling. Data mining and privacy: A conflict in making. DS, Mar. 1998.
 
13
Time. The death of privacy, Aug. 1997.
 
14
15
 
16
J. Wu. On the convergence properties of the EM algorithm. Annals of Statistics, 11(1):95-103, 1983.

CITED BY  111

Collaborative Colleagues:
Dakshi Agrawal: colleagues
Charu C. Aggarwal: colleagues