|
ABSTRACT
The increasing ability to track and collect large amounts of data with the use of current hardware technology has lead to an interest in the development of data mining algorithms which preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction which is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for quantification and measurement of privacy-preserving data mining algorithms. Thus, this paper provides the foundations for measurement of the effectiveness of privacy preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts, 1995.
|
| |
3
|
C. Clifton and D. Marks. Security and privacy implications of data mining. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 15-19, May 1996.
|
| |
4
|
H. Cramer. Mathematical Models of Statistics. Princeton University press, 1946.
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
The Economist. The end of privacy, May 1999.
|
| |
11
|
The World Wide Web Consortium. The platform for privacy preference (P3P). Available from http://www.w3.org/P3P/P3FAQ.html.
|
| |
12
|
K. Thearling. Data mining and privacy: A conflict in making. DS, Mar. 1998.
|
| |
13
|
Time. The death of privacy, Aug. 1997.
|
| |
14
|
|
 |
15
|
|
| |
16
|
J. Wu. On the convergence properties of the EM algorithm. Annals of Statistics, 11(1):95-103, 1983.
|
CITED BY 111
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vladimir Estivill-Castro , Chris Clifton, Preface: proceedings of the ICDM 2002 workshop on privacy, security, and data mining, Proceedings of the IEEE international conference on Privacy, security and data mining, p..1, December 01, 2002, Maebashi City, Japan
|
|
|
|
|
|
|
|
|
Boštjan Brumen , Tatjana Welzer , Marjan Družovec , Izidor Golob , Hannu Jaakkola , Ivan Rozman , Jiři Kubalik, Protecting medical data for decision-making analyses, Journal of Medical Systems, v.29 n.1, p.65-80, February 2005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sheng Zhang , James Ford , Fillia Makedon, A privacy-preserving collaborative filtering scheme with two-way communication, Proceedings of the 7th ACM conference on Electronic commerce, p.316-323, June 11-15, 2006, Ann Arbor, Michigan, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lars Backstrom , Cynthia Dwork , Jon Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
Boaz Barak , Kamalika Chaudhuri , Cynthia Dwork , Satyen Kale , Frank McSherry , Kunal Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
|
|
|
|
|
|
|
|
|
Tarek Abdelzaher , Yaw Anokwa , Peter Boda , Jeff Burke , Deborah Estrin , Leonidas Guibas , Aman Kansal , Samuel Madden , Jim Reich, Mobiscopes for Human Spaces, IEEE Pervasive Computing, v.6 n.2, p.20-29, April 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jon M. Kleinberg, Challenges in mining social network data: processes, privacy, and paradoxes, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, p.4-5, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Baik Hoh , Marco Gruteser , Hui Xiong , Ansaf Alrabady, Preserving privacy in gps traces via uncertainty-aware path cloaking, Proceedings of the 14th ACM conference on Computer and communications security, October 28-31, 2007, Alexandria, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Haixun Wang , Jian Yin , Chang-shing Perng , Philip S. Yu, Dual encryption for query integrity assurance, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|
|
Boštjan Brumen , Izidor Golob , Tatjana Welzer , Marjan Družovec , Ivan Rozman , Hannu Jaakkola, An Algorithm for Protecting Knowledge Discovery Data, Informatica, v.14 n.3, p.277-288, August 2003
|
|
|
|
|
|
|
|
|
Baik Hoh , Marco Gruteser , Ryan Herring , Jeff Ban , Daniel Work , Juan-Carlos Herrera , Alexandre M. Bayen , Murali Annavaram , Quinn Jacobson, Virtual trip lines for distributed privacy-preserving traffic monitoring, Proceeding of the 6th international conference on Mobile systems, applications, and services, June 17-20, 2008, Breckenridge, CO, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Raghu K. Ganti , Nam Pham , Yu-En Tsai , Tarek F. Abdelzaher, PoolView: stream privacy for grassroots participatory sensing, Proceedings of the 6th ACM conference on Embedded network sensor systems, November 05-07, 2008, Raleigh, NC, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Azami Zaharim , Shahrum Abdullah , Mohammad Darahim Ibrahim , Zulkifli Mohd Nopiah, Genetic algorithm in time series fatigue analysis, Proceedings of the 13th WSEAS international conference on Applied mathematics, p.241-245, December 15-17, 2008, Puerto De La Cruz, Spain
|
|
|
|
|
|
|
|
|
|
|