|
ABSTRACT
There has been increasing interest in the problem of building accurate data mining models over aggregate data, while protecting privacy at the level of individual records. One approach for this problem is to randomize the values in individual records, and only disclose the randomized values. The model is then built over the randomized data, after first compensating for the randomization (at the aggregate level). This approach is potentially vulnerable to privacy breaches: based on the distribution of the data, one may be able to learn with high confidence that some of the randomized records satisfy a specified property, even though privacy is preserved on average.In this paper, we present a new formulation of privacy breaches, together with a methodology, "amplification", for limiting them. Unlike earlier approaches, amplification makes it is possible to guarantee limits on privacy breaches without any knowledge of the distribution of the original data. We instantiate this methodology for the problem of mining association rules, and modify the algorithm from [9] to limit privacy breaches without knowledge of the data distribution. Next, we address the problem that the amount of randomization required to avoid privacy breaches (when mining association rules) results in very long transactions. By using pseudorandom generators and carefully choosing seeds such that the desired items from the original transaction are present in the randomized transaction, we can send just the seed instead of the transaction, resulting in a dramatic drop in communication and storage cost. Finally, we define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
| |
5
|
C. Clifton and D. Marks. Security and privacy implications of data mining. In ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, pages 15--19, May 1996.
|
| |
6
|
The Economist. The End of Privacy, May 1999.
|
| |
7
|
|
| |
8
|
European Union. Directive on Privacy Protection, October 1998.
|
 |
9
|
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775080]
|
 |
10
|
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775080]
|
| |
11
|
A. S. Hedayat, N. J. A. Sloane, and J. Stufken. Orthogonal Arrays: Theory and Applications. Springer Verlag, August 1999. 440 pp.
|
| |
12
|
M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, June 2002.
|
| |
13
|
|
| |
14
|
F. J. C. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland, Amsterdam, 1978. 762 pp.
|
| |
15
|
Office of the Information and Privacy Commissioner, Ontario. Data Mining: Staking a Claim on Your Privacy, January 1998.
|
| |
16
|
|
| |
17
|
S. J. Rizvi and J. R. Haritsa. Maintaining data privacy in association rule mining. In Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 2002.
|
| |
18
|
S. J. Rizvi and J. R. Haritsa. Privacy-preserving association rule mining. In Proc. of the 28th Int'l Conference on Very Large Databases, August 2002.
|
| |
19
|
C. E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28--4:656--715, 1949.
|
| |
20
|
K. Thearling. Data mining and privacy: A conflict in making.DS*, March 1998.
|
 |
21
|
|
CITED BY 74
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shubha U. Nabar , Bhaskara Marthi , Krishnaram Kenthapadi , Nina Mishra , Rajeev Motwani, Towards robustness in query auditing, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
Sheng Zhang , James Ford , Fillia Makedon, A privacy-preserving collaborative filtering scheme with two-way communication, Proceedings of the 7th ACM conference on Electronic commerce, p.316-323, June 11-15, 2006, Ann Arbor, Michigan, USA
|
|
|
|
|
|
|
|
|
Lars Backstrom , Cynthia Dwork , Jon Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
Boaz Barak , Kamalika Chaudhuri , Cynthia Dwork , Satyen Kale , Frank McSherry , Kunal Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yabo Xu , Ke Wang , Benyu Zhang , Zheng Chen, Privacy-enhancing personalized web search, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Feifei Li , Marios Hadjieleftheriou , George Kollios , Leonid Reyzin, Dynamic authenticated index structures for outsourced databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
Jon M. Kleinberg, Challenges in mining social network data: processes, privacy, and paradoxes, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, p.4-5, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Felipe Saint-Jean , Aaron Johnson , Dan Boneh , Joan Feigenbaum, Private web search, Proceedings of the 2007 ACM workshop on Privacy in electronic society, October 29-29, 2007, Alexandria, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vibhor Rastogi , Michael Hay , Gerome Miklau , Dan Suciu, Relationship privacy: output perturbation for queries with joins, Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 29-July 01, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
|
|
|
|
|
|
Raghu K. Ganti , Nam Pham , Yu-En Tsai , Tarek F. Abdelzaher, PoolView: stream privacy for grassroots participatory sensing, Proceedings of the 6th ACM conference on Embedded network sensor systems, November 05-07, 2008, Raleigh, NC, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Dan Feldman , Amos Fiat , Haim Kaplan , Kobbi Nissim, Private coresets, Proceedings of the 41st annual ACM symposium on Theory of computing, May 31-June 02, 2009, Bethesda, MD, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|