ABSTRACT
Randomization has emerged as a useful technique for data disguising in privacy-preserving data mining. Its privacy properties have been studied in a number of papers. Kargupta et al. challenged the randomization schemes and pointed out that randomization might not be able to preserve privacy. However, it is still unclear what factors cause such a security breach, how they affect the privacy-preserving property of randomization, and what kinds of data have a higher risk of disclosing their private contents even though they are randomized.

We believe that the key factor is the correlations among attributes. We propose two data reconstruction methods that are based on data correlations. One method uses the Principal Component Analysis (PCA) technique, and the other uses the Bayes Estimate (BE) technique. We have conducted theoretical and experimental analyses of the relationship between data correlations and the amount of private information that can be disclosed by our proposed data reconstruction schemes. Our studies have shown that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed.

To improve privacy, we propose a modified randomization scheme in which we make the correlation of the random noise "similar" to that of the original data. Our results have shown that the reconstruction accuracy of both the PCA-based and BE-based schemes becomes worse as the similarity increases.
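The link between attribute correlation and reconstruction accuracy can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a minimal PCA-style reconstruction under assumptions that are ours: additive i.i.d. Gaussian noise with a known variance, synthetic data generated from a few latent factors, and a fixed number of retained components `p`. The names `pca_reconstruct` and `demo` are hypothetical.

```python
import numpy as np


def pca_reconstruct(Y, noise_var, p):
    """Estimate original data from disguised data Y = X + R.

    Assumes (our assumption, not taken from the paper) that R is i.i.d.
    noise with known variance noise_var, independent of X.
    Y has shape (n_records, n_attributes); p is the number of
    principal components to keep.
    """
    mean = Y.mean(axis=0)
    Yc = Y - mean
    # Covariance of the disguised data; for independent additive noise,
    # Cov(Y) = Cov(X) + noise_var * I, so subtract the noise term.
    cov_y = np.cov(Yc, rowvar=False)
    cov_x = cov_y - noise_var * np.eye(Y.shape[1])
    # eigh returns eigenvalues in ascending order; keep the top p.
    eigvals, eigvecs = np.linalg.eigh(cov_x)
    top = np.argsort(eigvals)[::-1][:p]
    Q = eigvecs[:, top]
    # Project onto the principal subspace and map back. Noise outside
    # the subspace is discarded, while correlated signal is retained.
    return Yc @ Q @ Q.T + mean


def demo(seed=0, n=2000, m=20, latent=2, noise_std=1.0):
    """Compare the error of disguised data and reconstructed data."""
    rng = np.random.default_rng(seed)
    # Highly correlated synthetic data: m attributes driven by a few
    # latent factors (an illustrative data model).
    Z = rng.normal(size=(n, latent))
    A = rng.normal(size=(latent, m))
    X = Z @ A
    R = rng.normal(scale=noise_std, size=(n, m))
    Y = X + R
    X_hat = pca_reconstruct(Y, noise_std ** 2, latent)
    rmse_disguised = np.sqrt(np.mean((Y - X) ** 2))
    rmse_reconstructed = np.sqrt(np.mean((X_hat - X) ** 2))
    return rmse_disguised, rmse_reconstructed


if __name__ == "__main__":
    d, r = demo()
    print(f"RMSE of disguised data:     {d:.3f}")
    print(f"RMSE of reconstructed data: {r:.3f}")
```

In this setup, 20 attributes are driven by only 2 latent factors, so the projection discards most of the noise and the reconstruction error should fall well below the error of the disguised data. Raising `latent` toward `m` weakens the correlations and should shrink that gap, which matches the qualitative claim in the abstract.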