Deriving private information from randomized data
Source: International Conference on Management of Data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Baltimore, Maryland
SESSION: Research papers: anonymity and nondisclosure
Pages: 37 - 48  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Zhengli Huang  Syracuse University, Syracuse, NY
Wenliang Du  Syracuse University, Syracuse, NY
Biao Chen  Syracuse University, Syracuse, NY
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1066157.1066163

ABSTRACT

Randomization has emerged as a useful technique for disguising data in privacy-preserving data mining, and its privacy properties have been studied in a number of papers. Kargupta et al. challenged randomization schemes, pointing out that randomization might not be able to preserve privacy. However, it is still unclear what factors cause such a security breach, how they affect the privacy-preserving property of randomization, and what kinds of data are at higher risk of disclosing their private contents even after being randomized.

We believe that the key factor is the correlations among attributes. We propose two data reconstruction methods that are based on data correlations. One method uses the Principal Component Analysis (PCA) technique, and the other uses the Bayes Estimate (BE) technique. We have conducted theoretical and experimental analyses of the relationship between data correlations and the amount of private information that can be disclosed by our proposed data reconstruction schemes. Our studies show that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed.

To improve privacy, we propose a modified randomization scheme in which the correlations of the random noise are made "similar" to those of the original data. Our results show that the reconstruction accuracy of both the PCA-based and BE-based schemes becomes worse as the similarity increases.
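
The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the synthetic data (ten attributes driven by a single hypothetical latent factor), the use of only the first principal component, and the power-iteration solver are all choices made for this illustration. It disguises the data with additive noise, reconstructs it by projecting onto the dominant principal component, and compares independent noise with noise whose correlation structure matches the data. The BE-based method is not shown.

```python
import math
import random


def first_principal_component(rows, iterations=50):
    """Return (column means, unit vector) for the dominant principal component."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the disguised data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration; the uniform start is fine here because the
    # illustrative weights below are all positive.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def pca_reconstruct(disguised):
    """Project each disguised record onto the dominant principal component."""
    mean, v = first_principal_component(disguised)
    m = len(v)
    out = []
    for r in disguised:
        score = sum((r[j] - mean[j]) * v[j] for j in range(m))
        out.append([mean[j] + score * v[j] for j in range(m)])
    return out


def rmse(a, b):
    """Root-mean-square error between two equally shaped tables."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return math.sqrt(total / (len(a) * len(a[0])))


def demo(seed=0, n=2000, m=10, noise_std=1.0):
    """Return {scheme: (error before reconstruction, error after)}."""
    rng = random.Random(seed)
    # Assumption: highly correlated attributes, all driven by one latent factor.
    weights = [rng.uniform(0.5, 1.5) for _ in range(m)]
    original = [[3.0 * z * w for w in weights]
                for z in (rng.gauss(0, 1) for _ in range(n))]
    # Conventional scheme: independent noise on every attribute.
    independent = [[rng.gauss(0, noise_std) for _ in range(m)] for _ in range(n)]
    # Modified scheme (illustrative): noise with the same correlation
    # structure as the data, scaled to the same average per-entry variance.
    scale = noise_std / math.sqrt(sum(w * w for w in weights) / m)
    correlated = [[scale * z * w for w in weights]
                  for z in (rng.gauss(0, 1) for _ in range(n))]
    results = {}
    for name, noise in (("independent", independent), ("correlated", correlated)):
        disguised = [[x + e for x, e in zip(ro, rn)]
                     for ro, rn in zip(original, noise)]
        results[name] = (rmse(disguised, original),
                         rmse(pca_reconstruct(disguised), original))
    return results


if __name__ == "__main__":
    for scheme, (before, after) in demo().items():
        print(f"{scheme}: error before={before:.3f}, after PCA={after:.3f}")
```

In this sketch the projection discards most of the independent noise, because that noise is spread across all directions while the data occupies only one. When the noise shares the data's correlation structure, it lies in the same direction as the data, so the projection cannot separate the two and the reconstruction error stays close to the error of the disguised data.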

