ACM Home Page
Please provide us with feedback. Feedback
To do or not to do: the dilemma of disclosing anonymized data
Full text PdfPdf (585 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: anonymity and nondisclosure table of contents
Pages: 61 - 72  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Laks V. S. Lakshmanan  University of British Columbia
Raymond T. Ng  University of British Columbia
Ganesh Ramesh  University of British Columbia
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 20,   Downloads (12 Months): 93,   Citation Count: 4
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066165
What is a DOI?

ABSTRACT

Decision makers of companies often face the dilemma of whether to release data for knowledge discovery, vis a vis the risk of disclosing proprietary or sensitive information. While there are various "sanitization" methods, in this paper we focus on anonymization, given its widespread use in practice. We give due diligence to the question of "just how safe the anonymized data is", in terms of protecting the true identities of the data objects. We consider both the scenarios when the hacker has no information, and more realistically, when the hacker may have partial information about items in the domain. We conduct our analyses in the context of frequent set mining. We propose to capture the prior knowledge of the hacker by means of a belief function, where an educated guess of the frequency of each item is assumed. For various classes of belief functions, which correspond to different degrees of prior knowledge, we derive formulas for computing the expected number of "cracks". While obtaining the exact values for the more general situations is computationally hard, we propose a heuristic called the O-estimate. It is easy to compute, and is shown to be accurate empirically with real benchmark datasets. Finally, based on the O-estimates, we propose a recipe for the decision makers to resolve their dilemma.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
Aggarwal C. C and Yu P. S. A Condensation Approach to Privacy Preserving Data Mining. EDBT, 2004.
 
3
Aggarwal. G et. al. Anonymizing Tables. ICDT Conference, 2005.
4
5
6
 
7
8
 
9
 
10
 
11
Fienberg S. E. et. al. Disclosure Limitation using perturbation and related methods for Categorical Data. Journal of Office Statistics, 14, 1998.
12
13
 
14
M. Jerrum and U. Vazirani. A mildly exponential approximation algorithm for the permanent. Algorithmica 16, 1996.
 
15
 
16
 
17
Moore Jr. R. A. Controlled Data-Swapping Techniques for Masking Public Use Microdata Sets. Statistical Research Division Report Series, RR 96-04, US Bureau of Census, Washington D. C., 1996.
18
19
20
 
21
Rasmussen L. E. Approximating the permanent: A simple approach. Random Structures and Algorithms, 5, 1994.
 
22
Samarati P. and Sweeney L. Protecting Privacy when Disclosing Information: k-anonymity and its Enforcement through Generalization and Suppression. IEEE Symposium on Research in Security and Privacy, 1998.
 
23
24
 
25
Valiant L. G. The Complexity of Computing the Permanent. Theoretical Computer Science, 8, 1979.
 
26
 
27
Yang X. and Li C. Secure XML Publishing without Information Leakage in the Presence of Data Inference. VLDB Conference, 2004.

Collaborative Colleagues:
Laks V. S. Lakshmanan: colleagues
Raymond T. Ng: colleagues
Ganesh Ramesh: colleagues