ABSTRACT
Privacy models such as k-anonymity and l-diversity typically offer an aggregate or scalar notion of the privacy property that holds collectively on the entire anonymized data set. However, they fail to give an accurate measure of privacy with respect to individual tuples. For example, two anonymizations achieving the same value of k in the k-anonymity model are considered equally good with respect to privacy protection. Yet it is quite possible that, for one of the anonymizations, a majority of the individual tuples have lower probabilities of privacy breach than their counterparts in the other anonymization. We therefore reject the notion that all anonymizations satisfying a particular privacy property, such as k-anonymity, are equally good. The scalar or aggregate value used in privacy models is often biased towards a fraction of the data set, resulting in higher privacy for some individuals and minimal privacy for others. Consequently, to better compare anonymization algorithms, there is a need to formalize and measure this bias. Towards this end, we advocate the use of vector-based methods for representing privacy and other measurable properties of an anonymization. We represent the measure of a given property for an anonymized data set using a property vector. Anonymizations are then compared using quality index functions that quantify the effectiveness of the property vectors. A formal analysis of the scope and limitations of these index functions is provided. Finally, we present preference-based techniques for cases where comparisons are to be made across multiple properties induced by anonymizations.
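To make the idea of a property vector concrete, the following is a minimal sketch and not the paper's formal definitions. It assumes the property measured for each tuple is the size of its equivalence class, and it uses a hypothetical index, `fraction_better`, that reports the share of tuples for which one anonymization gives a strictly larger class than the other. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def class_size_vector(generalized_ids):
    """Property vector (assumed form): for each tuple, the size of the
    equivalence class it falls into after anonymization."""
    counts = Counter(generalized_ids)
    return [counts[g] for g in generalized_ids]


def scalar_k(vector):
    """Scalar k of k-anonymity: the smallest equivalence class size."""
    return min(vector)


def fraction_better(v1, v2):
    """Hypothetical quality index: fraction of tuples whose class size in v1
    is strictly larger than the class size of the same tuple in v2."""
    if len(v1) != len(v2):
        raise ValueError("property vectors must cover the same tuples")
    return sum(a > b for a, b in zip(v1, v2)) / len(v1)


# Two toy anonymizations of the same six tuples. Each label stands for the
# generalized quasi-identifier value assigned to the tuple at that position.
anon_a = ["g1", "g1", "g2", "g2", "g3", "g3"]  # three classes of size 2
anon_b = ["h1", "h1", "h2", "h2", "h2", "h2"]  # classes of size 2 and 4

vec_a = class_size_vector(anon_a)
vec_b = class_size_vector(anon_b)

print(scalar_k(vec_a), scalar_k(vec_b))   # both anonymizations are 2-anonymous
print(fraction_better(vec_b, vec_a))      # share of tuples better off under anon_b
print(fraction_better(vec_a, vec_b))      # share of tuples better off under anon_a
```

Both toy anonymizations have the same scalar k, yet four of the six tuples sit in larger classes under `anon_b`, which is the kind of per-tuple difference a scalar measure cannot show.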