ABSTRACT
Limiting disclosure in data publishing requires a careful balance between privacy and utility. Information about individuals must not be revealed, but a dataset should still be useful for studying the characteristics of a population. Privacy requirements such as k-anonymity and l-diversity are designed to thwart attacks that attempt to identify individuals in the data and to discover their sensitive information. The utility of such data, on the other hand, has not been well studied. In this paper we discuss the shortcomings of current heuristic approaches to measuring utility and introduce a formal approach to measuring it. Armed with this utility metric, we show how to inject additional information into k-anonymous and l-diverse tables. This information has an intuitive semantic meaning, increases utility beyond what is possible in the original k-anonymity and l-diversity frameworks, and maintains the privacy guarantees of k-anonymity and l-diversity.