|
ABSTRACT
In this paper we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the k-anonymity guarantee, the km-anonymity, to limit the effects of the data dimensionality and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm which finds the optimal solution, however, at a high cost which makes it inapplicable for large, realistic problems. Then, we propose two greedy heuristics, which scale much better and in most of the cases find a solution close to the optimal. The proposed algorithms are experimentally evaluated using real datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Gagan Aggarwal , Tomás Feder , Krishnaram Kenthapadi , Samir Khuller , Rina Panigrahy , Dilys Thomas , An Zhu, Achieving anonymity via clustering, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
[doi> 10.1145/1142351.1142374]
|
| |
2
|
G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation Algorithms for k-Anonymity. Journal of Privacy Technology, (Paper number: 20051120001), 2005.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
G. Ghinita, Y. Tao, and P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. In Proc. of ICDE, 2008.
|
 |
7
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
N. Li, T. Li, and S. Venkatasubramanian. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In Proc. of ICDE, pages 106--115, 2007.
|
| |
12
|
|
 |
13
|
|
| |
14
|
M. Nergiz, C. Clifton, and A. Nergiz. Multirelational k-anonymity. Technical Report CSD TR 08-002.
|
| |
15
|
M. Nergiz, C. Clifton, and A. Nergiz. Multirelational k-anonymity. In Proc. of ICDE, pages 1417--1421, 2007.
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
 |
22
|
Jian Xu , Wei Wang , Jian Pei , Xiaoyuan Wang , Baile Shi , Ada Wai-Chee Fu, Utility-based anonymization using local recoding, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150504]
|
| |
23
|
Q. Zhang, N. Koudas, D. Srivastava, and T. Yu. Aggregate Query Answering on Anonymized Tables. In Proc. of ICDE, pages 116--125, 2007.
|
 |
24
|
|
CITED BY 3
|
|
|
|
|
|
|
|
Graham Cormode , Divesh Srivastava, Anonymized data: generation, models, usage, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|