ABSTRACT
This paper considers the problem of providing security to statistical databases against disclosure of confidential information. Security-control methods suggested in the literature are classified into four general approaches: conceptual, query restriction, data perturbation, and output perturbation.
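As a rough illustration of the query-restriction approach, the following Python sketch implements query-set-size control. This is one well-known restriction rule: a query is refused when its query set has fewer than k or more than N - k records. The record layout, the threshold k, and all class and method names are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout: two non-confidential category attributes
    # and one confidential numeric attribute.
    sex: str
    dept: str
    salary: float


Predicate = Callable[[Record], bool]


class QuerySetSizeControl:
    """Refuse any query whose query set C has |C| < k or |C| > N - k."""

    def __init__(self, records: Sequence[Record], k: int) -> None:
        if k < 1 or 2 * k > len(records):
            raise ValueError("k must satisfy 1 <= k <= N/2")
        self.records = list(records)
        self.k = k

    def _query_set(self, pred: Predicate) -> List[Record]:
        return [r for r in self.records if pred(r)]

    def _allowed(self, size: int) -> bool:
        # The upper bound matters: a query over almost all records would
        # reveal its small complement by subtraction from the full total.
        return self.k <= size <= len(self.records) - self.k

    def count(self, pred: Predicate) -> Optional[int]:
        qs = self._query_set(pred)
        return len(qs) if self._allowed(len(qs)) else None

    def sum_salary(self, pred: Predicate) -> Optional[float]:
        qs = self._query_set(pred)
        if not self._allowed(len(qs)):
            return None  # None signals a refused query
        return sum(r.salary for r in qs)


if __name__ == "__main__":
    rows = [
        Record("F", "R&D", 100.0), Record("F", "Sales", 200.0),
        Record("F", "Sales", 300.0), Record("M", "Sales", 400.0),
        Record("M", "Ops", 500.0), Record("M", "Ops", 600.0),
    ]
    db = QuerySetSizeControl(rows, k=2)
    print(db.sum_salary(lambda r: r.dept == "R&D"))  # None: |C| = 1 < k
    print(db.sum_salary(lambda r: r.sex == "F"))     # 600.0: 2 <= 3 <= 4
```

Query-set-size control alone can be circumvented by "tracker" queries, which combine several permitted queries to isolate an individual. It is therefore best read as a building block rather than a complete defense.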
Criteria for evaluating the performance of the various security-control methods are identified. Security-control methods that are based on each of the four approaches are discussed, together with their performance with respect to the identified evaluation criteria. A detailed comparative analysis of the most promising methods for protecting dynamic-online statistical databases is also presented.
To date, no single security-control method prevents both exact and partial disclosure. There are, however, a few perturbation-based methods that prevent exact disclosure and enable the database administrator to exercise "statistical disclosure control." Some of these methods, though, introduce bias into query responses or suffer from the 0/1 query-set-size problem (i.e., partial disclosure is possible in the case of a null query set or a query set of size 1).
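The 0/1 query-set-size problem can be seen in a toy data-perturbation database. The Python sketch below assumes simple additive Gaussian noise, drawn once per record and stored. This is a generic illustration, not one of the specific methods the survey compares, and the class, method, and record names are hypothetical. In the sketch, a SUM query never returns a true value exactly, yet a query set of size 1 returns one individual's value give or take the noise. A null query set returns exactly 0, which reveals that no record matches.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical record: (non-confidential category, confidential value).
Record = Tuple[str, float]


class PerturbedSDB:
    """Toy data-perturbation SDB: each confidential value x_i is replaced
    once by x_i + e_i with e_i ~ N(0, sigma^2), and all queries are
    answered from the perturbed copy."""

    def __init__(self, records: Sequence[Record], sigma: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Noise is drawn once and stored, so repeating a query returns the
        # same answer and cannot be used to average the noise away.
        self._rows: List[Record] = [
            (cat, x + rng.gauss(0.0, sigma)) for cat, x in records
        ]

    def _query_set(self, pred: Callable[[str], bool]) -> List[float]:
        return [v for cat, v in self._rows if pred(cat)]

    def count(self, pred: Callable[[str], bool]) -> int:
        # COUNT is answered exactly in this sketch; only values are perturbed.
        return len(self._query_set(pred))

    def sum_values(self, pred: Callable[[str], bool]) -> float:
        return float(sum(self._query_set(pred)))


if __name__ == "__main__":
    db = PerturbedSDB([("A", 50_000.0), ("B", 62_000.0), ("B", 58_000.0)], sigma=1_000.0)
    # |C| = 1: the answer is x_A + e_A, the target's value give or take sigma.
    print(db.sum_values(lambda c: c == "A"))
    # |C| = 0: COUNT and SUM are exactly 0, so nobody has category "C".
    print(db.count(lambda c: c == "C"), db.sum_values(lambda c: c == "C"))
```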
We recommend directing future research toward developing new methods that prevent exact disclosure and provide statistical-disclosure control while avoiding both the bias problem and the 0/1 query-set-size problem. Furthermore, developing a bias-correction mechanism and solving the general small-query-set-size problem would help salvage a few of the current perturbation-based methods.
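The abstract does not specify what a bias-correction mechanism would look like. As a generic illustration only, this Python sketch shows one textbook case: bias in a statistic computed from additively perturbed data, and its correction. It assumes independent zero-mean noise whose variance sigma^2 is known to the database administrator. The function names are hypothetical. Under that assumption, the sample variance of the perturbed values overestimates the true sample variance by sigma^2 in expectation, and subtracting sigma^2 removes the bias.

```python
import random
import statistics
from typing import List, Sequence


def perturb(values: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Additive data perturbation: x_i -> x_i + e_i, e_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in values]


def naive_variance(perturbed: Sequence[float]) -> float:
    # Biased upward: its expectation is the true sample variance plus sigma^2.
    return statistics.variance(perturbed)


def corrected_variance(perturbed: Sequence[float], sigma: float) -> float:
    # Assumes independent zero-mean noise with known variance sigma^2;
    # subtracting sigma^2 removes the bias in expectation.
    return statistics.variance(perturbed) - sigma ** 2


if __name__ == "__main__":
    true_values = [float(i) for i in range(10_000)]
    noisy = perturb(true_values, sigma=2_000.0)
    print(statistics.variance(true_values))    # about 8.33e6
    print(naive_variance(noisy))               # roughly 4e6 too large
    print(corrected_variance(noisy, 2_000.0))  # close to the true value
```

One known drawback of this simple correction is that it can return a negative estimate when sigma is large relative to the spread of the data.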
REVIEW
Mary McLeish (Reviewer)
A statistical database (SDB) is any traditional database system in
which queries are restricted to statistical aggregates (such as sample
mean and count); an example is the US Census Bureau database. It is
often required that the system be sec