ABSTRACT
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
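The perturb-then-reconstruct idea can be pictured with a small sketch. The abstract does not specify the perturbation or the reconstruction procedure, so everything below is an assumption made for illustration: Gaussian additive noise, a fixed-width histogram, an iterative Bayes update, and the hypothetical names `perturb`, `reconstruct_distribution`, and `sample_original`. It is not presented as the authors' method.

```python
import math
import random


def perturb(values, sigma, rng):
    """Randomize each value by adding independent Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct_distribution(perturbed, sigma, lo, hi, bins=10, iterations=50):
    """Estimate a histogram of the ORIGINAL values from perturbed values only.

    Starts from a uniform histogram and repeatedly replaces it with the
    average posterior over bins given each perturbed value (Bayes' rule with
    the known noise density). Individual values are never recovered.
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    probs = [1.0 / bins] * bins
    # Unnormalized Gaussian noise density of (perturbed value - bin midpoint).
    likelihood = [
        [math.exp(-0.5 * ((w - m) / sigma) ** 2) for m in mids]
        for w in perturbed
    ]
    for _ in range(iterations):
        updated = [0.0] * bins
        for row in likelihood:
            weights = [l * p for l, p in zip(row, probs)]
            total = sum(weights)
            if total == 0.0:  # numerically impossible observation; skip it
                continue
            for k in range(bins):
                updated[k] += weights[k] / total
        norm = sum(updated)
        probs = [u / norm for u in updated]
    return mids, probs


def sample_original(n, rng):
    """Hypothetical original data: 70% of values in [0.1, 0.3], 30% in [0.7, 0.9]."""
    return [
        rng.uniform(0.1, 0.3) if rng.random() < 0.7 else rng.uniform(0.7, 0.9)
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    original = sample_original(2000, rng)
    noisy = perturb(original, sigma=0.25, rng=rng)
    mids, probs = reconstruct_distribution(noisy, sigma=0.25, lo=0.0, hi=1.0)
    for m, p in zip(mids, probs):
        print(f"bin midpoint {m:.2f}: estimated mass {p:.3f}")
```

Under these assumptions, a classifier would be trained on the estimated histogram rather than on any individual record, which is the trade the abstract describes: per-record values stay hidden while the aggregate distribution remains usable.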
CITED BY 233
W. K. Wong , David W. Cheung , Edward Hung , Ben Kao , Nikos Mamoulis, Security in outsourcing of association rule mining, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
James Bo Begole , John C. Tang , Rosco Hill, Rhythm modeling, visualizations and applications, Proceedings of the 16th annual ACM symposium on User interface software and technology, p.11-20, November 02-05, 2003, Vancouver, Canada
Claus Boyens , Oliver Günther , Maximilian Teltzrow, Privacy conflicts in CRM services for online shops: a case study, Proceedings of the IEEE international conference on Privacy, security and data mining, p.27-35, December 01, 2002, Maebashi City, Japan
Md. Zahidul Islam , Ljiljana Brankovic, A framework for privacy preserving classification in data mining, Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation, p.163-168, January 01, 2004, Dunedin, New Zealand
Gunther Schadow , Shaun J. Grannis , Clement J. McDonald, Discussion paper: privacy-preserving distributed queries for a clinical case research network, Proceedings of the IEEE international conference on Privacy, security and data mining, p.55-65, December 01, 2002, Maebashi City, Japan
Vladimir Estivill-Castro , Chris Clifton, Preface: proceedings of the ICDM 2002 workshop on privacy, security, and data mining, Proceedings of the IEEE international conference on Privacy, security and data mining, p.1, December 01, 2002, Maebashi City, Japan
Boštjan Brumen , Tatjana Welzer , Marjan Družovec , Izidor Golob , Hannu Jaakkola , Ivan Rozman , Jiři Kubalik, Protecting medical data for decision-making analyses, Journal of Medical Systems, v.29 n.1, p.65-80, February 2005
Alexandre Evfimievski , Ramakrishnan Srikant , Rakesh Agrawal , Johannes Gehrke, Privacy preserving mining of association rules, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
Raymond Chi-Wing Wong , Jiuyong Li , Ada Wai-Chee Fu , Ke Wang, (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
Arjun Dasgupta , Nan Zhang , Gautam Das , Surajit Chaudhuri, Privacy preservation of aggregates in hidden databases: why and how?, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
Joan Feigenbaum , Yuval Ishai , Tal Malkin , Kobbi Nissim , Martin J. Strauss , Rebecca N. Wright, Secure multiparty computation of approximations, ACM Transactions on Algorithms (TALG), v.2 n.3, p.435-472, July 2006
Shubha U. Nabar , Bhaskara Marthi , Krishnaram Kenthapadi , Nina Mishra , Rajeev Motwani, Towards robustness in query auditing, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
Dan Frankowski , Dan Cosley , Shilad Sen , Loren Terveen , John Riedl, You are what you say: privacy risks of public mentions, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
Sheng Zhang , James Ford , Fillia Makedon, A privacy-preserving collaborative filtering scheme with two-way communication, Proceedings of the 7th ACM conference on Electronic commerce, p.316-323, June 11-15, 2006, Ann Arbor, Michigan, USA
Lars Backstrom , Cynthia Dwork , Jon Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Boaz Barak , Kamalika Chaudhuri , Cynthia Dwork , Satyen Kale , Frank McSherry , Kunal Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
Marco Gruteser , Graham Schelle , Ashish Jain , Rick Han , Dirk Grunwald, Privacy-aware location sensor networks, Proceedings of the 9th conference on Hot Topics in Operating Systems, p.28-28, May 18-21, 2003, Lihue, Hawaii
Tarek Abdelzaher , Yaw Anokwa , Peter Boda , Jeff Burke , Deborah Estrin , Leonidas Guibas , Aman Kansal , Samuel Madden , Jim Reich, Mobiscopes for Human Spaces, IEEE Pervasive Computing, v.6 n.2, p.20-29, April 2007
Min Mun , Sasank Reddy , Katie Shilton , Nathan Yau , Jeff Burke , Deborah Estrin , Mark Hansen , Eric Howard , Ruth West , Péter Boda, PEIR, the personal environmental impact report, as a platform for participatory sensing systems research, Proceedings of the 7th international conference on Mobile systems, applications, and services, June 22-25, 2009, Kraków, Poland
G. Aggarwal , M. Bawa , P. Ganesan , H. Garcia-Molina , K. Kenthapadi , N. Mishra , R. Motwani , U. Srivastava , D. Thomas , J. Widom , Y. Xu, Vision paper: enabling privacy for the paranoids, Proceedings of the Thirtieth international conference on Very large data bases, p.708-719, August 31-September 03, 2004, Toronto, Canada
Yabo Xu , Ke Wang , Benyu Zhang , Zheng Chen, Privacy-enhancing personalized web search, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Feifei Li , Marios Hadjieleftheriou , George Kollios , Leonid Reyzin, Dynamic authenticated index structures for outsourced databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
Ali İnan , Selim V. Kaya , Yücel Saygın , Erkay Savaş , Ayça A. Hintoğlu , Albert Levi, Privacy preserving clustering on horizontally partitioned data, Data & Knowledge Engineering, v.63 n.3, p.646-666, December, 2007
Jon M. Kleinberg, Challenges in mining social network data: processes, privacy, and paradoxes, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, p.4-5, August 12-15, 2007, San Jose, California, USA
Zekeriya Erkin , Alessandro Piva , Stefan Katzenbeisser , R. L. Lagendijk , Jamshid Shokrollahi , Gregory Neven , Mauro Barni, Protection and retrieval of encrypted multimedia content: when cryptography meets signal processing, EURASIP Journal on Information Security, v.7 n.2, p.1-20, January 2007
Mahir Can Doganay , Thomas B. Pedersen , Yücel Saygin , Erkay Savaş , Albert Levi, Distributed privacy preserving k-means clustering with additive secret sharing, Proceedings of the 2008 international workshop on Privacy and anonymity in information society, March 29, 2008, Nantes, France
Justin Brickell , Donald E. Porter , Vitaly Shmatikov , Emmett Witchel, Privacy-preserving remote diagnostics, Proceedings of the 14th ACM conference on Computer and communications security, October 28-31, 2007, Alexandria, Virginia, USA
Baik Hoh , Marco Gruteser , Hui Xiong , Ansaf Alrabady, Preserving privacy in gps traces via uncertainty-aware path cloaking, Proceedings of the 14th ACM conference on Computer and communications security, October 28-31, 2007, Alexandria, Virginia, USA
Xiaoyun He , Basit Shafiq , Jaideep Vaidya , Nabil Adam, Privacy-preserving link discovery, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
Haixun Wang , Jian Yin , Chang-shing Perng , Philip S. Yu, Dual encryption for query integrity assurance, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Boštjan Brumen , Izidor Golob , Tatjana Welzer , Marjan Družovec , Ivan Rozman , Hannu Jaakkola, An Algorithm for Protecting Knowledge Discovery Data, Informatica, v.14 n.3, p.277-288, August 2003
P. L. Bradshaw , K. W. Brannon , T. Clark , K. Dahman , S. Doraiswamy , L. Duyanovich , B. L. Hillsberg , W. Hineman , M. Kaczmarski , B. J. Klingenberg , X. Ma , R. Rees, Archive storage system design for long-term storage of massive amounts of data, IBM Journal of Research and Development, v.52 n.4, p.379-388, July 2008
Baik Hoh , Marco Gruteser , Ryan Herring , Jeff Ban , Daniel Work , Juan-Carlos Herrera , Alexandre M. Bayen , Murali Annavaram , Quinn Jacobson, Virtual trip lines for distributed privacy-preserving traffic monitoring, Proceeding of the 6th international conference on Mobile systems, applications, and services, June 17-20, 2008, Breckenridge, CO, USA
Travis Kriplean , Evan Welbourne , Nodira Khoussainova , Vibhor Rastogi , Magdalena Balazinska , Gaetano Borriello , Tadayoshi Kohno , Dan Suciu, Physical Access Control for Captured RFID Data, IEEE Pervasive Computing, v.6 n.4, p.48-55, October 2007
Shipeng Yu , Glenn Fung , Romer Rosales , Sriram Krishnan , R. Bharat Rao , Cary Dehing-Oberije , Philippe Lambin, Privacy-preserving cox regression for survival analysis, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
Magdalena Balazinska , Amol Deshpande , Michael J. Franklin , Phillip B. Gibbons , Jim Gray , Mark Hansen , Michael Liebhold , Suman Nath , Alexander Szalay , Vincent Tao, Data Management in the Worldwide Sensor Web, IEEE Pervasive Computing, v.6 n.2, p.30-40, April 2007
Raghu K. Ganti , Nam Pham , Yu-En Tsai , Tarek F. Abdelzaher, PoolView: stream privacy for grassroots participatory sensing, Proceedings of the 6th ACM conference on Embedded network sensor systems, November 05-07, 2008, Raleigh, NC, USA
Faris Alqadah , Zhen Hu , Lawrence J. Mazlack, Vertical mining with incomplete data, Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems, p.380-385, October 26-28, 2008, Corfu, Greece
Yu Fu , A. Güneş Koru , Zhiyuan Chen , Khaled El Emam, A tree-based approach to preserve the privacy of software engineering data and predictive models, Proceedings of the 5th International Conference on Predictor Models in Software Engineering, May 18-19, 2009, Vancouver, British Columbia, Canada
Ke Wang , Yabo Xu , Rong She , Philip S. Yu, Classification spanning private databases, Proceedings of the 21st national conference on Artificial intelligence, p.293-298, July 16-20, 2006, Boston, Massachusetts
Azami Zaharim , Shahrum Abdullah , Mohammad Darahim Ibrahim , Zulkifli Mohd Nopiah, Genetic algorithm in time series fatigue analysis, Proceedings of the 13th WSEAS international conference on Applied mathematics, p.241-245, December 15-17, 2008, Puerto De La Cruz, Spain
Wai Kit Wong , David Wai-lok Cheung , Ben Kao , Nikos Mamoulis, Secure kNN computation on encrypted databases, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA