ABSTRACT
Traditional data mining and knowledge discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, derailing data mining projects. What is needed is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results while providing guarantees on the non-disclosure of data. Support vector machine (SVM) classification is one of the most widely used classification methodologies in data mining and machine learning. It rests on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for SVM classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed across multiple parties, without disclosing any party's data to the others. We assume that the data is horizontally partitioned: each party collects the same set of features for different data objects. We quantify the security and efficiency of the proposed method and highlight future challenges.