Privacy-preserving Cox regression for survival analysis
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Industrial papers
Pages 1034-1042
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Shipeng Yu  Siemens Medical Solutions USA, Inc., Malvern, PA, USA
Glenn Fung  Siemens Medical Solutions USA, Inc., Malvern, PA, USA
Romer Rosales  Siemens Medical Solutions USA, Inc., Malvern, PA, USA
Sriram Krishnan  Siemens Medical Solutions USA, Inc., Malvern, PA, USA
R. Bharat Rao  Siemens Medical Solutions USA, Inc., Malvern, PA, USA
Cary Dehing-Oberije  University Hospital Maastricht, Maastricht, Netherlands
Philippe Lambin  University Hospital Maastricht, Maastricht, Netherlands
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 195,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1401890.1402013

ABSTRACT

Privacy-preserving data mining (PPDM) is an emergent research area that addresses the incorporation of privacy-preserving concerns into data mining techniques. In this paper we propose a privacy-preserving (PP) Cox model for survival analysis, and consider a real clinical setting where the data are horizontally distributed among different institutions. The proposed model is based on linearly projecting the data to a lower-dimensional space through an optimal mapping obtained by solving a linear programming problem. Our approach differs from the commonly used random projection approach in that it finds a projection that is optimal at preserving the properties of the data that are important for the specific problem at hand. Since our proposed approach produces a sparse mapping, the resulting PP mapping not only projects the data to a lower-dimensional space but also depends on a smaller subset of the original features (that is, it provides explicit feature selection). Real data from several European healthcare institutions are used to test our model for survival prediction of non-small-cell lung cancer patients. These results are also confirmed using publicly available benchmark datasets. Our experimental results show that we are able to achieve near-optimal performance without directly sharing the data across different data sources. This model makes it possible to conduct large-scale multi-centric survival analysis without violating privacy-preserving requirements.
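
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the linear program, so a Gaussian random projection with some rows zeroed out stands in for the learned sparse mapping. The names `project`, `neg_log_partial_likelihood`, `make_site`, and `B`, the synthetic data, and the choice to pool survival times and event indicators alongside the projected features are all assumptions made for illustration. The sketch shows each institution projecting its own rows locally, and the Cox negative log partial likelihood being evaluated on the pooled low-dimensional data only.

```python
import math
import random


def project(X, B):
    """Map each row x (length d) to x @ B (length k); B is a d x k matrix."""
    d, k = len(B), len(B[0])
    return [[sum(x[j] * B[j][c] for j in range(d)) for c in range(k)] for x in X]


def neg_log_partial_likelihood(Z, times, events, w):
    """Cox negative log partial likelihood with Breslow-style risk sets."""
    scores = [sum(wi * zi for wi, zi in zip(w, z)) for z in Z]
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:  # censored subjects contribute only through risk sets
            continue
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total -= scores[i] - math.log(risk)
    return total


rng = random.Random(0)
d, k = 6, 2  # original and projected dimensions (illustrative values)

# Stand-in mapping: random Gaussian entries, NOT the LP-optimal mapping.
B = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
# Zero rows make the mapping sparse: features 3..5 never influence the output,
# which is the sense in which a sparse mapping performs feature selection.
for j in range(3, d):
    B[j] = [0.0] * k


def make_site(n):
    """Synthetic stand-in for one institution's horizontally partitioned rows."""
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    times = [rng.uniform(1, 100) for _ in range(n)]
    events = [rng.random() < 0.7 for _ in range(n)]
    return X, times, events


sites = [make_site(20), make_site(30)]

# Each site projects locally; only the k-dimensional rows are pooled.
pooled_Z, pooled_t, pooled_e = [], [], []
for X, t, e in sites:
    pooled_Z += project(X, B)
    pooled_t += t
    pooled_e += e

loss_at_zero = neg_log_partial_likelihood(pooled_Z, pooled_t, pooled_e, [0.0] * k)
print(len(pooled_Z), len(pooled_Z[0]), loss_at_zero)
```

A Cox model would then be fit by minimizing this loss over the k-dimensional weight vector `w`. Whether a given mapping actually protects privacy depends on how it is constructed and shared, which this sketch does not address.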

