Risk prediction and risk factors identification from imbalanced data with RPMBGA+
Source
Genetic And Evolutionary Computation Conference
Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation
Atlanta, GA, USA
SESSION: Late-breaking papers
Pages 2193-2198  
Year of Publication: 2008
ISBN:978-1-60558-131-6
Authors
Topon K. Paul  Toshiba Corporation, Kanagawa, Japan
Ken Ueno  Toshiba Corporation, Kanagawa, Japan
Koichiro Iwata  Toshiba Corporation, Tokyo, Japan
Toshio Hayashi  Toshiba Corporation, Tokyo, Japan
Nobuyoshi Honda  Toshiba Corporation, Tokyo, Japan
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1388969.1389046

ABSTRACT

In this paper, we propose a new method for accurately predicting the risk of an event from imbalanced data, in which the majority class has far more instances than the minority class, and for identifying the features relevant to the target risk factor. To manage the trade-off between the prediction rates of the majority and minority classes, the method takes three input parameters, which specify the costs of misclassifying an instance of the majority class and of the minority class, or the sensitivity threshold of the minority class. To select relevant features and to exploit prior information about the relationship between a feature and the target risk factor, we employ a probabilistic model building genetic algorithm called RPMBGA+. Applying the proposed technique to the health checkup and lifestyle data of Toshiba Corporation, we found that it improves the sensitivity of the minority class and selects a very small number of informative features.
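
The abstract does not say how the two misclassification costs and the sensitivity threshold enter the classifier, so the following is only a generic sketch of how such parameters are commonly used, not the authors' method. It assumes a classifier that outputs a score (estimated minority-class probability) per instance; the function names, the label encoding (1 = minority, 0 = majority), and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def cost_threshold(cost_majority: float, cost_minority: float) -> float:
    """Probability cutoff that minimizes expected misclassification cost.

    Predicting "minority" is cheaper in expectation when
    p * cost_minority >= (1 - p) * cost_majority, which rearranges to
    p >= cost_majority / (cost_majority + cost_minority).
    """
    return cost_majority / (cost_majority + cost_minority)


def sensitivity_threshold(scores: Sequence[float], labels: Sequence[int],
                          target: float) -> float:
    """Largest cutoff whose minority-class sensitivity is at least `target`."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        raise ValueError("no minority-class instances")
    # Number of minority instances that must score at or above the cutoff.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Label an instance as minority (1) when its score reaches the cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical toy data: four minority and four majority instances.
scores = [0.9, 0.4, 0.35, 0.2, 0.3, 0.1, 0.05, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

by_cost = predict(scores, cost_threshold(cost_majority=1.0, cost_minority=9.0))
by_sensitivity = predict(scores, sensitivity_threshold(scores, labels, 0.75))
```

Raising the minority-class cost lowers the cutoff and so trades majority-class accuracy for minority-class sensitivity; fixing a sensitivity target instead chooses the cutoff directly from the scores of the minority instances.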


