Risk prediction and risk factors identification from imbalanced data with RPMBGA+
Source
Genetic And Evolutionary Computation Conference
Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation
Atlanta, GA, USA
SESSION: Late-breaking papers
Pages 2193-2198
Year of Publication: 2008
ISBN: 978-1-60558-131-6
Authors
Topon K. Paul, Toshiba Corporation, Kanagawa, Japan
Ken Ueno, Toshiba Corporation, Kanagawa, Japan
Koichiro Iwata, Toshiba Corporation, Tokyo, Japan
Toshio Hayashi, Toshiba Corporation, Tokyo, Japan
Nobuyoshi Honda, Toshiba Corporation, Tokyo, Japan
ABSTRACT
In this paper, we propose a new method to accurately predict the risk of an event from imbalanced data, in which the number of instances of the majority class is much larger than that of the minority class, and to identify the features that are relevant to the target risk factor. To handle the trade-off between the prediction rates of the majority and the minority classes, three input parameters are used; these supply the costs of misclassifying an instance from the majority and the minority classes, or the sensitivity threshold of the minority class. To obtain relevant features and to exploit prior information about the relationship between a feature and the target risk factor, a probabilistic model building genetic algorithm called RPMBGA+ is employed. Applying the proposed technique to the health checkup and lifestyle data of Toshiba Corporation, we have found that it improves the sensitivity of the minority class and selects a very small number of informative features.
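
The trade-off described above can be illustrated with a minimal sketch. It is not the RPMBGA+ method itself, whose details are not given in the abstract; it only shows how per-class misclassification costs and a minority-class sensitivity threshold might be used to score a classifier on imbalanced data. The function names, the label convention (1 for the minority class) and the cost values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     minority: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), treating `minority` as the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == minority:
            if p == minority:
                tp += 1
            else:
                fn += 1
        else:
            if p == minority:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int],
                minority: int = 1) -> float:
    """Fraction of minority-class instances that are predicted correctly."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, minority)
    return tp / (tp + fn) if (tp + fn) else 0.0


def misclassification_cost(y_true: Sequence[int], y_pred: Sequence[int],
                           cost_majority: float = 1.0,
                           cost_minority: float = 10.0,
                           minority: int = 1) -> float:
    """Total cost: each missed minority instance costs `cost_minority`,
    each misclassified majority instance costs `cost_majority`.
    The default cost values are hypothetical."""
    _, fn, _, fp = confusion_counts(y_true, y_pred, minority)
    return cost_majority * fp + cost_minority * fn


# Toy data: 8 majority (0) and 2 minority (1) instances.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_majority = [0] * 10                       # 80% accurate, misses every risk case
risk_aware = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]      # two false alarms, no misses

cost_a = misclassification_cost(y_true, always_majority)
cost_b = misclassification_cost(y_true, risk_aware)
sens_a = sensitivity(y_true, always_majority)
sens_b = sensitivity(y_true, risk_aware)
```

In this toy case both classifiers are 80% accurate, but the cost-weighted score prefers the one that catches the minority instances; a sensitivity threshold (for example, requiring `sens_b >= 0.8`) could be applied as an alternative acceptance criterion.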