Experimental perspectives on learning from imbalanced data
Source: ICML; Vol. 227
Proceedings of the 24th International Conference on Machine Learning
Corvallis, Oregon
Pages: 935 - 942  
Year of Publication: 2007
ISBN:978-1-59593-793-3
Authors
Jason Van Hulse  Florida Atlantic University, Boca Raton, FL
Taghi M. Khoshgoftaar  Florida Atlantic University, Boca Raton, FL
Amri Napolitano  Florida Atlantic University, Boca Raton, FL
Sponsor
Machine Learning Journal
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 98,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1273496.1273614

ABSTRACT

We present a comprehensive suite of experiments on learning from imbalanced data. When classes are imbalanced, many learning algorithms suffer reduced performance. Can data sampling be used to improve the performance of learners built from imbalanced data? Is the effectiveness of sampling related to the type of learner? Do the results change if the objective is to optimize different performance metrics? We address these and other issues in this work, showing that sampling will improve classifier performance in many cases.
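
To make "data sampling" concrete, the sketch below shows two generic resampling schemes, random undersampling and random oversampling, applied to a toy imbalanced data set. It is an illustration only: the abstract does not say which sampling techniques the authors evaluated, and the function names, the toy data, and the choice to fully balance the classes are all assumptions made for this example.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label (hypothetical helper)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly discard rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):  # sample without replacement
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows from smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the existing ones.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy data (assumed): 10 minority rows (label 1) and 90 majority rows (label 0).
X = [[i] for i in range(100)]
y = [1] * 10 + [0] * 90

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print("original:    ", Counter(y))
print("undersampled:", Counter(y_under))
print("oversampled: ", Counter(y_over))
```

Resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class distribution. Whether it helps depends on the learner and on the metric being optimized, which is the question the abstract raises.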

