Title OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets
Authors Garcia-Pedrajas, Nicolas; Pérez Rodríguez, Javier; de Haro-Garcia, Aida
External publication Yes
Journal IEEE T. Cybern.
Scope Article
Nature Scientific
JCR Quartile 4
Publication date 01/02/2013
ISI 000317643500027
DOI 10.1109/TSMCB.2012.2206381
Abstract In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
Keywords Class-imbalance problem; instance selection; instance-based learning; very large problems
