Title OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets
Authors Garcia-Pedrajas, Nicolas; Pérez Rodríguez, Javier; de Haro-Garcia, Aida
External publication Yes
Journal IEEE Transactions on Cybernetics (IEEE T. Cybern.)
Scope Article
Nature Scientific
JCR quartile 4
Publication date 01/02/2013 (1 February 2013)
ISI 000317643500027
DOI 10.1109/TSMCB.2012.2206381
Abstract In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
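The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea (divide and conquer plus selection on balanced subsets), written under our own assumptions. It is not the authors' OligoIS algorithm. Assumptions: the majority class is split into disjoint chunks of `subset_size` instances, and each chunk is paired with the whole minority class to form a roughly balanced subset. Edited nearest neighbour (ENN) stands in for the base instance selector. Per-subset decisions are combined by a discard-vote threshold over several random partitions. All names (`balanced_subset_selection`, `enn_select`, `subset_size`, `rounds`, `threshold`) are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Point = Sequence[float]


def enn_select(points: List[Point], labels: List[int]) -> List[int]:
    """Stand-in base selector (edited nearest neighbour, assumed here):
    keep instance i only if its nearest other instance in the subset
    has the same label. Returns positions of kept instances."""
    kept = []
    for i, p in enumerate(points):
        best_j, best_d = -1, float("inf")
        for j, q in enumerate(points):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(p, q))  # squared Euclidean
            if d < best_d:
                best_j, best_d = j, d
        if best_j == -1 or labels[best_j] == labels[i]:
            kept.append(i)
    return kept


def balanced_subset_selection(
    X: List[Point],
    y: List[int],
    minority: int,
    subset_size: int,
    rounds: int = 3,
    threshold: float = 0.5,
    seed: int = 0,
    select: Callable[[List[Point], List[int]], List[int]] = enn_select,
) -> List[int]:
    """Hypothetical sketch: return indices kept after voting over balanced subsets."""
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]
    discarded = Counter()  # how often each instance was removed
    evaluated = Counter()  # how often each instance was evaluated
    for _ in range(rounds):
        rng.shuffle(majority_idx)  # new random partition every round
        for start in range(0, len(majority_idx), subset_size):
            # Pair one disjoint majority chunk with the minority class.
            # Subsets are independent, so they could run in parallel.
            subset = majority_idx[start:start + subset_size] + minority_idx
            kept = set(select([X[i] for i in subset], [y[i] for i in subset]))
            for pos, i in enumerate(subset):
                evaluated[i] += 1
                if pos not in kept:
                    discarded[i] += 1
    # Keep instances removed in fewer than `threshold` of their evaluations.
    return [i for i in range(len(y))
            if evaluated[i] == 0 or discarded[i] / evaluated[i] < threshold]


if __name__ == "__main__":
    # Majority cluster near the origin, one noisy majority point (index 8)
    # inside the minority cluster, and three minority points.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (0, 2), (2, 2), (1, 2),
         (10, 10.2), (10, 10), (10, 11), (11, 10)]
    y = [0] * 9 + [1] * 3
    print(balanced_subset_selection(X, y, minority=1, subset_size=3))
    # -> [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11]  (noisy point 8 removed)
```

In this sketch, if `subset_size` and the minority class size are held fixed, each subset costs a bounded amount of work. The number of subsets grows linearly with the number of majority instances, so total work is linear in the data set size. Only one subset needs to be in memory at a time, which matches the scalability properties the abstract claims for the real method.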
Keywords Class-imbalance problem; instance selection; instance-based learning; very large problems
Members of Universidad Loyola
