Title A Scalable Memetic Algorithm for Simultaneous Instance and Feature Selection
Authors Garcia-Pedrajas, Nicolas; de Haro-Garcia, Aida; Pérez Rodríguez, Javier
External publication Yes
Journal Evol. Comput.
Scope Article
Nature Scientific
JCR Quartile 1
SJR Quartile 2
JCR Impact 2.36600
SJR Impact 1.03200
Publication date 01/01/2014
ISI 000331557100001
DOI 10.1162/EVCO_a_00102
Abstract Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
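Illustrative sketch (not taken from the paper): the abstract describes a fitness function that rewards instance reduction strongly and feature reduction only weakly. One way such a function might look is shown below, assuming a candidate solution is a pair of binary masks (one over instances, one over features) scored with a leave-one-out 1-nearest-neighbour classifier. The function names, the use of 1-NN, and the weights w_acc, w_inst and w_feat are all assumptions made for illustration; the paper's actual formula is not given in this record.

```python
from typing import Sequence


def one_nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                    inst_mask: Sequence[int], feat_mask: Sequence[int]) -> float:
    """Leave-one-out 1-NN accuracy over all rows of X, using only the
    selected instances as prototypes and only the selected features
    in the squared Euclidean distance."""
    protos = [i for i, keep in enumerate(inst_mask) if keep]
    feats = [j for j, keep in enumerate(feat_mask) if keep]
    if not protos or not feats:
        return 0.0  # nothing to classify with
    correct = 0
    for i, row in enumerate(X):
        # A point may not act as its own nearest neighbour.
        candidates = [p for p in protos if p != i]
        if not candidates:
            continue  # counted as an error
        nearest = min(
            candidates,
            key=lambda p: sum((row[j] - X[p][j]) ** 2 for j in feats),
        )
        if y[nearest] == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, inst_mask, feat_mask,
            w_acc: float = 0.75, w_inst: float = 0.20, w_feat: float = 0.05) -> float:
    """Hypothetical weighted fitness: accuracy dominates, instance
    reduction is rewarded, feature reduction gets only a small weight."""
    acc = one_nn_accuracy(X, y, inst_mask, feat_mask)
    inst_reduction = 1.0 - sum(inst_mask) / len(inst_mask)
    feat_reduction = 1.0 - sum(feat_mask) / len(feat_mask)
    return w_acc * acc + w_inst * inst_reduction + w_feat * feat_reduction


if __name__ == "__main__":
    # Toy data: two well-separated classes in two dimensions.
    X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    y = [0, 0, 1, 1]
    print(fitness(X, y, [1, 1, 1, 1], [1, 1]))  # all instances and features kept
    print(fitness(X, y, [1, 0, 1, 0], [1, 1]))  # half of the instances removed
```

In a memetic algorithm, a function of this kind would be called on each individual after the genetic operators and local search steps; keeping w_feat small relative to w_inst is one simple way to avoid the heavy pressure on feature removal that the abstract warns against.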
Keywords Memetic algorithms; instance selection; feature selection; scaling-up