Title A scalable approach to simultaneous evolutionary instance and feature selection
Authors Garcia-Pedrajas, Nicolas; de Haro-Garcia, Aida; Pérez Rodríguez, Javier
External publication Yes
Journal Inf. Sci.
Scope Article
Nature Scientific
JCR quartile 1
SJR quartile 1
JCR impact 3.89300
SJR impact 2.17100
Publication date 10/04/2013
ISI 000315245800010
DOI 10.1016/j.ins.2012.10.006
Abstract An enormous amount of information is continually being produced in current research, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection and text mining, involve large or enormous datasets. These datasets pose serious problems for many data mining algorithms. One method to address very large datasets is data reduction. Among the most useful data reduction methods is simultaneous instance and feature selection. This method achieves a considerable reduction in the training data while maintaining, or even improving, the performance of the data-mining algorithm. However, it suffers from a high degree of scalability problems, even for medium-sized datasets. In this paper, we propose a new evolutionary simultaneous instance and feature selection algorithm that is scalable to millions of instances and thousands of features. This proposal is based on the divide-and-conquer principle combined with bookkeeping. The divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the entire dataset into memory. Using 50 medium-sized datasets, we will demonstrate our method's ability to match the results of state-of-the-art instance and feature selection methods while significantly reducing the time requirements. Using 13 very large datasets, we will demonstrate the scalability of our proposal to millions of instances and thousands of features. © 2012 Elsevier Inc. All rights reserved.
Keywords Simultaneous instance and feature selection; Instance selection; Feature selection; Instance-based learning; Very large problems
Members of Universidad Loyola
