Title: Feature grouping and selection on high-dimensional microarray data
Authors: Garcia-Torres M., Gomez-Vela F., Becerra Alonso D., Melian-Batista B., Moreno-Vega J.M.
External publication: No
Venue: Proc. - Int. Workshop Data Min. Appl., DMIA: Part ETyC
Scope: Conference Paper
Nature: Scientific
Web: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84988008158&doi=10.1109%2fDMIA.2015.18&partnerID=40&md5=998cef5b23a01a68867f15af5d48a9f7
Publication date: 01/01/2016
Scopus ID: 2-s2.0-84988008158
DOI: 10.1109/DMIA.2015.18
Abstract: In classification tasks, as the dimensionality increases, the performance of the classifier improves until an optimal number of features is reached. Further increases in dimensionality without a corresponding increase in the number of training samples result in a degradation of classifier performance. This phenomenon, known as the curse of dimensionality, has become more relevant with the advent of larger datasets and the demands of Knowledge Discovery from Big Data. In this context, feature grouping has become an effective approach for providing additional information about the relationships between features. In this work, we propose a greedy strategy, called GreedyPGG, that groups features based on the concept of Markov blankets. To this end, we introduce the idea of a predominant group of features. We also present an adaptation of Variable Neighborhood Search (VNS) to high-dimensional feature selection that uses GreedyPGG to reduce the search space. We test the effectiveness of GreedyPGG on synthetic datasets and of VNS on microarray datasets, and we compare VNS with popular and competitive strategies. Results show that GreedyPGG groups correlated features efficiently and that VNS is a competitive strategy, capable of finding a small number of features with high predictive power. © 2015 IEEE.
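The abstract does not describe how GreedyPGG builds its groups. As a rough illustration of what "grouping features based on Markov blankets" can mean, the sketch below borrows the approximate Markov blanket test from the FCBF filter: a more relevant feature F_i subsumes F_j when SU(F_i, F_j) >= SU(F_j, C), where SU is symmetric uncertainty and C is the class. This is an assumption for illustration only, not the authors' GreedyPGG or their "predominant group" definition. The function names and the `min_relevance` threshold are hypothetical, and features are assumed to be already discretized.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0  # both variables constant: treat as uninformative
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def greedy_markov_blanket_groups(
    features: Dict[str, Sequence[Hashable]],
    target: Sequence[Hashable],
    min_relevance: float = 0.0,  # hypothetical threshold, akin to FCBF's delta
) -> List[List[str]]:
    """Hypothetical sketch: greedy grouping via FCBF-style approximate Markov blankets."""
    # Relevance of every feature to the class label.
    su_c = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    # Keep features whose relevance exceeds the threshold, most relevant first.
    # Python's sort is stable, so ties keep their original dictionary order.
    remaining = sorted(
        (f for f in features if su_c[f] > min_relevance),
        key=lambda f: su_c[f],
        reverse=True,
    )
    groups: List[List[str]] = []
    while remaining:
        lead = remaining.pop(0)  # most relevant feature not yet grouped
        group = [lead]
        for f in list(remaining):
            # su_c[lead] >= su_c[f] already holds by the ordering, so lead is an
            # approximate Markov blanket for f if it is at least as correlated
            # with f as f is with the class.
            if symmetric_uncertainty(features[lead], features[f]) >= su_c[f]:
                group.append(f)
                remaining.remove(f)
        groups.append(group)
    return groups
```

For example, with a four-class label `y = [0, 0, 1, 1, 2, 2, 3, 3]`, a feature `hi` encoding the label's high bit, its complement `hi_not`, a feature `lo` encoding the low bit, and an alternating `noise` feature, calling `greedy_markov_blanket_groups(features, y, min_relevance=0.1)` returns `[["hi", "hi_not"], ["lo"]]`: the redundant complement joins the group of `hi`, the complementary but non-redundant `lo` starts its own group, and `noise` is dropped as irrelevant.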
Keywords: Big data; Data mining; Optimization; Classifier performance; Curse of dimensionality; Effective approaches; Feature grouping; High dimensional feature; Meta heuristics; Microarray data sets; Variable
Members of Universidad Loyola
