Title Floating search methodology for combining classification models for site recognition in DNA sequences
Authors Pérez Rodríguez, Javier; de Haro García, Aida; García Pedrajas, Nicolás
External publication No
Journal IEEE/ACM Trans. Comput. Biol. Bioinf.
Scope Article
Nature Scientific
JCR Quartile 1
SJR Quartile 2
JCR Impact 3.71000
SJR Impact 0.74500
Area International
Web https://www.scopus.com/inward/record.uri?eid=2-s2.0-85121687757&doi=10.1109%2fTCBB.2020.2974221&partnerID=40&md5=a4ce9cdfe9c3859099a992fc52e6696b
Publication date 17/02/2020
ISI 000728193500040
Scopus Id 2-s2.0-85121687757
DOI 10.1109/TCBB.2020.2974221
Abstract Recognition of the functional sites of genes, such as translation initiation sites, donor and acceptor splice sites and stop codons, is a relevant part of many current problems in bioinformatics. The best approaches use sophisticated classifiers, such as support vector machines. However, with the rapid accumulation of sequence data, methods for combining many sources of evidence are necessary, as it is unlikely that a single classifier can solve this problem with the best possible performance. A major issue is that the number of possible models to combine is large and the use of all of these models is impractical. In this paper we present a methodology for combining many sources of information to recognize any functional site using "floating search", a powerful heuristic applicable when the cost of evaluating each solution is high. We present experiments on four functional sites in the human genome, which is used as the target genome, and use another 20 species as sources of evidence. The proposed methodology shows significant improvement over state-of-the-art methods. The results also challenge the standard assumption that only genomes neither very close to nor very far from the human genome should be used to improve the recognition of functional sites.
Keywords Biological cells; Bioinformatics; Genomics; Computational modeling; Support vector machines; Biological system modeling; Search problems; Site recognition; gene prediction; models combination