TY - JOUR
T1 - An evaluation of information content as a metric for the inference of putative conserved noncoding regions in DNA sequences using a genetic algorithms approach
AU - Congdon, Clare Bates
AU - Aman, Joseph C.
AU - Nava, Gerardo M.
AU - Gaskins, H. Rex
AU - Mattingly, Carolyn J.
N1 - Funding Information: The authors would like to thank Charles Fizer and Noah Smith for their early contributions to this project, Joel Graber for sharing insights and references about the problem and related work, Eric Green for sharing data, and D. Lunn A. Sawyer, John Kuehne, and Steve Bryant for technical support. This project was supported by P20 RR-016463 from the National Center for Research Resources and ES03828-19 from the National Institute of Environmental Health Sciences, components of the National Institutes of Health, as well as the Salisbury Cove Research Fund from the Mount Desert Island Biological Laboratory.
PY - 2008/1
Y1 - 2008/1
AB - In previous work, we presented GAMI [1], an approach to motif inference that uses a genetic algorithms search. GAMI is designed specifically to find putative conserved regulatory motifs in noncoding regions of divergent species and is designed to allow for analysis of long nucleotide sequences. In this work, we compare GAMI's performance when run with its original fitness function (a simple count of the number of matches) and when run with information content (IC), as well as several variations on these metrics. Results indicate that IC does not identify highly conserved regions and, thus, is not the appropriate metric for this task, whereas variations on IC, as well as the original metric, succeed in identifying putative conserved regions.
KW - Biology
KW - Evolutionary computing
KW - Genetic algorithms
KW - Genetics
UR - http://www.scopus.com/inward/record.url?scp=38949206269&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38949206269&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2007.1059
DO - 10.1109/TCBB.2007.1059
M3 - Article
C2 - 18245871
AN - SCOPUS:38949206269
SN - 1545-5963
VL - 5
SP - 1
EP - 14
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 1
ER -