Abstract
This paper takes a new view of motif discovery, addressing a common problem in existing motif finders. A motif is treated as a feature of the input promoter regions that leads to a good classifier between these promoters and a set of background promoters. This perspective allows us to adapt existing methods of feature selection, a well-studied topic in machine learning, to motif discovery. We develop a general algorithmic framework that can be specialized to work with a wide variety of motif models, including consensus models with degenerate symbols or mismatches, and composite motifs. A key feature of our algorithm is that it measures over-representation while maintaining information about the distribution of motif instances in individual promoters. The assessment of a motif's discriminative power is normalized against chance behaviour by a probabilistic analysis. We apply our framework to two popular motif models and detect several known binding sites in sets of co-regulated genes in yeast.
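To make the "motif as a discriminative feature" idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a consensus motif written with IUPAC degenerate symbols, treats "the promoter contains at least one instance" as the per-promoter feature, and uses a hypergeometric tail probability as a stand-in for the paper's probabilistic normalization, which the abstract does not specify. The function names (`promoters_with_motif`, `hypergeom_tail`, `score_motif`) are hypothetical.

```python
import re
from math import comb

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(consensus):
    """Compile a degenerate consensus string into a regular expression."""
    return re.compile("".join(IUPAC[s] for s in consensus.upper()))


def promoters_with_motif(consensus, promoters):
    """Count promoters with at least one instance (forward strand only).

    Counting per promoter, rather than total occurrences, keeps one
    promoter with many instances from dominating the score.
    """
    pattern = motif_regex(consensus)
    return sum(1 for seq in promoters if pattern.search(seq.upper()))


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, K of which hold the motif."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def score_motif(consensus, positives, background):
    """Return (positive hits, background hits, chance probability).

    Assumption: the hypergeometric tail is an illustrative choice of
    chance model, not the analysis used in the paper.
    """
    k = promoters_with_motif(consensus, positives)
    b = promoters_with_motif(consensus, background)
    n = len(positives)
    N = n + len(background)
    return k, b, hypergeom_tail(k, n, k + b, N)


# Toy sequences invented for illustration; not data from the paper.
positives = ["TTCACGTGAA", "GGCACGTTCC", "ACACGTGTTT"]
background = ["AAAAAAAAAA", "CCCCTTTTGG", "GGGGAAAACC", "TTCACGTGCC"]
k, b, p = score_motif("CACGTK", positives, background)
print(k, b, p)
```

A small chance probability suggests that the motif separates the input promoters from the background better than expected at random. Ranking candidate motifs by such a score is one simple form of feature selection.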
Original language | English (US) |
---|---|
Pages | 291-298 |
Number of pages | 8 |
State | Published - 2002 |
Externally published | Yes |
Event | RECOMB 2002: Proceedings of the Sixth Annual International Conference on Computational Biology, Washington, DC, United States. Duration: Apr 18 2002 → Apr 21 2002 |
ASJC Scopus subject areas
- Computer Science (all)
- Biochemistry, Genetics and Molecular Biology (all)