TY - JOUR
T1 - A synergistic DNA logic predicts genome-wide chromatin accessibility
AU - Hashimoto, Tatsunori
AU - Sherwood, Richard I.
AU - Kang, Daniel D.
AU - Rajagopal, Nisha
AU - Barkal, Amira A.
AU - Zeng, Haoyang
AU - Emons, Bart J.M.
AU - Srinivasan, Sharanya
AU - Jaakkola, Tommi
AU - Gifford, David K.
N1 - Publisher Copyright:
© 2016 Hashimoto et al.
PY - 2016/10
Y1 - 2016/10
N2 - Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that a SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
AB - Enhancers and promoters commonly occur in accessible chromatin characterized by depleted nucleosome contact; however, it is unclear how chromatin accessibility is governed. We show that log-additive cis-acting DNA sequence features can predict chromatin accessibility at high spatial resolution. We develop a new type of high-dimensional machine learning model, the Synergistic Chromatin Model (SCM), which when trained with DNase-seq data for a cell type is capable of predicting expected read counts of genome-wide chromatin accessibility at every base from DNA sequence alone, with the highest accuracy at hypersensitive sites shared across cell types. We confirm that a SCM accurately predicts chromatin accessibility for thousands of synthetic DNA sequences using a novel CRISPR-based method of highly efficient site-specific DNA library integration. SCMs are directly interpretable and reveal that a logic based on local, nonspecific synergistic effects, largely among pioneer TFs, is sufficient to predict a large fraction of cellular chromatin accessibility in a wide variety of cell types.
UR - http://www.scopus.com/inward/record.url?scp=84989849276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84989849276&partnerID=8YFLogxK
U2 - 10.1101/gr.199778.115
DO - 10.1101/gr.199778.115
M3 - Article
C2 - 27456004
AN - SCOPUS:84989849276
SN - 1088-9051
VL - 26
SP - 1430
EP - 1440
JO - Genome Research
JF - Genome Research
IS - 10
ER -