TY - JOUR
T1 - CRISPR-COPIES: an in silico platform for discovery of neutral integration sites for CRISPR/Cas-facilitated gene integration
AU - Boob, Aashutosh Girish
AU - Zhu, Zhixin
AU - Intasian, Pattarawan
AU - Jain, Manan
AU - Petrov, Vassily Andrew
AU - Lane, Stephan Thomas
AU - Tan, Shih-I
AU - Xun, Guanhua
AU - Zhao, Huimin
N1 - DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under Award Number DE-SC0018420). Funding for open access charge: U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research.
PY - 2024/4/12
Y1 - 2024/4/12
AB - The CRISPR/Cas system has emerged as a powerful tool for genome editing in metabolic engineering and human gene therapy. However, locating the optimal chromosomal site for integrating heterologous genes with the CRISPR/Cas system remains an open question. Selecting a suitable integration site requires weighing multiple complex criteria, including factors related to CRISPR/Cas-mediated integration, genetic stability, and gene expression. Consequently, identifying such sites at specific chromosomal locations typically requires extensive characterization. To address these challenges, we have developed CRISPR-COPIES, a COmputational Pipeline for the Identification of CRISPR/Cas-facilitated intEgration Sites. The tool uses ScaNN, a state-of-the-art method for embedding-based nearest neighbor search, to perform fast and accurate off-target searches, and it can identify genome-wide intergenic sites for most bacterial and fungal genomes within minutes. As a proof of concept, we used CRISPR-COPIES to characterize neutral integration sites in three diverse hosts: the yeast Saccharomyces cerevisiae, the bacterium Cupriavidus necator, and human HEK293T cells. In addition, we developed a user-friendly web interface for CRISPR-COPIES (https://biofoundry.web.illinois.edu/copies/). We anticipate that CRISPR-COPIES will serve as a valuable tool for targeted DNA integration, aiding the characterization of synthetic biology toolkits, enabling rapid strain construction for the production of valuable biochemicals, and supporting human gene and cell therapy applications.
UR - http://www.scopus.com/inward/record.url?scp=85190510704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190510704&partnerID=8YFLogxK
U2 - 10.1093/nar/gkae062
DO - 10.1093/nar/gkae062
M3 - Article
C2 - 38346683
SN - 0305-1048
VL - 52
JO - Nucleic acids research
JF - Nucleic acids research
IS - 6
M1 - gkae062
ER -