TY - JOUR
T1 - Transforming the study of organisms: Phenomic data models and knowledge bases
AU - Thessen, Anne E.
AU - Walls, Ramona L.
AU - Vogt, Lars
AU - Singer, Jessica
AU - Warren, Robert
AU - Buttigieg, Pier Luigi
AU - Balhoff, James P.
AU - Mungall, Christopher J.
AU - McGuinness, Deborah L.
AU - Stucky, Brian J.
AU - Yoder, Matthew J.
AU - Haendel, Melissa A.
N1 - Funding Information:
Lars Vogt has been funded by Leibniz Competition #SAW-2016-SGN-2. Chris Mungall, Melissa Haendel, and Anne Thessen have been funded by NIH #5R24OD011883 (https://www.nih.gov/). Ramona Walls was funded by NSF ABI 1759808 (https://nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2020/11/24
Y1 - 2020/11/24
N2 - The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem. Author summary: Organism traits determine the role of species in economies and ecosystems, and the expression of those traits relies on interactions between an organism's genes and environment. The key to predicting trait expression is having a large pool of data to derive models, but most organism trait observations are recorded in ways that are not computational. In this paper, intended for an interdisciplinary audience, we discuss data models for representing organism traits in a computable format. Increasing acceptance of a data model for traits will greatly increase the pool of available data for studying the dynamic processes that determine trait expression. We hope that explaining these data models in a straightforward way and articulating their potential for accelerating discovery will increase adoption of this promising data standard.
KW - INHS
UR - http://www.scopus.com/inward/record.url?scp=85096818055&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096818055&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1008376
DO - 10.1371/journal.pcbi.1008376
M3 - Review article
C2 - 33232313
AN - SCOPUS:85096818055
SN - 1553-734X
VL - 16
JO - PLOS Computational Biology
JF - PLOS Computational Biology
IS - 11
M1 - e1008376
ER -