Abstract
Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Because text is unstructured and subject to multiple sources of ambiguity and variability, extracting this information is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to extract information from published literature and to link articles to elements within our knowledge system. Our results show the efficacy of each of the components on synthetic biology literature and provide future directions for further advancement of the pipeline.
Original language | English (US) |
---|---|
Pages (from-to) | 2043-2054 |
Number of pages | 12 |
Journal | ACS Synthetic Biology |
Volume | 11 |
Issue number | 6 |
DOIs | |
State | Published - Jun 17 2022 |
Keywords
- concept grounding
- named entity recognition
- natural language processing
- relation extraction
- synthetic biology text processing pipeline
- topic modeling
ASJC Scopus subject areas
- Biomedical Engineering
- Biochemistry, Genetics and Molecular Biology (miscellaneous)
Fingerprint
Dive into the research topics of 'Discovering Content through Text Mining for a Synthetic Biology Knowledge System'. Together they form a unique fingerprint.
Datasets
- SBKS - Celllines Raw Entity Mentions
  Jett, J. (Creator), University of Illinois Urbana-Champaign, Jul 25 2022
  DOI: 10.13012/B2IDB-8851803_V1
  Dataset
- SBKS - Chemical Raw Entity Mentions
  Jett, J. (Creator), University of Illinois Urbana-Champaign, Jul 25 2022
  DOI: 10.13012/B2IDB-4163883_V1
  Dataset
- SBKS - Genes Raw Entity Mentions
  Jett, J. (Creator), University of Illinois Urbana-Champaign, Jul 25 2022
  DOI: 10.13012/B2IDB-3887275_V1
  Dataset