Discovering Content through Text Mining for a Synthetic Biology Knowledge System

Bridget T. McInnes, J. Stephen Downie, Yikai Hao, Jacob Jett, Kevin Keating, Gaurav Nakum, Sudhanshu Ranjan, Nicholas E. Rodriguez, Jiawei Tang, Du Xiang, Eric M. Young, Mai H. Nguyen

Research output: Contribution to journal › Article › peer-review

Abstract

Scientific articles contain a wealth of information about experimental methods and results describing biological designs. Because text is unstructured and subject to multiple sources of ambiguity and variability, extracting this information is a difficult task. In this paper, we describe the development of the synthetic biology knowledge system (SBKS) text processing pipeline. The pipeline uses natural language processing techniques to extract and correlate information from the literature for synthetic biology researchers. Specifically, we apply named entity recognition, relation extraction, concept grounding, and topic modeling to extract information from published literature to link articles to elements within our knowledge system. Our results show the efficacy of each of the components on synthetic biology literature and provide future directions for further advancement of the pipeline.

Original language: English (US)
Pages (from-to): 2043-2054
Number of pages: 12
Journal: ACS Synthetic Biology
Volume: 11
Issue number: 6
State: Published - Jun 17 2022

Keywords

  • concept grounding
  • named entity recognition
  • natural language processing
  • relation extraction
  • synthetic biology text processing pipeline
  • topic modeling

ASJC Scopus subject areas

  • Biomedical Engineering
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
