Structured learning for spatial information extraction from biomedical text: Bacteria biotopes

Parisa Kordjamshidi, Dan Roth, Marie Francine Moens

Research output: Contribution to journal, Article, peer-review


Background: We aim to automatically extract the species names of bacteria and their locations from webpages. This task is important for exploiting the vast amount of biological knowledge expressed in diverse natural language texts and for storing this knowledge in databases where biologists can access it easily. The task is challenging, and previous results are far below an acceptable level of performance, particularly for the extraction of localization relationships. We therefore design a new system for such extraction within the framework of structured machine learning.

Results: We design a new model for the joint extraction of biomedical entities and the localization relationship. Our model is based on a spatial role labeling (SpRL) model designed for spatial understanding of unrestricted text. We extend SpRL to extract discourse-level spatial relations in the biomedical domain and apply it to the BioNLP-ST 2013 Bacteria Biotopes (BB) shared task. We highlight the main differences between general spatial language understanding and spatial information extraction from scientific text, which is the focus of this work. We exploit the text's structure and discourse-level global features. Our model and the designed features substantially improve on previous systems, achieving an absolute improvement of approximately 57 percent in F1 measure over the best previous system for this task.

Conclusions: Our experimental results indicate that a joint learning model over all entities and relationships in a document outperforms a model that extracts entities and relationships independently. Our global learning model significantly improves on the state-of-the-art results for this task and has high potential for adoption in other natural language processing (NLP) tasks in the biomedical domain.

Original language: English (US)
Article number: 129
Journal: BMC Bioinformatics
Issue number: 1
State: Published - Apr 25 2015
Externally published: Yes


Keywords

  • Bacteria biotopes
  • BioNLP
  • Biomedical text mining
  • Spatial information extraction
  • Structured learning

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

