Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles

Research output: Contribution to journal › Article › peer-review


Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, thus revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims, and we measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.

Original language: English (US)
Pages (from-to): 173-189
Number of pages: 17
Journal: Journal of Biomedical Informatics
Issue number: 2
State: Published - Apr 2010

Keywords


  • Biomedical informatics
  • Corpus annotation
  • Information extraction
  • Natural language processing
  • Relationship extraction
  • Scientific discovery
  • Text mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

