Biomedical text mining for research rigor and integrity: Tasks, challenges, directions

Research output: Contribution to journal › Article › peer-review


An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted because of problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the manifestation of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote responsible research practices, providing significant benefits for the biomedical research enterprise.

Original language: English (US)
Pages (from-to): 1400-1414
Number of pages: 15
Journal: Briefings in Bioinformatics
Issue number: 6
State: Published - May 30, 2017
Externally published: Yes


Keywords

  • Biomedical research waste
  • Biomedical text mining
  • Natural language processing
  • Reproducibility
  • Research integrity
  • Research rigor

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

