Two-Stage Graph-Augmented Summarization of Scientific Documents

Rezvaneh Rezapour, Yubin Ge, Kanyao Han, Ray Jeong, Jana Diesner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic text summarization helps readers digest the vast and ever-growing body of scientific publications. While transformer-based models such as BERT and SciBERT have advanced scientific summarization, lengthy documents remain a challenge because of these models' token limits. To address this issue, we introduce and evaluate a two-stage model built on an extract-then-compress framework. Our model combines a "graph-augmented extraction module" that selects order-based salient sentences with an "abstractive compression module" that generates concise summaries. Additionally, we introduce the BioConSumm dataset, which focuses on biodiversity conservation, to support underrepresented domains and to explore domain-specific summarization strategies. Among the tested models, ours achieves the highest ROUGE-2 and ROUGE-L scores on the newly created BioConSumm dataset and on the SUMPUBMED dataset, a benchmark in biomedicine.
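The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general extract-then-compress idea under stated assumptions, not the authors' model. It swaps in TextRank (PageRank over a sentence-similarity graph) for the graph-augmented extraction module, interprets "order-based" as restoring the source order of the selected sentences (an assumption), and replaces the abstractive compression module with a hypothetical placeholder that merely truncates the extract to a token budget. All function names (`extract`, `compress`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a simplification of real subword tokenizers."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Stand-in for graph-based extraction: PageRank over a weighted
    sentence-similarity graph (TextRank), not the paper's module."""
    n = len(sentences)
    if n == 0:
        return []
    bows = [Counter(tokenize(s)) for s in sentences]
    # Edge weights: pairwise cosine similarity, no self-loops.
    weights = [
        [cosine(bows[i], bows[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each node receives score from neighbours, weighted by edge share.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract(sentences, k):
    """Stage 1 (sketch): keep the k highest-scoring sentences, in source order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def compress(sentences, max_tokens):
    """Stage 2 placeholder (hypothetical): an abstractive model would rewrite
    the extract here; this sketch only truncates it to max_tokens words."""
    words = " ".join(sentences).split()
    return " ".join(words[:max_tokens])


def summarize(sentences, k=3, max_tokens=60):
    """Extract-then-compress: select salient sentences, then shorten them."""
    return compress(extract(sentences, k), max_tokens)


if __name__ == "__main__":
    doc = [
        "Graph methods rank sentences for scientific summarization.",
        "Scientific summarization benefits from graph methods that rank sentences.",
        "The weather in Miami was sunny.",
        "Ranking sentences with graph methods helps summarization of long scientific documents.",
    ]
    print(summarize(doc, k=3, max_tokens=10))
```

In this toy document the off-topic sentence shares no vocabulary with the others, so it gets the minimum graph score and is dropped before compression. Splitting the work this way means only the short extract, rather than the full document, has to fit within a downstream model's token limit.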

Original language: English (US)
Title of host publication: NLP4Science 2024 - 1st Workshop on NLP for Science, Proceedings of the Workshop
Editors: Lotem Peled-Cohen, Nitay Calderon, Shir Lissak, Roi Reichart
Publisher: Association for Computational Linguistics (ACL)
Pages: 36-46
Number of pages: 11
ISBN (Electronic): 9798891761858
DOIs:
State: Published - 2024
Event: 1st Workshop on NLP for Science, NLP4Science 2024 - Miami, United States
Duration: Nov 16 2024 → …

Publication series

Name: NLP4Science 2024 - 1st Workshop on NLP for Science, Proceedings of the Workshop

Conference

Conference: 1st Workshop on NLP for Science, NLP4Science 2024
Country/Territory: United States
City: Miami
Period: 11/16/24 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

