Computational Thematic Analysis of Poetry via Bimodal Large Language Models

Research output: Contribution to journal › Article › peer-review

Abstract

This article proposes a multilabel poem topic classification algorithm that uses large language models and auxiliary data to address the lack of diverse metadata in digital poetry libraries. The study examines the potential of context-dependent language models, specifically Bidirectional Encoder Representations from Transformers (BERT), to understand words as they are used in poetry and to draw on auxiliary data, such as author's notes, to supplement the poem text. The experimental results show that the BERT-based model outperforms the traditional support vector machine (SVM)-based model across all input types and datasets. We also show that adding the notes as an additional input improves on the performance of the poem-only model. Overall, the study suggests that pretrained context-dependent language models and auxiliary data have the potential to make the diverse poems within a collection more accessible. This research can eventually help surface underrepresented poems in digital libraries, even when they lack associated metadata, and thus enhance the understanding and appreciation of the literary form.
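The abstract frames topic assignment as multilabel classification, with the author's notes as an optional second input alongside the poem. The following is a minimal, standard-library-only sketch of that framing, not the article's implementation. Assumptions: the " [SEP] " joining of poem and notes (loosely mimicking BERT's sentence-pair input), the per-label sigmoid with a 0.5 threshold, micro-averaged F1 as the score, and all function names are illustrative choices; the logits would come from whatever trained model (BERT- or SVM-based) is plugged in.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical separator; loosely mimics BERT's sentence-pair input ([CLS] A [SEP] B [SEP]).
SEP = " [SEP] "


def build_input(poem: str, notes: Optional[str] = None) -> str:
    """Return the model input: the poem alone, or the poem followed by its notes."""
    if notes:
        return poem.strip() + SEP + notes.strip()
    return poem.strip()


def encode_labels(topics: Sequence[str], label_set: Sequence[str]) -> List[int]:
    """Multi-hot target: one 0/1 slot per topic, so a poem may carry several topics."""
    present = set(topics)
    return [1 if label in present else 0 for label in label_set]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_topics(
    logits: Sequence[float], label_set: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Decide each label independently (one sigmoid per label, not a softmax),
    keeping every topic whose probability reaches the threshold."""
    return [label for label, z in zip(label_set, logits) if _sigmoid(z) >= threshold]


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool true positives, false positives, and false negatives
    over all (poem, label) decisions before computing F1."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    labels = ["love", "nature", "death"]
    text = build_input("Snow on the pines", "Written after a funeral")
    print(text)  # Snow on the pines [SEP] Written after a funeral
    gold = encode_labels(["nature", "death"], labels)  # [0, 1, 1]
    # Stand-in logits; in practice these come from a trained BERT- or SVM-based model.
    predicted = predict_topics([-2.0, 1.5, 0.2], labels)
    print(predicted)  # ['nature', 'death']
    print(micro_f1([gold], [encode_labels(predicted, labels)]))  # 1.0
```

Thresholding each label separately is what separates multilabel from multiclass prediction: a poem about grief in a winter landscape can be tagged with both "nature" and "death". Micro-F1 is only one common scoring choice; this abstract does not state the article's own evaluation metric.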

Original language: English (US)
Pages (from-to): 538-542
Number of pages: 5
Journal: Proceedings of the Association for Information Science and Technology
Volume: 60
Issue number: 1
State: Published - Oct 2023
Externally published: Yes

Keywords

  • auxiliary data
  • computational poetry analysis
  • context-dependent language model
  • digital libraries
  • multilabel classification

ASJC Scopus subject areas

  • General Computer Science
  • Library and Information Sciences
