Multi-view Graph-Based Text Representations for Imbalanced Classification

Ola Karajeh, Ismini Lourentzou, Edward A. Fox

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Text classification is a fundamental task in natural language processing, notably in the context of digital libraries, where it is essential for organizing and retrieving large numbers of documents in diverse collections, especially when tackling issues with inherent class imbalance. Sequence-based models can successfully capture semantics in local consecutive text sequences. On the other hand, graph-based models can preserve global co-occurrences that capture non-consecutive and long-distance semantics. A text representation approach that combines local and global information can enhance performance in practical class imbalance text classification scenarios. Yet, multi-view graph-based text representations have received limited attention. In this work, we introduce Multi-view Minority Class Text Graph Convolutional Network (MMCT-GCN), a transductive multi-view text classification model that captures textual graph representations for the minority class, along with sequence-based text representations. Experiments show that MMCT-GCN variants outperform baseline models on multiple text collections.

Original language: English (US)
Title of host publication: Linking Theory and Practice of Digital Libraries - 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, Proceedings
Editors: Omar Alonso, Helena Cousijn, Gianmaria Silvello, Stefano Marchesin, Mónica Marrero, Carla Teixeira Lopes
Publisher: Springer
Pages: 249-264
Number of pages: 16
ISBN (Print): 9783031438486
State: Published - 2023
Externally published: Yes
Event: 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023 - Zadar, Croatia
Duration: Sep 26 2023 to Sep 29 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14241 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023
Country/Territory: Croatia
City: Zadar
Period: 9/26/23 to 9/29/23

Keywords

  • Graph Convolutional Networks
  • Imbalanced Data
  • Text Classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
