cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models

Sidney Wong, Matthew Durward, Benjamin Adams, Jonathan Dunn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We found that the inclusion of this spatio-temporal data improved classification performance over the baseline for all language and task conditions. We also retrained a subset of models with simulated script-mixed social media language data, with varied performance. The results from the current study suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
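The abstract describes a two-stage approach: continued pretraining of XLM-RoBERTa on spatio-temporally matched social media text, followed by fine-tuning for multiclass detection. The paper's own code is not reproduced here; the sketch below is an illustrative reconstruction of that pipeline using the Hugging Face Transformers library, and the label names, file paths, and hyperparameters are assumptions rather than the shared task's exact specification.

```python
# Illustrative sketch (not the authors' released code) of the two-stage
# approach described in the abstract:
#   1. continued masked-language-model pretraining of XLM-RoBERTa on
#      spatially and temporally relevant social media text, then
#   2. fine-tuning with a classification head for homophobia/transphobia
#      detection.
# Label names below are assumptions, not the shared task's exact labels.

TASK_LABELS = ["non-anti-LGBT+", "homophobic", "transphobic"]
LABEL2ID = {label: i for i, label in enumerate(TASK_LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def continue_pretraining(corpus_file: str, output_dir: str) -> None:
    """Stage 1: MLM retraining on region- and time-matched social media
    text (corpus_file is a hypothetical plain-text corpus path)."""
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

    dataset = load_dataset("text", data_files=corpus_file)["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True),
        batched=True, remove_columns=["text"])

    collator = DataCollatorForLanguageModeling(tokenizer,
                                               mlm_probability=0.15)
    Trainer(model=model,
            args=TrainingArguments(output_dir=output_dir),
            train_dataset=dataset,
            data_collator=collator).train()
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


def build_classifier(pretrained_dir: str):
    """Stage 2: attach a multiclass classification head to the
    spatio-temporally retrained checkpoint from stage 1."""
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        pretrained_dir,
        num_labels=len(TASK_LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID)
```

The key design point the abstract reports is that swapping the generic pretraining corpus for register-matched (social media) and spatio-temporally matched text in stage 1, while keeping stage 2 fixed, is what improved performance over the baseline.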
Original language: English (US)
Title of host publication: Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
Editors: Bharathi R. Chakravarthi, B. Bharathi, Josephine Griffith, Kalika Bali, Paul Buitelaar
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 103-108
Number of pages: 6
ISBN (Print): 9789544520847
State: Published - Sep 1 2023
Externally published: Yes
