ARMADA: Attribute-Based Multimodal Data Augmentation

Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan Hao Huang, Te Lin Wu, Nanyun Peng, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In Multimodal Language Models (MLMs), the cost of manually annotating high-quality image-text pair data for fine-tuning and alignment is extremely high. While existing multimodal data augmentation frameworks propose ways to augment image-text pairs, they either suffer from semantic inconsistency between texts and images, or generate unrealistic images, creating a knowledge gap with real-world examples. To address these issues, we propose Attribute-based Multimodal Data Augmentation (ARMADA), a novel multimodal data augmentation method via knowledge-guided manipulation of visual attributes of the mentioned entities. Specifically, we extract entities and their visual attributes from the original text data, then search for alternative values for the visual attributes under the guidance of knowledge bases (KBs) and large language models (LLMs). We then utilize an image-editing model to edit the images with the extracted attributes. ARMADA is a multimodal data generation framework that: (i) extracts knowledge-grounded attributes from symbolic KBs for semantically consistent yet distinctive image-text pair generation, (ii) generates visually similar images of disparate categories using neighboring entities in the KB hierarchy, and (iii) uses the commonsense knowledge of LLMs to modulate auxiliary visual attributes such as backgrounds for more robust representation of original entities. Our empirical results over four downstream tasks demonstrate the efficacy of our framework in producing high-quality data and enhancing model performance. This also highlights the need to leverage external knowledge proxies for enhanced interpretability and real-world grounding.
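The attribute-substitution step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory `TOY_KB`, the `AugmentedPair` type, and the functions `extract_attributes` and `augment` are hypothetical names introduced here, the "extraction" is a naive adjacent-token match rather than a real entity extractor, and the image-editing model is represented only by the text instruction that would be passed to it. The neighboring-entity and LLM-guided background steps are not covered.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: entity -> visual attribute -> valid values.
# A real system would query a symbolic KB; these entries are illustrative only.
TOY_KB = {
    "car": {"color": ["red", "blue", "white"]},
    "ladybug": {"color": ["red", "orange", "yellow"]},
}


@dataclass(frozen=True)
class AugmentedPair:
    text: str              # caption with the attribute value swapped
    edit_instruction: str  # prompt a (not included) image-editing model would receive


def extract_attributes(caption, kb):
    """Return (entity, attribute, value) triples where a KB value directly precedes a KB entity."""
    tokens = caption.lower().rstrip(".").split()
    found = []
    for i in range(1, len(tokens)):
        entity = tokens[i]
        if entity not in kb:
            continue
        for attribute, values in kb[entity].items():
            if tokens[i - 1] in values:
                found.append((entity, attribute, tokens[i - 1]))
    return found


def augment(caption, kb):
    """Produce one augmented pair per alternative attribute value listed in the KB."""
    pairs = []
    for entity, attribute, value in extract_attributes(caption, kb):
        for alternative in kb[entity][attribute]:
            if alternative == value:
                continue
            # Case-sensitive replacement: assumes the attribute is lowercase in the caption.
            new_text = caption.replace(f"{value} {entity}", f"{alternative} {entity}")
            instruction = (
                f"change the {attribute} of the {entity} from {value} to {alternative}"
            )
            pairs.append(AugmentedPair(new_text, instruction))
    return pairs
```

For the caption "A red car parked on the street.", `augment` returns two pairs, with the texts "A blue car parked on the street." and "A white car parked on the street.", each carrying the matching edit instruction. Restricting alternatives to values listed in the KB is what keeps the edited caption and the edited image consistent with each other in this sketch.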

Original language: English (US)
Title of host publication: WikiNLP 2024 - 1st Workshop on Advancing Natural Language Processing for Wikipedia, Proceedings of the Workshop
Editors: Lucie Lucie-Aimee, Angela Fan, Tajuddeen Gwadabe, Isaac Johnson, Fabio Petroni, Daniel van Strien
Publisher: Association for Computational Linguistics (ACL)
Pages: 112-125
Number of pages: 14
ISBN (Electronic): 9798891761889
State: Published - 2024
Event: 1st Workshop on Advancing Natural Language Processing for Wikipedia, WikiNLP 2024 - Miami, United States
Duration: Nov 16 2024 → …

Publication series

Name: WikiNLP 2024 - 1st Workshop on Advancing Natural Language Processing for Wikipedia, Proceedings of the Workshop

Conference

Conference: 1st Workshop on Advancing Natural Language Processing for Wikipedia, WikiNLP 2024
Country/Territory: United States
City: Miami
Period: 11/16/24 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
