SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation

Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo

Research output: Contribution to conferencePaperpeer-review


Word Sense Disambiguation (WSD) is an NLP task aimed at determining the correct sense of a word in a sentence from discrete sense choices. Although current systems have attained unprecedented performances for such tasks, the nonuniform distribution of word senses during training generally results in systems performing poorly on rare senses. To this end, we consider data augmentation to increase the frequency of these least frequent senses (LFS) to reduce the distributional bias of senses during training. We propose Sense-Maintained Sentence Mixup (SMSMix), a novel word-level mixup method that maintains the sense of a target word. SMSMix smoothly blends two sentences using mask prediction while preserving the relevant span determined by saliency scores to maintain a specific word's sense. To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word. With extensive experiments, we validate that our augmentation method can effectively give more information about rare senses during training with maintained target sense label.

Original languageEnglish (US)
Number of pages10
StatePublished - 2022
Event2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: Dec 7 2022Dec 11 2022


Conference2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems


Dive into the research topics of 'SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation'. Together they form a unique fingerprint.

Cite this