Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper we focus on patterns of colexification (co-expressions of form-meaning mapping in the lexicon) as an aspect of lexical-semantic organization, and use them to build large scale synset graphs across BabelNet's typologically diverse set of 499 world languages. We introduce and compare several approaches: monolingual and cross-lingual colexification graphs, popular distributional models, and fusion approaches. The models are evaluated against human judgments on a semantic similarity task for nine languages. Our strong empirical findings also point to the importance of universality of our graph synset embedding representations with no need for any language-specific adaptation when evaluated on the lexical similarity task. The insights of our exploratory investigation of large-scale colexification graphs could inspire significant advances in NLP across languages, especially for tasks involving languages which lack dedicated lexical resources, and can benefit from language transfer from large shared cross-lingual semantic spaces.

Original languageEnglish (US)
Title of host publicationNAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages5259-5270
Number of pages12
ISBN (Electronic)9781955917711
StatePublished - 2022
Event2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States
Duration: Jul 10 2022Jul 15 2022

Publication series

NameNAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
Country/TerritoryUnited States
CitySeattle
Period7/10/227/15/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity'. Together they form a unique fingerprint.

Cite this