A Study of Methods for the Generation of Domain-Aware Word Embeddings

Dominic Seyler, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Word embeddings are essential components for many text data applications. In most work, "out-of-the-box" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. Thus, how to create "domain-aware" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora, including concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well, however, the interpolation method consistently works best.

Original languageEnglish (US)
Title of host publicationSIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery, Inc
Pages1609-1612
Number of pages4
ISBN (Electronic)9781450380164
DOIs
StatePublished - Jul 25 2020
Event43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 - Virtual, Online, China
Duration: Jul 25 2020Jul 30 2020

Publication series

NameSIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
CountryChina
CityVirtual, Online
Period7/25/207/30/20

Keywords

  • domain adaptation
  • empirical study
  • text representation

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Fingerprint Dive into the research topics of 'A Study of Methods for the Generation of Domain-Aware Word Embeddings'. Together they form a unique fingerprint.

Cite this