Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition

Liming Wang, Siyuan Feng, Mark Hasegawa-Johnson, Chang D. Yoo

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Phonemes are defined by their relationship to words: changing a phoneme changes the word. Learning a phoneme inventory with little supervision has been a longstanding challenge with important applications to under-resourced speech technology. In this paper, we bridge the gap between the linguistic and statistical definition of phonemes and propose a novel neural discrete representation learning model for self-supervised learning of phoneme inventory with raw speech and word labels. Given the availability of phoneme segmentation and some mild conditions, we prove that the phoneme inventory learned by our approach converges to the true one with an exponentially low error rate. Moreover, in experiments on TIMIT and Mboshi benchmarks, our approach consistently learns a better phoneme-level representation and achieves a lower error rate in a zero-resource phoneme recognition task than previous state-of-the-art self-supervised representation learning algorithms.

Original languageEnglish (US)
Title of host publicationACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
EditorsSmaranda Muresan, Preslav Nakov, Aline Villavicencio
PublisherAssociation for Computational Linguistics (ACL)
Pages8027-8047
Number of pages21
ISBN (Electronic)9781955917216
StatePublished - 2022
Event60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration: May 22 2022May 27 2022

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume1
ISSN (Print)0736-587X

Conference

Conference60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/TerritoryIreland
CityDublin
Period5/22/225/27/22

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition'. Together they form a unique fingerprint.

Cite this