Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages

Research output: Contribution to journal (peer-reviewed article)


Phones, the segmental units of the International Phonetic Alphabet (IPA), include isolated consonants or vowels; tones, the suprasegmental units, represent pitch and voice-quality movements that may span many phones. The timings of tones and phones are only loosely coupled: depending on the language, a tone may be synchronized with its associated vowel, with the syllable final, or with a sequence of two or three syllables. Many past studies have investigated cross-lingual adaptation of tone-marked phone models for automatic speech recognition (ASR), yet very few have studied the interaction between cross-lingual adaptation and tone-phone synchronization. In this study, we train multilingually on four tonal languages and test cross-lingually on the fifth, in a five-fold cross-validation framework, using four CTC-based systems that impose different degrees of synchronization between tones and phones. We find that multilingual and cross-lingual training benefit from different architectures. In multilingual training, when a large corpus of test-language training data is part of the training set, a system that requires tones to be synchronized with phones produces significantly lower tone error rates than any of the systems that score tones and phones asynchronously. In cross-lingual training, however, when only limited adaptation data are available in the test language, jointly training synchronous tone-marked phones together with asynchronous phones and tones, as three separate system outputs optimized within a multi-task learning framework, consistently and significantly outperforms the system that requires tone-phone synchrony.
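The abstract contrasts a synchronous output (tone-marked phones) with asynchronous outputs (phones and tones as separate streams), trained jointly as three CTC heads in a multi-task framework. The sketch below illustrates that idea in plain Python under stated assumptions: the helper names (`make_targets`, `build_vocab`, `ctc_nll`, `multitask_ctc_loss`), the convention of attaching a syllable's tone to its last phone, and the per-head loss weights are hypothetical illustrations, not the authors' implementation. Each head is scored with the standard CTC forward algorithm; in practice one would use a library implementation such as PyTorch's `torch.nn.CTCLoss`.

```python
import math
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def make_targets(syllables: Sequence[Tuple[List[str], str]]):
    """Build three label streams from (phones, tone) syllables.

    Hypothetical convention: the tone is attached to the last phone of the
    syllable (its final) to form the synchronous tone-marked phone stream.
    """
    marked: List[str] = []
    phones: List[str] = []
    tones: List[str] = []
    for syl_phones, tone in syllables:
        marked += syl_phones[:-1] + [syl_phones[-1] + tone]  # synchronous stream
        phones += syl_phones                                  # asynchronous phones
        tones.append(tone)                                    # asynchronous tones
    return marked, phones, tones


def build_vocab(labels: Sequence[str]) -> Dict[str, int]:
    """Map each label to an integer id, keeping id 0 for the blank."""
    vocab = {"<blank>": BLANK}
    for lab in labels:
        vocab.setdefault(lab, len(vocab))
    return vocab


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """CTC negative log-likelihood via the forward algorithm in log space.

    log_probs[t][k] is the log posterior of label k at frame t.
    """
    ext = [BLANK]
    for lab in target:
        ext += [lab, BLANK]  # interleave blanks: _ y1 _ y2 _ ...
    alpha = [-math.inf] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [-math.inf] * len(ext)
        for s, lab in enumerate(ext):
            terms = [alpha[s]]                   # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])       # advance by one position
            if s >= 2 and lab != BLANK and lab != ext[s - 2]:
                terms.append(alpha[s - 2])       # skip a blank between distinct labels
            new[s] = logsumexp(terms) + frame[lab]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return -logsumexp(alpha[-2:])


def multitask_ctc_loss(head_log_probs, head_targets, weights) -> float:
    """Weighted sum of per-head CTC losses (one output layer per label stream)."""
    return sum(w * ctc_nll(head_log_probs[name], head_targets[name])
               for name, w in weights.items())
```

For example, `make_targets([(["m", "a"], "3"), (["sh", "i"], "4")])` returns the synchronous stream `["m", "a3", "sh", "i4"]`, the phone stream `["m", "a", "sh", "i"]`, and the tone stream `["3", "4"]`. Keeping the three streams as separate heads lets the asynchronous tone head align tones to whatever span the acoustics support, while the tone-marked head still enforces synchrony.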

Original language: English (US)
Pages (from-to): 1918-1926
Number of pages: 9
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
State: Published - 2022

Keywords

  • Autosegmental phonology
  • CTC
  • IPA
  • cross-lingual adaptation
  • tones
  • under-resourced ASR

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
