ADAMER-CTC: CONNECTIONIST TEMPORAL CLASSIFICATION WITH ADAPTIVE MAXIMUM ENTROPY REGULARIZATION FOR AUTOMATIC SPEECH RECOGNITION

Soo Hwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during the training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but ensures that as training proceeds, predictions display the desired model confidence.
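The idea in the abstract, a CTC loss combined with an entropy term whose weight changes during training, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state which entropy is regularized or how AdaMER adapts the weight, so the sketch uses a simple per-frame entropy and a hypothetical linear decay schedule (`linear_decay_weight`), and it treats the CTC loss as a number computed elsewhere. It is not the authors' method.

```python
import math


def frame_entropy(probs):
    """Shannon entropy (in nats) of one per-frame output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(posteriors):
    """Average per-frame entropy over a T x V list of distributions."""
    return sum(frame_entropy(frame) for frame in posteriors) / len(posteriors)


def regularized_loss(ctc_loss, posteriors, weight):
    """CTC loss minus a weighted entropy bonus.

    Higher entropy (less peaky output) lowers the total loss.
    `ctc_loss` is assumed to be computed elsewhere.
    """
    return ctc_loss - weight * mean_entropy(posteriors)


def linear_decay_weight(step, total_steps, start=0.2, end=0.0):
    """Hypothetical schedule, NOT the paper's AdaMER rule.

    Stands in for "a weight that changes as training proceeds".
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


if __name__ == "__main__":
    peaky = [[1.0, 0.0, 0.0, 0.0]] * 3    # fully confident frames
    flat = [[0.25, 0.25, 0.25, 0.25]] * 3  # maximally uncertain frames
    for step in (0, 50, 100):
        w = linear_decay_weight(step, total_steps=100)
        print(step, w,
              regularized_loss(10.0, peaky, w),
              regularized_loss(10.0, flat, w))
```

With a constant weight, the entropy bonus rewards flat outputs equally at every step; a time-varying weight lets the regularization be strong at one stage of training and weak at another, which is the degree of freedom the abstract describes.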

Original language: English (US)
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 12707-12711
Number of pages: 5
ISBN (Electronic): 9798350344851
State: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: Apr 14 2024 - Apr 19 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 4/14/24 - 4/19/24

Keywords

  • Automatic Speech Recognition
  • Connectionist Temporal Classification
  • Entropy Maximization

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
