TY - GEN
T1 - AdaMER-CTC
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
AU - Eom, Soo Hwan
AU - Yoon, Eunseop
AU - Yoon, Hee Suk
AU - Kim, Chanwoo
AU - Hasegawa-Johnson, Mark
AU - Yoo, Chang D.
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during the training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but ensures that as training proceeds, predictions display the desired model confidence.
KW - Automatic Speech Recognition
KW - Connectionist Temporal Classification
KW - Entropy Maximization
UR - http://www.scopus.com/inward/record.url?scp=85192974293&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192974293&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10446721
DO - 10.1109/ICASSP48485.2024.10446721
M3 - Conference contribution
AN - SCOPUS:85192974293
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 12707
EP - 12711
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 April 2024 through 19 April 2024
ER -