Abstract
Training unsupervised speech recognition systems is challenging because of GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. The system uses lower-order N-skipgrams (up to N = 3) combined with positional unigram statistics gathered from a small batch of samples. Evaluated on the TIMIT benchmark, our model achieves competitive performance on ASR and phoneme segmentation tasks. Our code is publicly available at https://github.com/lwang114/GraphUnsupASR.
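To make the two kinds of statistics named in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes one common reading of a skipgram, namely a pair of tokens separated by a fixed gap, with gaps up to 3; the paper's exact definition may differ. The function names `skip_pair_counts` and `positional_unigram_counts` and the toy phoneme batch are hypothetical.

```python
from collections import Counter


def skip_pair_counts(sequences, max_gap=3):
    """Count token pairs (a, b) occurring exactly `gap` positions apart.

    Assumption: a "skipgram" is read here as a pair of tokens separated
    by a fixed gap, for gap = 1..max_gap. This is an illustrative
    definition, not necessarily the one used in the paper.
    """
    counts = {gap: Counter() for gap in range(1, max_gap + 1)}
    for seq in sequences:
        for gap in range(1, max_gap + 1):
            # zip pairs each token with the token `gap` positions later.
            for left, right in zip(seq, seq[gap:]):
                counts[gap][(left, right)] += 1
    return counts


def positional_unigram_counts(sequences):
    """Count how often each token occurs at each absolute position."""
    counts = Counter()
    for seq in sequences:
        for position, token in enumerate(seq):
            counts[(position, token)] += 1
    return counts


# Hypothetical toy batch of phoneme sequences (not taken from TIMIT).
batch = [
    ["sil", "k", "ae", "t", "sil"],
    ["sil", "b", "ae", "t", "sil"],
]

skip_counts = skip_pair_counts(batch, max_gap=3)
pos_counts = positional_unigram_counts(batch)
```

Both statistics are plain counts over a small batch, which illustrates why matching them could need far less memory than statistics gathered over a whole corpus. How ESPUM actually matches speech-side and text-side statistics is described in the paper and the linked repository.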
| Original language | English (US) |
|---|---|
| Pages (from-to) | 10936-10940 |
| Number of pages | 5 |
| Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| State | Published - 2024 |
| Event | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Republic of Korea; duration: Apr 14 2024 → Apr 19 2024 |
Keywords
- acoustic unit discovery
- self-supervised speech processing
- speech recognition
- unsupervised phoneme segmentation
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering
Title
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching