TY - JOUR
T1 - Speech Technology for Unwritten Languages
AU - Scharenborg, Odette
AU - Ondel, Lucas
AU - Palaskar, Shruti
AU - Arthur, Philip
AU - Ciannella, Francesco
AU - Du, Mingxing
AU - Larsen, Elin
AU - Merkx, Danny
AU - Riad, Rachid
AU - Wang, Liming
AU - Dupoux, Emmanuel
AU - Besacier, Laurent
AU - Black, Alan
AU - Hasegawa-Johnson, Mark
AU - Metze, Florian
AU - Neubig, Graham
AU - Stüker, Sebastian
AU - Godard, Pierre
AU - Müller, Markus
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2020
Y1 - 2020
N2 - Speech technology plays an important role in our everyday life. Among other applications, speech is used for human-computer interaction, for instance in information retrieval and online shopping. For an unwritten language, however, speech technology is difficult to create, because it cannot be built from the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech to meaning and from meaning to speech, bypassing the need for text, is possible.
AB - Speech technology plays an important role in our everyday life. Among other applications, speech is used for human-computer interaction, for instance in information retrieval and online shopping. For an unwritten language, however, speech technology is difficult to create, because it cannot be built from the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech to meaning and from meaning to speech, bypassing the need for text, is possible.
KW - Speech processing
KW - automatic speech recognition
KW - image retrieval
KW - speech synthesis
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85079642575&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079642575&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.2973896
DO - 10.1109/TASLP.2020.2973896
M3 - Article
AN - SCOPUS:85079642575
SN - 2329-9290
VL - 28
SP - 964
EP - 975
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 8998182
ER -