Speech Retrieval in Unknown Languages: a Pilot Study

Xiaodan Zhuang, Jui Ting Huang, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Most cross-lingual speech retrieval assumes extensive knowledge of all languages involved. However, such resources may not exist for some less widely spoken languages, and some applications call for speech retrieval in unknown languages. In this work, we leverage a quasi-language-independent subword recognizer, trained on multiple languages, to obtain an abstracted representation of speech data in an unknown language. Language-independent query expansion is achieved either by allowing a wide lattice output for an audio query, or by using distinctive features of speech articulation to propose the subwords most similar to the given subwords in a query. We propose a retrieval model based on finite state machines for fuzzy matching of speech sound patterns and, building on that, for speech retrieval. A pilot study of speech retrieval in unknown languages is presented, using English, Spanish and Russian as training languages and Croatian as the unknown target language.
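The feature-based query expansion described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy feature table `FEATURES`, its binary values, the Hamming-distance similarity, and the helper names `feature_distance`, `similar_subwords`, `expand_query` and `matches`. The paper's retrieval model uses finite state machines; the plain sliding-window match here is only a stand-in for that fuzzy matching step.

```python
# Toy distinctive-feature table (hypothetical values, for illustration only).
FEATURES = {
    "p": {"voiced": 0, "nasal": 0, "labial": 1, "continuant": 0},
    "b": {"voiced": 1, "nasal": 0, "labial": 1, "continuant": 0},
    "m": {"voiced": 1, "nasal": 1, "labial": 1, "continuant": 0},
    "t": {"voiced": 0, "nasal": 0, "labial": 0, "continuant": 0},
    "d": {"voiced": 1, "nasal": 0, "labial": 0, "continuant": 0},
    "s": {"voiced": 0, "nasal": 0, "labial": 0, "continuant": 1},
}


def feature_distance(a, b):
    """Count the distinctive features on which two subwords differ."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(fa[name] != fb[name] for name in fa)


def similar_subwords(subword, max_distance=1):
    """Return all subwords within max_distance features of the given one."""
    return sorted(
        other for other in FEATURES
        if feature_distance(subword, other) <= max_distance
    )


def expand_query(subwords, max_distance=1):
    """Replace each query subword by its set of acceptable alternatives."""
    return [similar_subwords(s, max_distance) for s in subwords]


def matches(expanded_query, sequence):
    """Check whether any window of the sequence fits the expanded query."""
    n = len(expanded_query)
    return any(
        all(sequence[start + j] in expanded_query[j] for j in range(n))
        for start in range(len(sequence) - n + 1)
    )


expanded = expand_query(["p", "d"])
found = matches(expanded, ["s", "b", "t", "m"])
```

With this toy table, the query "p d" is expanded so that the recognized sequence "s b t m" still matches, because "b" and "t" each differ from the corresponding query subword by a single feature.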

Original language: English (US)
Title of host publication: NAACL HLT 2009 - 3rd International Workshop on Cross Lingual Information Access
Subtitle of host publication: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009 - Proceedings of the Workshop
Editors: Sivaji Bandyopadhyay, Pushpak Bhattacharyya, Vasudeva Varma, Sudeshna Sarkar, A Kumaran, Raghavendra Udupa
Publisher: Association for Computational Linguistics (ACL)
Pages: 3-11
Number of pages: 9
ISBN (Electronic): 9781932432336
State: Published - 2009
Event: 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009 - Boulder, United States
Duration: Jun 4 2009 → …

Publication series

Name: NAACL HLT 2009 - 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009 - Proceedings of the Workshop

Conference

Conference: 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009
Country/Territory: United States
City: Boulder
Period: 6/4/09 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
