TY - GEN
T1 - Speech Retrieval in Unknown Languages
T2 - 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009
AU - Zhuang, Xiaodan
AU - Huang, Jui Ting
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright: © 2009 Association for Computational Linguistics.
PY - 2009
Y1 - 2009
N2 - Most cross-lingual speech retrieval assumes extensive knowledge of all the languages involved. However, such resources may not exist for some less popular languages, and some applications call for speech retrieval in unknown languages. In this work, we leverage a quasi-language-independent subword recognizer trained on multiple languages to obtain an abstracted representation of speech data in an unknown language. Language-independent query expansion is achieved either by allowing a wide lattice output for an audio query, or by using distinctive features of speech articulation to propose the subwords most similar to those given in a query. We propose a retrieval model based on finite state machines for fuzzy matching of speech sound patterns and, in turn, for speech retrieval. We present a pilot study of speech retrieval in unknown languages, using English, Spanish and Russian as training languages and Croatian as the unknown target language.
UR - http://www.scopus.com/inward/record.url?scp=85060674341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060674341&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85060674341
T3 - NAACL HLT 2009 - 3rd International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies, CLIAWS3 2009 - Proceedings of the Workshop
SP - 3
EP - 11
BT - NAACL HLT 2009 - 3rd International Workshop on Cross Lingual Information Access
A2 - Bandyopadhyay, Sivaji
A2 - Bhattacharyya, Pushpak
A2 - Varma, Vasudeva
A2 - Sarkar, Sudeshna
A2 - Kumaran, A
A2 - Udupa, Raghavendra
PB - Association for Computational Linguistics (ACL)
Y2 - 4 June 2009
ER -