TY - GEN
T1 - Lexicalized phonotactic word segmentation
AU - Fleck, Margaret M.
PY - 2008
Y1 - 2008
N2 - This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone ngrams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on morphologically complex words in English and Arabic, and promising results on accurate phonetic transcriptions with extensive pronunciation variation. Expanding training data beyond the traditional miniature datasets pushes performance numbers well above those previously reported. This suggests that WordEnds is a viable model of child language acquisition and might be useful in speech understanding.
UR - http://www.scopus.com/inward/record.url?scp=84859887524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859887524&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84859887524
SN - 9781932432046
T3 - ACL-08: HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
SP - 130
EP - 138
BT - ACL-08
T2 - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-08: HLT
Y2 - 15 June 2008 through 20 June 2008
ER -