TY - JOUR
T1 - Measuring speech quality for text-to-speech systems
T2 - Development and assessment of a modified mean opinion score (MOS) scale
AU - Viswanathan, Mahesh
AU - Viswanathan, Madhubalan
PY - 2005/1
Y1 - 2005/1
N2 - The quality of text-to-speech systems can be effectively assessed only through reliable and valid listening tests of overall system performance. A mean opinion score (MOS) scale has been the recommended measure of synthesized speech quality [ITU-T Recommendation P.85, 1994. Telephone transmission quality subjective opinion tests. A method for subjective performance assessment of the quality of speech voice output devices]. We assessed this MOS scale and developed and tested a modified measure of speech quality. This modified measure has new items specific to text-to-speech systems. Our research was motivated by the lack of clear evidence regarding both the conceptual content and the psychometric properties of the MOS scale. We present conceptual arguments and empirical evidence for the reliability and validity of a modified scale. Moreover, we employ state-of-the-art psychometric techniques such as confirmatory factor analysis to provide strong tests of psychometric properties. This modified scale is better suited to appraising synthesis systems because it includes items that are specific to the artifacts found in synthesized speech. We believe that the speech synthesis research communities will find this modified scale a better fit for listening tests that assess synthesized speech.
AB - The quality of text-to-speech systems can be effectively assessed only through reliable and valid listening tests of overall system performance. A mean opinion score (MOS) scale has been the recommended measure of synthesized speech quality [ITU-T Recommendation P.85, 1994. Telephone transmission quality subjective opinion tests. A method for subjective performance assessment of the quality of speech voice output devices]. We assessed this MOS scale and developed and tested a modified measure of speech quality. This modified measure has new items specific to text-to-speech systems. Our research was motivated by the lack of clear evidence regarding both the conceptual content and the psychometric properties of the MOS scale. We present conceptual arguments and empirical evidence for the reliability and validity of a modified scale. Moreover, we employ state-of-the-art psychometric techniques such as confirmatory factor analysis to provide strong tests of psychometric properties. This modified scale is better suited to appraising synthesis systems because it includes items that are specific to the artifacts found in synthesized speech. We believe that the speech synthesis research communities will find this modified scale a better fit for listening tests that assess synthesized speech.
UR - http://www.scopus.com/inward/record.url?scp=9644270575&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=9644270575&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2003.12.001
DO - 10.1016/j.csl.2003.12.001
M3 - Article
AN - SCOPUS:9644270575
SN - 0885-2308
VL - 19
SP - 55
EP - 83
JO - Computer Speech and Language
JF - Computer Speech and Language
IS - 1
ER -