Abstract
In-person assessment of teaching emotion as part of a teacher's performance evaluation is onerous and can be intrusive for some teachers while they teach. This study presents preliminary attempts to address this issue with a machine learning approach based on bi-modal artificial neural networks, which make predictions by combining acoustic and textual features extracted from preschool teachers' spontaneous speech recorded in real teaching scenarios. In a binary classification task labelling emotions as "positive" or "negative", the models achieved an accuracy of 79.0% and an F1-score of 83.1% in cross-validation, and an accuracy of 68.4% and an F1-score of 67.2% in leave-one-subject-out validation. When speech samples labelled "neutral" were added to the task, accuracy dropped to 52.4% and F1-score to 53.6% in cross-validation, confirming how difficult this type of naturalistic speech data is to label, even for human raters.
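The abstract does not give implementation details of the bi-modal network. The sketch below is only an illustrative assumption of what a late-fusion architecture of this kind might look like: one branch processes utterance-level acoustic features and another processes textual features of the transcript, with the two representations concatenated before a classification head. The class name, layer sizes, and feature dimensions (e.g. 88 for an eGeMAPS-style acoustic vector, 768 for a sentence embedding) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class BiModalEmotionClassifier(nn.Module):
    """Hypothetical late-fusion network: an acoustic branch and a text branch
    are concatenated and fed to a linear classifier (positive vs. negative)."""
    def __init__(self, acoustic_dim=88, text_dim=768, hidden_dim=128, num_classes=2):
        super().__init__()
        self.acoustic_branch = nn.Sequential(
            nn.Linear(acoustic_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.3)
        )
        self.text_branch = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.3)
        )
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, acoustic_feats, text_feats):
        # Fuse the two modality representations by concatenation (late fusion).
        fused = torch.cat(
            [self.acoustic_branch(acoustic_feats), self.text_branch(text_feats)],
            dim=-1,
        )
        return self.classifier(fused)

# Example forward pass with random feature vectors (dimensions are assumptions).
model = BiModalEmotionClassifier()
acoustic = torch.randn(4, 88)   # e.g. utterance-level acoustic features
text = torch.randn(4, 768)      # e.g. sentence embeddings of the transcript
logits = model(acoustic, text)  # shape: (4, 2) for positive / negative
```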
Original language | English (US)
---|---
Journal | Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS)
Volume | 2023-August
State | Published - 2023
Event | 20th International Congress of Phonetic Sciences (ICPhS), Prague Congress Center, Prague, Czech Republic. Duration: Aug 7 2023 → Aug 11 2023