TY - JOUR
T1 - Can Computers Outperform Humans in Detecting User Zone-Outs? Implications for Intelligent Interfaces
AU - Bosch, Nigel
AU - D'Mello, Sidney K.
N1 - Publisher Copyright: © 2022 Association for Computing Machinery.
PY - 2022/4
Y1 - 2022/4
N2 - The ability to identify whether a user is "zoning out" (mind wandering) from video has many HCI applications (e.g., distance learning, high-stakes vigilance tasks). However, it remains unknown how well humans can perform this task, how they compare to automatic computerized approaches, and how a fusion of the two might improve accuracy. We analyzed videos of users' faces and upper bodies recorded 10 s prior to self-reported mind wandering (i.e., ground truth) while they engaged in a computerized reading task. We found that a state-of-the-art machine learning model had comparable accuracy to aggregated judgments of nine untrained human observers (area under receiver operating characteristic curve [AUC] = .598 versus .589). A fusion of the two (AUC = .644) outperformed each, presumably because each focused on complementary cues. Furthermore, adding more humans beyond 3-4 observers yielded diminishing returns. We discuss implications of human-computer fusion as a means to improve accuracy in complex tasks.
AB - The ability to identify whether a user is "zoning out" (mind wandering) from video has many HCI applications (e.g., distance learning, high-stakes vigilance tasks). However, it remains unknown how well humans can perform this task, how they compare to automatic computerized approaches, and how a fusion of the two might improve accuracy. We analyzed videos of users' faces and upper bodies recorded 10 s prior to self-reported mind wandering (i.e., ground truth) while they engaged in a computerized reading task. We found that a state-of-the-art machine learning model had comparable accuracy to aggregated judgments of nine untrained human observers (area under receiver operating characteristic curve [AUC] = .598 versus .589). A fusion of the two (AUC = .644) outperformed each, presumably because each focused on complementary cues. Furthermore, adding more humans beyond 3-4 observers yielded diminishing returns. We discuss implications of human-computer fusion as a means to improve accuracy in complex tasks.
KW - Mind wandering
KW - attention-aware interfaces
KW - facial expression recognition
KW - human-machine comparison
UR - http://www.scopus.com/inward/record.url?scp=85130418594&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130418594&partnerID=8YFLogxK
U2 - 10.1145/3481889
DO - 10.1145/3481889
M3 - Article
AN - SCOPUS:85130418594
SN - 1073-0516
VL - 29
JO - ACM Transactions on Computer-Human Interaction
JF - ACM Transactions on Computer-Human Interaction
IS - 2
M1 - 10
ER -