TY - JOUR
T1 - Detection of eye contact with deep neural networks is as accurate as human experts
AU - Chong, Eunji
AU - Clark-Whitney, Elysha
AU - Southerland, Audrey
AU - Stubbs, Elizabeth
AU - Miller, Chanel
AU - Ajodan, Eliana L.
AU - Silverman, Melanie R.
AU - Lord, Catherine
AU - Rozga, Agata
AU - Jones, Rebecca M.
AU - Rehg, James M.
N1 - Funding Information:
This study was funded in part by Simons Foundation awards 336363 and 383667, and NIH R01 MH114999. We thank Benjamin Silver, Erin McDonald, Ellise Sims, and Sarah Nay for their contributions to data annotation.
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12
Y1 - 2020/12
AB - Eye contact is among the primary means of social communication used by humans. Quantifying eye contact is valuable as part of the analysis of social roles and communication skills, and for clinical screening. Estimating a subject’s looking direction is a challenging task, but eye contact can be effectively captured by a wearable point-of-view camera, which provides a unique viewpoint. While moments of eye contact from this viewpoint can be hand-coded, such a process tends to be laborious and subjective. In this work, we develop a deep neural network model to automatically detect eye contact in egocentric video. It is the first to achieve accuracy equivalent to that of human experts. We train a deep convolutional network on a dataset of 4,339,879 annotated images from 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 validation subjects, and its performance is on par with that of 10 trained human coders, who achieve a mean precision of 0.918 and recall of 0.946. Our method will be instrumental in gaze behavior analysis by serving as a scalable, objective, and accessible tool for clinicians and researchers.
UR - http://www.scopus.com/inward/record.url?scp=85097535889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097535889&partnerID=8YFLogxK
U2 - 10.1038/s41467-020-19712-x
DO - 10.1038/s41467-020-19712-x
M3 - Article
C2 - 33318484
AN - SCOPUS:85097535889
SN - 2041-1723
VL - 11
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 6386
ER -