TY - JOUR
T1 - Multimodal Respiratory Rate Estimation from Audio and Video in Emergency Department Patients
AU - Harvill, John
AU - Chatterjee, Moitreya
AU - Khosla, Shaveta
AU - Alam, Mustafa
AU - Ahuja, Narendra
AU - Hasegawa-Johnson, Mark
AU - Chestek, David
AU - Beiser, David G.
N1 - Publisher Copyright: Authors
PY - 2024
Y1 - 2024
N2 - Given the recent COVID-19 pandemic, there has been a push in the medical community for reliable, remote medical care. The ubiquity of smartphones has generated considerable interest in estimating patient vital signs from audio or video signals. Objective: We estimate and compare respiratory rates from video, from audio, and jointly from video and audio for emergency department patients. Methods and procedures: For video, we use signal processing techniques. For audio, a large annotated corpus of breathing sounds is publicly available, so we compare respiratory rate estimates from signal processing methods and learning-based methods. Results: On our collected audio-video corpus, we achieve the best Mean Absolute Error (MAE) of 2.53 when using video features. For the publicly available respiratory rate corpus, we achieve an MAE of 1.63 when using signal processing methods. Conclusion: Based on the experimental results from our clinical data, we conclude that the video modality yields more accurate estimates than the audio modality. Clinical impact: Accurate estimation of vital signs from video or audio is significant because it is contactless, can be performed remotely, and does not require extra measurement equipment.
KW - Audio
KW - Autocorrelation
KW - Cameras
KW - Estimation
KW - Feature extraction
KW - Medical services
KW - Multimodal
KW - Respiratory Rate Estimation
KW - Signal Processing
KW - Spectrogram
KW - Telemedicine
KW - Video
UR - http://www.scopus.com/inward/record.url?scp=85197093410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197093410&partnerID=8YFLogxK
U2 - 10.1109/JTEHM.2024.3418345
DO - 10.1109/JTEHM.2024.3418345
M3 - Article
AN - SCOPUS:85197093410
SN - 2168-2372
SP - 1
JO - IEEE Journal of Translational Engineering in Health and Medicine
JF - IEEE Journal of Translational Engineering in Health and Medicine
ER -