TY - GEN
T1 - Imputing Missing Social Media Data Stream in Multisensor Studies of Human Behavior
AU - Saha, Koustuv
AU - Mulukutla, Raghu
AU - Nies, Kari
AU - Robles-Granda, Pablo
AU - Sirigiri, Anusha
AU - Yoo, Dong Whi
AU - Audia, Pino
AU - Campbell, Andrew T.
AU - Chawla, Nitesh V.
AU - D'Mello, Sidney K.
AU - Dey, Anind K.
AU - Reddy, Manikanta D.
AU - Jiang, Kaifeng
AU - Liu, Qiang
AU - Mark, Gloria
AU - Moskal, Edward
AU - Striegel, Aaron
AU - De Choudhury, Munmun
AU - Das Swain, Vedant
AU - Gregg, Julie M.
AU - Grover, Ted
AU - Lin, Suwen
AU - Martinez, Gonzalo J.
AU - Mattingly, Stephen M.
AU - Mirjafari, Shayan
N1 - Funding Information:
This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2017-17042800007. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - The ubiquitous use of social media enables researchers to obtain self-recorded longitudinal data of individuals in real time. Because this data can be collected in an inexpensive and unobtrusive way at scale, social media has been adopted as a 'passive sensor' to study human behavior. However, such research is impacted by the lack of homogeneity in the use of social media, and the engineering challenges in obtaining such data. This paper proposes a statistical framework to leverage the potential of social media in sensing studies of human behavior, while navigating the challenges associated with its sparsity. Our framework is situated in a large-scale in-situ study concerning the passive assessment of psychological constructs of 757 information workers, wherein four sensing streams were deployed: Bluetooth beacons, wearable, smartphone, and social media. Our framework includes principled feature transformation and machine learning models that predict latent social media features from the other passive sensors. We demonstrate the efficacy of this imputation framework via a high correlation of 0.78 between actual and imputed social media features. With the imputed features, we test and validate predictions on psychological constructs like personality traits and affect. We find that adding the social media data streams, in their imputed form, improves the prediction of these measures. We discuss how our framework can be valuable in multimodal sensing studies that aim to gather comprehensive signals about an individual's state or situation.
AB - The ubiquitous use of social media enables researchers to obtain self-recorded longitudinal data of individuals in real time. Because this data can be collected in an inexpensive and unobtrusive way at scale, social media has been adopted as a 'passive sensor' to study human behavior. However, such research is impacted by the lack of homogeneity in the use of social media, and the engineering challenges in obtaining such data. This paper proposes a statistical framework to leverage the potential of social media in sensing studies of human behavior, while navigating the challenges associated with its sparsity. Our framework is situated in a large-scale in-situ study concerning the passive assessment of psychological constructs of 757 information workers, wherein four sensing streams were deployed: Bluetooth beacons, wearable, smartphone, and social media. Our framework includes principled feature transformation and machine learning models that predict latent social media features from the other passive sensors. We demonstrate the efficacy of this imputation framework via a high correlation of 0.78 between actual and imputed social media features. With the imputed features, we test and validate predictions on psychological constructs like personality traits and affect. We find that adding the social media data streams, in their imputed form, improves the prediction of these measures. We discuss how our framework can be valuable in multimodal sensing studies that aim to gather comprehensive signals about an individual's state or situation.
KW - Imputation
KW - Multisensor
KW - Social media
KW - Wellbeing
UR - http://www.scopus.com/inward/record.url?scp=85075034965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075034965&partnerID=8YFLogxK
U2 - 10.1109/ACII.2019.8925479
DO - 10.1109/ACII.2019.8925479
M3 - Conference contribution
AN - SCOPUS:85075034965
T3 - 2019 8th International Conference on Affective Computing and Intelligent Interaction, ACII 2019
BT - 2019 8th International Conference on Affective Computing and Intelligent Interaction, ACII 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Affective Computing and Intelligent Interaction, ACII 2019
Y2 - 3 September 2019 through 6 September 2019
ER -