TY - JOUR
T1 - Crowdsourcing and machine learning approaches for extracting entities indicating potential foodborne outbreaks from social media
AU - Tao, Dandan
AU - Zhang, Dongyu
AU - Hu, Ruofan
AU - Rundensteiner, Elke
AU - Feng, Hao
N1 - Funding Information:
This work was supported by the Agriculture and Food Research Initiative (AFRI) Award No. 2020-67021-32459 from the U.S. Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) and by AFRI Award No. 2020-67021-32855/project Accession No. 1024262 from the USDA NIFA that is being administered through AIFS: the AI Institute for Next Generation Food Systems.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Foodborne outbreaks are a serious but preventable threat to public health that often lead to illness, loss of life, significant economic loss, and the erosion of consumer confidence. Understanding how consumers respond when interacting with foods, as well as extracting information from posts on social media, may provide new means of reducing the risks and curtailing the outbreaks. In recent years, Twitter has been employed as a new tool for identifying unreported foodborne illnesses. However, there is a huge gap between the identification of sporadic illnesses and the early detection of a potential outbreak. In this work, the dual-task BERTweet model was developed to identify unreported foodborne illnesses and extract foodborne-illness-related entities from Twitter. Unlike previous methods, our model leveraged the mutually beneficial relationships between the two tasks. The results showed that the F1-score of relevance prediction was 0.87, and the F1-score of entity extraction was 0.61. Key elements such as time, location, and food detected from sentences indicating foodborne illnesses were used to analyze potential foodborne outbreaks in massive historical tweets. A case study on tweets indicating foodborne illnesses showed that the discovered trend is consistent with the true outbreaks that occurred during the same period.
AB - Foodborne outbreaks are a serious but preventable threat to public health that often lead to illness, loss of life, significant economic loss, and the erosion of consumer confidence. Understanding how consumers respond when interacting with foods, as well as extracting information from posts on social media, may provide new means of reducing the risks and curtailing the outbreaks. In recent years, Twitter has been employed as a new tool for identifying unreported foodborne illnesses. However, there is a huge gap between the identification of sporadic illnesses and the early detection of a potential outbreak. In this work, the dual-task BERTweet model was developed to identify unreported foodborne illnesses and extract foodborne-illness-related entities from Twitter. Unlike previous methods, our model leveraged the mutually beneficial relationships between the two tasks. The results showed that the F1-score of relevance prediction was 0.87, and the F1-score of entity extraction was 0.61. Key elements such as time, location, and food detected from sentences indicating foodborne illnesses were used to analyze potential foodborne outbreaks in massive historical tweets. A case study on tweets indicating foodborne illnesses showed that the discovered trend is consistent with the true outbreaks that occurred during the same period.
UR - http://www.scopus.com/inward/record.url?scp=85118683999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118683999&partnerID=8YFLogxK
U2 - 10.1038/s41598-021-00766-w
DO - 10.1038/s41598-021-00766-w
M3 - Article
C2 - 34737325
AN - SCOPUS:85118683999
VL - 11
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
IS - 1
M1 - 21678
ER -