TY - JOUR
T1 - Automatic Quality Control of Crowdsourced Rainfall Data With Multiple Noises
T2 - A Machine Learning Approach
AU - Niu, Geng
AU - Yang, Pan
AU - Zheng, Yi
AU - Cai, Ximing
AU - Qin, Huapeng
N1 - Publisher Copyright: © 2021. American Geophysical Union. All Rights Reserved.
PY - 2021/11
Y1 - 2021/11
AB - In geophysics, crowdsourcing is an emerging nontraditional environmental monitoring approach that supports data acquisition from individual citizens. However, because of the involvement of undertrained citizens and imprecise low-cost sensors, crowdsourced data applications suffer from different types of noises that can deteriorate the overall monitoring accuracy. In this study, we propose a machine learning approach for automatic crowdsourced data quality control (CSQC) that detects and removes noisy data inputs in spatially and temporally discrete crowdsourced observations coming from both fixed-point sensors (e.g., surveillance cameras) and moving sensors (e.g., moving cars/pedestrians). We design a set of features from original and interpolated rainfall data and use them to train and test the CSQC models using both supervised and unsupervised machine learning algorithms. The performances of the CSQC models under various scenarios assuming no retraining are also tested (hereafter referred to as transferability). The results based on synthetic but realistic data show that the CSQC models can significantly reduce the overall rainfall estimate errors. Under the stationary assumption, the CSQC models based on both supervised and unsupervised algorithms perform well in noisy data identification and overall rainfall estimation error reduction; however, if the model is transferred to other cities with different rainfall patterns or noise compositions (without retraining), supervised multilayer perceptrons (MLPs) show the best performance.
UR - http://www.scopus.com/inward/record.url?scp=85119872137&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119872137&partnerID=8YFLogxK
U2 - 10.1029/2020WR029121
DO - 10.1029/2020WR029121
M3 - Article
AN - SCOPUS:85119872137
SN - 0043-1397
VL - 57
JO - Water Resources Research
JF - Water Resources Research
IS - 11
M1 - e2020WR029121
ER -