In geophysics, crowdsourcing is an emerging nontraditional environmental monitoring approach that supports data acquisition from individual citizens. However, because it relies on undertrained citizens and imprecise low-cost sensors, crowdsourced data suffer from various types of noise that can degrade overall monitoring accuracy. In this study, we propose a machine learning approach for automatic crowdsourced data quality control (CSQC) that detects and removes noisy inputs in spatially and temporally discrete crowdsourced observations from both fixed-point sensors (e.g., surveillance cameras) and moving sensors (e.g., moving cars/pedestrians). We design a set of features from original and interpolated rainfall data and use them to train and test CSQC models built with both supervised and unsupervised machine learning algorithms. We also test the transferability of the CSQC models, i.e., their performance under various scenarios without retraining. Results based on synthetic but realistic data show that the CSQC models can significantly reduce the overall rainfall estimation error. Under the stationarity assumption, CSQC models based on both supervised and unsupervised algorithms perform well in identifying noisy data and reducing the overall rainfall estimation error; however, when a model is transferred without retraining to other cities with different rainfall patterns or noise compositions, supervised multilayer perceptrons (MLPs) show the best performance.
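To make the idea of features built from original and interpolated rainfall data concrete, the following is a minimal sketch, not the method used in the study. It assumes each observation is an `(x, y, rain)` tuple, uses leave-one-out inverse-distance weighting as the interpolator, and replaces the trained supervised or unsupervised models with a simple robust threshold (median plus a multiple of the median absolute deviation). The function names, the interpolation scheme, and the threshold `k` are all hypothetical choices made for illustration.

```python
import math
import statistics


def idw_estimate(target, neighbors, power=2.0):
    """Inverse-distance-weighted rainfall at `target` from (x, y, rain) neighbors."""
    num = 0.0
    den = 0.0
    for x, y, rain in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            # A neighbor at the exact target location determines the estimate.
            return rain
        w = 1.0 / d ** power
        num += w * rain
        den += w
    return num / den


def residual_features(observations):
    """One feature per observation: |observed - value interpolated from the others|."""
    feats = []
    for i, (x, y, rain) in enumerate(observations):
        others = observations[:i] + observations[i + 1:]
        feats.append(abs(rain - idw_estimate((x, y), others)))
    return feats


def flag_noisy(observations, k=3.0):
    """Flag observations whose residual exceeds median + k * scaled MAD.

    Assumption: this threshold rule is a stand-in for a learned CSQC model.
    """
    feats = residual_features(observations)
    med = statistics.median(feats)
    mad = statistics.median([abs(f - med) for f in feats])
    threshold = med + k * 1.4826 * mad
    return [f > threshold for f in feats]


# Hypothetical example: a 3 x 3 grid of reports (mm/h) with one suspicious value.
example = [(float(x), float(y), 10.0) for y in range(3) for x in range(3)]
example[4] = (1.0, 1.0, 50.0)
print(flag_noisy(example))
```

In a learned variant, the residuals returned by `residual_features` would be combined with other features and passed to a classifier or an anomaly detector instead of a fixed threshold.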
ASJC Scopus subject areas
- Water Science and Technology