TY - JOUR
T1 - Comparative Analysis of Supervised Classification Algorithms for Residential Water End Uses
AU - Heydari, Zahra
AU - Stillwell, Ashlynn S.
N1 - Publisher Copyright:
© 2024. The Authors. Water Resources Research published by Wiley Periodicals LLC on behalf of American Geophysical Union.
PY - 2024/6
Y1 - 2024/6
AB - Water sustainability in the built environment requires accurate estimation of residential water end uses (e.g., showers, toilets, and faucets). In this study, we evaluate the performance of four models (Random Forest, RF; Support Vector Machines, SVM; Logistic Regression, Log-reg; and Neural Networks, NN) for residential water end-use classification using actual (measured) and synthetic labeled data sets. We generated synthetic labeled data using Conditional Tabular Generative Adversarial Networks. We then used grid search to optimize the hyperparameters of each model before training. The RF model exhibited the best performance overall, while the Log-reg model had the shortest execution times across balanced and imbalanced synthetic data scenarios (defined by the number of events per class), making it a computationally efficient alternative to RF for specific end uses. The NN model also performed well, at the cost of longer execution times than the other classification models. In the balanced data set scenario, all models achieved closely aligned F1-scores, ranging from 0.83 to 0.90. However, on imbalanced data reflective of actual conditions, both the SVM and Log-reg models performed worse than the RF and NN models. Overall, we conclude that decision tree-based models are the best choice for classification tasks involving water end-use data. Our study advances residential smart water metering systems by creating synthetic labeled end-use data and providing insight into the strengths and weaknesses of various supervised machine learning classifiers for end-use identification.
KW - classification
KW - logistic regression
KW - machine learning
KW - neural network
KW - random forest
KW - residential water
KW - smart metering systems
KW - supervised learning
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85196155993&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196155993&partnerID=8YFLogxK
U2 - 10.1029/2023WR036690
DO - 10.1029/2023WR036690
M3 - Article
AN - SCOPUS:85196155993
SN - 0043-1397
VL - 60
JO - Water Resources Research
JF - Water Resources Research
IS - 6
M1 - e2023WR036690
ER -