TY - GEN
T1 - Fairness-aware Multi-view Clustering
AU - Zheng, Lecheng
AU - Zhu, Yada
AU - He, Jingrui
N1 - Publisher Copyright:
Copyright © 2023 by SIAM.
PY - 2023
Y1 - 2023
N2 - In the era of big data, we are often facing the challenge of data heterogeneity and the lack of label information simultaneously. In the financial domain (e.g., fraud detection), the heterogeneous data may include not only numerical data (e.g., total debt and yearly income), but also text and images (e.g., financial statement and invoice images). At the same time, the label information (e.g., fraud transactions) may be missing for building predictive models. To address these challenges, many state-of-the-art multi-view clustering methods have been proposed and achieved outstanding performance. However, these methods typically do not take into consideration the fairness aspect and are likely to generate biased results using sensitive information such as race and gender. Therefore, in this paper, we propose a fairness-aware multi-view clustering method named Fair-MVC. It incorporates the group fairness constraint into the soft membership assignment for each cluster to ensure that the fraction of different groups in each cluster is approximately identical to the entire data set. Meanwhile, we adopt the idea of both contrastive learning and non-contrastive learning and propose novel regularizers to handle heterogeneous data in complex scenarios with missing data or noisy features. Experimental results on real-world data sets demonstrate the effectiveness and efficiency of the proposed framework. We also derive insights regarding the relative performance of the proposed regularizers in various scenarios.
AB - In the era of big data, we are often facing the challenge of data heterogeneity and the lack of label information simultaneously. In the financial domain (e.g., fraud detection), the heterogeneous data may include not only numerical data (e.g., total debt and yearly income), but also text and images (e.g., financial statement and invoice images). At the same time, the label information (e.g., fraud transactions) may be missing for building predictive models. To address these challenges, many state-of-the-art multi-view clustering methods have been proposed and achieved outstanding performance. However, these methods typically do not take into consideration the fairness aspect and are likely to generate biased results using sensitive information such as race and gender. Therefore, in this paper, we propose a fairness-aware multi-view clustering method named Fair-MVC. It incorporates the group fairness constraint into the soft membership assignment for each cluster to ensure that the fraction of different groups in each cluster is approximately identical to the entire data set. Meanwhile, we adopt the idea of both contrastive learning and non-contrastive learning and propose novel regularizers to handle heterogeneous data in complex scenarios with missing data or noisy features. Experimental results on real-world data sets demonstrate the effectiveness and efficiency of the proposed framework. We also derive insights regarding the relative performance of the proposed regularizers in various scenarios.
KW - Clustering
KW - Contrastive Learning
KW - Multi-view Learning
UR - http://www.scopus.com/inward/record.url?scp=85172936536&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85172936536&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85172936536
T3 - 2023 SIAM International Conference on Data Mining, SDM 2023
SP - 856
EP - 864
BT - 2023 SIAM International Conference on Data Mining, SDM 2023
PB - Society for Industrial and Applied Mathematics Publications
T2 - 2023 SIAM International Conference on Data Mining, SDM 2023
Y2 - 27 April 2023 through 29 April 2023
ER -