TY - GEN
T1 - FairCrowd: Fair Human Face Dataset Sampling via Batch-Level Crowdsourcing Bias Inference
AU - Kou, Ziyi
AU - Zhang, Yang
AU - Shang, Lanyu
AU - Wang, Dong
N1 - Funding Information:
This research is supported in part by the National Science Foundation under Grants No. IIS-2008228, CNS-1845639, and CNS-1831669, and by the Army Research Office under Grant W911NF-17-1-0409. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/25
Y1 - 2021/6/25
AB - Human face images are a large category of visual information used by various human facial data services (e.g., face recognition, face generation, face attribute prediction). However, the quality of data services (QoDS) on human face datasets is usually biased towards the majority demographic group because of data imbalance. In this paper, we focus on a fair human face dataset sampling problem where the goal is to sample a sub-dataset from the original dataset to reduce its bias by leveraging crowd intelligence to infer the demographic labels of face images (e.g., male or female, old or young). Our problem is motivated by the limitations of current fair data sampling solutions, which require pre-annotated demographic labels to sample a fair dataset. Two important challenges exist in solving our problem: 1) it is extremely time-consuming and expensive to assign crowd workers to annotate the demographic labels of all images in a large-scale facial dataset; 2) it is not trivial to improve the fairness of the sampled sub-dataset (which has fewer data samples) without sacrificing the accuracy of data services on such a dataset. To address these challenges, we develop FairCrowd, a fair crowdsourcing-based data sampling framework that leverages an efficient batch-level demographic label inference model and a joint fair-accuracy-aware data shuffling method. We evaluate the performance of FairCrowd on a large-scale real-world face image dataset that consists of celebrity faces from a diversified set of demographic groups. The results show that FairCrowd not only reduces demographic bias but also improves the accuracy of data services trained on the sub-dataset generated by FairCrowd, leading to a more desirable QoDS of the application.
KW - Crowdsourcing
KW - Fair Dataset Sampling
KW - Machine Learning for Quality of Service
KW - Quality of Data Service
UR - http://www.scopus.com/inward/record.url?scp=85115353222&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115353222&partnerID=8YFLogxK
U2 - 10.1109/IWQOS52092.2021.9521312
DO - 10.1109/IWQOS52092.2021.9521312
M3 - Conference contribution
SN - 9781665430548
T3 - 2021 IEEE/ACM 29th International Symposium on Quality of Service, IWQOS 2021
SP - 1
EP - 10
BT - 2021 IEEE/ACM 29th International Symposium on Quality of Service, IWQOS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 29th IEEE/ACM International Symposium on Quality of Service, IWQOS 2021
Y2 - 25 June 2021 through 28 June 2021
ER -