TY - JOUR
T1 - Towards Return Parity in Markov Decision Processes
AU - Chi, Jianfeng
AU - Shen, Jian
AU - Dai, Xinyi
AU - Zhang, Weinan
AU - Tian, Yuan
AU - Zhao, Han
N1 - We thank the anonymous reviewers for their insightful comments. Jian Shen, Xinyi Dai, and Weinan Zhang acknowledge support from the “New Generation of AI 2030” Major Project (2018AAA0100900), Shanghai Municipal Science and National Natural Science Foundation of China (62076161). Jianfeng Chi and Yuan Tian acknowledge support from NSF 1920462, 1943100, 2002985, and a Google research scholar award. Han Zhao acknowledges support from a Facebook research award.
PY - 2022
Y1 - 2022
N2 - Algorithmic decisions made by machine learning models in high-stakes domains may have lasting impacts over time. However, naive applications of standard fairness criteria for static settings to temporal domains may lead to delayed and adverse effects. To understand the dynamics of performance disparity, we study a fairness problem in Markov decision processes (MDPs). Specifically, we propose return parity, a fairness notion that requires MDPs from different demographic groups that share the same state and action spaces to achieve approximately the same expected time-discounted rewards. We first provide a decomposition theorem for return disparity, which decomposes the return disparity of any two MDPs sharing the same state and action spaces into the distance between group-wise reward functions, the discrepancy of group policies, and the discrepancy between the state visitation distributions induced by the group policies. Motivated by our decomposition theorem, we propose algorithms to mitigate return disparity by learning a shared group policy with state visitation distributional alignment using integral probability metrics. We conduct experiments to corroborate our results, showing that the proposed algorithms can successfully close the disparity gap while maintaining the performance of policies on two real-world recommender system benchmark datasets.
AB - Algorithmic decisions made by machine learning models in high-stakes domains may have lasting impacts over time. However, naive applications of standard fairness criteria for static settings to temporal domains may lead to delayed and adverse effects. To understand the dynamics of performance disparity, we study a fairness problem in Markov decision processes (MDPs). Specifically, we propose return parity, a fairness notion that requires MDPs from different demographic groups that share the same state and action spaces to achieve approximately the same expected time-discounted rewards. We first provide a decomposition theorem for return disparity, which decomposes the return disparity of any two MDPs sharing the same state and action spaces into the distance between group-wise reward functions, the discrepancy of group policies, and the discrepancy between the state visitation distributions induced by the group policies. Motivated by our decomposition theorem, we propose algorithms to mitigate return disparity by learning a shared group policy with state visitation distributional alignment using integral probability metrics. We conduct experiments to corroborate our results, showing that the proposed algorithms can successfully close the disparity gap while maintaining the performance of policies on two real-world recommender system benchmark datasets.
UR - http://www.scopus.com/inward/record.url?scp=85142513612&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142513612&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85142513612
SN - 2640-3498
VL - 151
SP - 1161
EP - 1178
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022
Y2 - 28 March 2022 through 30 March 2022
ER -