TY - GEN
T1 - Online planning for decentralized stochastic control with partial history sharing
AU - Zhang, Kaiqing
AU - Miehling, Erik
AU - Basar, Tamer
N1 - Publisher Copyright: © 2019 American Automatic Control Council.
PY - 2019/7
Y1 - 2019/7
AB - In decentralized stochastic control, standard approaches for sequential decision-making, e.g. dynamic programming, quickly become intractable due to the need to maintain a complex information state. Computational challenges are further compounded if agents do not possess complete model knowledge. In this paper, we take advantage of the fact that in many problems agents share some common information, or history, termed partial history sharing. Under this information structure the policy search space is greatly reduced. We propose a provably convergent, online tree-search based algorithm that does not require a closed-form model or explicit communication among agents. Interestingly, our algorithm can be viewed as a generalization of several existing heuristic solvers for decentralized partially observable Markov decision processes. To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network. Numerical results demonstrate the performance of our algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85072299550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072299550&partnerID=8YFLogxK
U2 - 10.23919/acc.2019.8814803
DO - 10.23919/acc.2019.8814803
M3 - Conference contribution
AN - SCOPUS:85072299550
T3 - Proceedings of the American Control Conference
SP - 3544
EP - 3550
BT - 2019 American Control Conference, ACC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 American Control Conference, ACC 2019
Y2 - 10 July 2019 through 12 July 2019
ER -