Reference:
S. Mallick, F. Airaldi, A. Dabiri, and B. De Schutter,
"Multi-agent reinforcement learning via distributed MPC as a function
approximator," Automatica, vol. 167, p. 111803, Sept. 2024.
Abstract:
This paper presents a novel approach to multi-agent reinforcement
learning (RL) for linear systems with convex polytopic constraints.
Existing work on RL has demonstrated the use of model predictive
control (MPC) as a function approximator for the policy and value
functions. This paper is the first to extend this idea to
the multi-agent setting. We propose the use of a distributed MPC
scheme as a function approximator, with a structure allowing for
distributed learning and deployment. We then show that Q-learning
updates can be performed distributively without introducing
nonstationarity, by reconstructing a centralized learning update. The
effectiveness of the approach is demonstrated on a numerical example.
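The abstract's main technical claim is that the agents can perform a Q-learning update in a distributed way and still obtain the same update a central learner would compute. The Python sketch below illustrates that idea under simplifying assumptions that are ours and not taken from the paper. It assumes that the global stage cost, Q(s, a) and V(s') are sums of per-agent terms. It also assumes that each agent can compute its own share of these terms and the gradient of Q with respect to its own parameters theta_i. Finally, it assumes that agents exchange scalars over a ring using plain average consensus. The distributed MPC solve that would supply these quantities is abstracted away. `LocalSample`, `local_td`, `ring_consensus_average`, `centralized_update` and `distributed_update` are hypothetical names used only for illustration.

```python
# Hypothetical sketch (not the paper's algorithm): reconstructing a centralized
# semi-gradient Q-learning step from local quantities via average consensus.
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSample:
    """Hypothetical per-agent quantities for one transition (s, a, s').

    In an MPC-based scheme these would come from the local part of the
    distributed MPC solution; here they are simply given numbers.
    """
    stage_cost: float    # l_i: agent i's share of the stage cost
    q_value: float       # Q_i(s, a): agent i's share of Q(s, a)
    next_value: float    # V_i(s'): agent i's share of V(s')
    grad_q: List[float]  # dQ/dtheta_i, assumed to be locally computable


def local_td(sample: LocalSample, gamma: float) -> float:
    """Agent i's additive contribution to the global TD error."""
    return sample.stage_cost + gamma * sample.next_value - sample.q_value


def ring_consensus_average(values: List[float], iters: int = 50) -> List[float]:
    """Average consensus on a ring of at least 3 agents.

    Each agent repeatedly replaces its estimate with the mean of its own and
    its two neighbours' estimates. The mixing matrix is doubly stochastic, so
    every estimate converges to the average of the initial values.
    """
    n = len(values)
    x = list(values)
    for _ in range(iters):
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
    return x


def centralized_update(thetas: List[List[float]], samples: List[LocalSample],
                       gamma: float, alpha: float) -> List[List[float]]:
    """Step theta <- theta + alpha * delta * dQ/dtheta, where a central
    coordinator computes the global TD error delta = L + gamma * V(s') - Q(s, a)."""
    delta = sum(local_td(s, gamma) for s in samples)
    return [[t + alpha * delta * g for t, g in zip(th, s.grad_q)]
            for th, s in zip(thetas, samples)]


def distributed_update(thetas: List[List[float]], samples: List[LocalSample],
                       gamma: float, alpha: float,
                       iters: int = 50) -> List[List[float]]:
    """Same step, but each agent recovers delta as n times its consensus
    estimate of the average local TD error and updates only its own theta_i."""
    n = len(samples)
    avg = ring_consensus_average([local_td(s, gamma) for s in samples], iters)
    return [[t + alpha * (n * a) * g for t, g in zip(th, s.grad_q)]
            for th, s, a in zip(thetas, samples, avg)]


if __name__ == "__main__":
    samples = [
        LocalSample(1.0, 2.0, 1.5, [1.0, 0.5]),
        LocalSample(0.5, 1.0, 0.8, [0.2]),
        LocalSample(2.0, 3.0, 2.5, [0.0, 1.0, -1.0]),
        LocalSample(0.3, 0.4, 0.2, [2.0]),
    ]
    thetas = [[0.0, 0.0], [0.0], [0.0, 0.0, 0.0], [0.0]]
    c = centralized_update(thetas, samples, gamma=0.9, alpha=0.1)
    d = distributed_update(thetas, samples, gamma=0.9, alpha=0.1)
    print(all(abs(a - b) < 1e-9 for ca, da in zip(c, d) for a, b in zip(ca, da)))  # True
```

In this toy setting, every agent recovers the same global TD error. As a result, the local parameter steps together reproduce the centralized step, and no agent learns against a target that is shifted by the other agents' independent updates. With a finite number of consensus iterations the match holds only up to the consensus error, which decays geometrically on the ring used here.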