Reference:
A. Athrey,
O. Mazhar,
M. Guo,
B. De Schutter, and
S. Shi,
"Regret analysis of learning-based linear quadratic Gaussian control
with additive exploration," Proceedings of the 2024 European
Control Conference, Stockholm, Sweden, pp. 1795-1801, June 2024.
Abstract:
In this paper, we analyze the regret incurred by a computationally
efficient exploration strategy, known as naive exploration, for
controlling unknown partially observable systems within the Linear
Quadratic Gaussian (LQG) framework. We introduce a two-phase control
algorithm called LQG-NAIVE, which involves an initial phase of
injecting Gaussian input signals to obtain a system model, followed by
a second phase of an interplay between naive exploration and control
in an episodic fashion. We show that LQG-NAIVE achieves a regret
growth rate of Õ(T^(1/2)), i.e., O(T^(1/2)) up
to logarithmic factors after T time steps, and we validate its
performance through numerical simulations. Additionally, we propose
LQG-IF2E, which extends the exploration signal to a 'closed-loop'
setting by incorporating the Fisher Information Matrix (FIM). We
provide compelling numerical evidence of the competitive performance
of LQG-IF2E compared to LQG-NAIVE.
Bibtex entry:
@inproceedings{AthMaz:24-016,