Using prior knowledge to accelerate online least-squares policy iteration


Reference:
L. Busoniu, B. De Schutter, R. Babuska, and D. Ernst, "Using prior knowledge to accelerate online least-squares policy iteration," Proceedings of the 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR 2010), Cluj-Napoca, Romania, 6 pp., May 2010. Paper A-S2-1/3005.

Abstract:
Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems appearing in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which, until now, did not provide a way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.
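
The paper itself specifies how the monotonicity prior is used inside online LSPI; purely as a rough illustration (not the authors' method, all names hypothetical), the Python sketch below shows what "monotonic in the state" means for a policy discretized over a one-dimensional state grid, and one naive way to restore monotonicity when a learned policy estimate violates it.

    import numpy as np

    # Illustration only: a policy represented by actions u[i] at a grid of
    # increasing states x[i]. The monotonicity prior says u must be
    # non-decreasing (or non-increasing) in the state.

    def is_monotonic(u, increasing=True):
        """Check whether the discretized policy u respects the prior."""
        diffs = np.diff(u)
        return bool(np.all(diffs >= 0)) if increasing else bool(np.all(diffs <= 0))

    def project_monotonic(u, increasing=True):
        """Crudely enforce the prior with a running max/min
        (a stand-in, not the mechanism used in the paper)."""
        return np.maximum.accumulate(u) if increasing else np.minimum.accumulate(u)

    # Example: a policy estimate that violates the prior gets corrected.
    u_hat = np.array([-1.0, -0.5, -0.7, 0.2, 0.1, 0.8])
    print(is_monotonic(u_hat))        # False: -0.7 < -0.5 breaks monotonicity
    print(project_monotonic(u_hat))   # monotone version of u_hat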


Downloads:
 * Corresponding technical report: pdf file (275 KB)


Bibtex entry:

@inproceedings{BusDeS:10-024,
        author={L. Bu{\c{s}}oniu and B. {D}e Schutter and R. Babu{\v{s}}ka and D. Ernst},
        title={Using prior knowledge to accelerate online least-squares policy iteration},
        booktitle={Proceedings of the 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR 2010)},
        address={Cluj-Napoca, Romania},
        month=may,
        year={2010},
        note={Paper A-S2-1/3005}
        }




