Dynamic Programming and Reinforcement Learning (A.Y. 2024/25)

Instructor

The course is 6CFU (about 48 hours, roughly split as 32h of lectures and 16h of computer exercises). It provides an introduction to dynamic programming (DP) and reinforcement learning (RL). Throughout the course, we will deal with Markov decision processes (MDPs), which represent a special class of optimal control problems in the presence of uncertainty. The DP algorithm provides a way to find an exact solution. However, when the problem is very large (many possible states), the DP algorithm becomes computationally intractable. Moreover, DP requires a complete knowledge of the MDP in order to be implemented. In those cases, approximate solutions are sought. RL methods provides a variety of solution strategies that are able to compute suboptimal policies even when a model of the system or of the environment is not completely specified. At the end of the course, the students will be able to properly formulate real-world problems as MDPs, and to find exact or approximate solutions by implementing DP and RL algorithms in Python.
The detailed program of the course can be found here.
Information about the teaching material can be found here.
The exam consists of a project work and an oral examination. More information can be found here.