Dynamic Programming and Reinforcement Learning (A.Y. 2024/25)
Contents
Introduction to Markov Decision Processes: Dynamic Programming and Reinforcement Learning
Review of probability calculus
Tools for the simulation of stochastic systems
Review of Markov chains
Formulation of Markov Decision Processes (MDPs)
Deterministic dynamic programming
Stochastic dynamic programming
Optimal stopping problems
Dynamic programming over an infinite time horizon
Minimum time problems
Approximate dynamic programming
Monte Carlo methods
On-policy and off-policy methods
Temporal difference methods
Q-Learning
SARSA
Expected SARSA
n-Step bootstrapping
Policy gradient
Overview of further topics: Approximate methods, Eligibility Traces
Throughout the course, exercises will be worked out using small Python scripts.
Prerequisites
Basic notions of probability and Markov chains
Prior knowledge of Python is useful, although not required