Dynamic Programming and Reinforcement Learning (A.Y. 2024/25)

Lectures

| Lecture | Date | Topics | References | Additional material (private) |
|---|---|---|---|---|
| 01 | 02/10 | Intro to the course | this website, [RLOC, Preface], [RLI, Ch. 1] | RLOC-Preface.pdf, RLI-Intro.pdf |
| 02 | 03/10 | Introduction to Markov Decision Processes - Review of Probability | [RLOC, Ch. 1], [PSES, Ch. 3-4] | 02-MDP-Intro.pdf, 03-Probability-Basics.pdf |
| 03 | 09/10 | Tools for the Simulation of Stochastic Systems | - | 04-Simulation-Tools.pdf, 04b-Homework.pdf, 04b-Homework-Solution.zip |
| 04 | 10/10 | Intro to Python, Notebooks, Numpy | - | - |
| 05 | 16/10 | Basics of Discrete-Time Markov Chains | - | 06-Markov-Chains-Basics.pdf, 06b-Homework.pdf |
| 06 | 17/10 | Formulation of Markov Decision Problems | [RLOC, Ch. 1] | 07-MDP-Formulation.pdf |
| 07 | 23/10 | Deterministic Dynamic Programming | [RLOC, Ch. 1] | 08-Deterministic-DP.pdf, 08b-HEMS.pdf, 08c-Shortest-Path.pdf |
| 08 | 24/10 | Stochastic Dynamic Programming | [RLOC, Ch. 1] | 09-Stochastic-DP.pdf, 09b-IC.pdf, 09b-Python-IC.zip |
| 09 | 30/10 | Optimal Stopping Problem | - | 10-Optimal-Stopping.pdf, 10-Python-Optimal-Stopping.zip, 10b-Gambling.pdf |
| 10 | 31/10 | DP over Infinite Time Horizon | [RLOC, Ch. 4] | 11-Infinite-DP.pdf |
| 11 | 06/11 | Component replacement, Minimum-time problem | - | 12-Component-Replacement.pdf, 12b-Spider-Fly.pdf |
| 12 | 07/11 | Towards Reinforcement Learning | [RLI, Ch. 1] | 13-DP-Summary.pdf, RLOC-Terminology.pdf |
| 13 | 13/11 | Intuitive Reinforcement Learning | [RLI, Ch. 2] | - |
| 14 | 14/11 | MDP and DP Revisited | [RLI, Ch. 3, 4] | - |
| 15 | 20/11 | Monte Carlo Methods I (on-policy) | [RLI, Ch. 5, Sec. 5.1-5.4] | - |
| 16 | 21/11 | Monte Carlo Methods II (off-policy). Example: Gambler's problem | [RLI, Ch. 5, Sec. 5.5-5.7, Ch. 4, Example 4.3] | 18-Python-MC.zip |
| 17 | 26/11 | Temporal Difference Methods: Prediction | [RLI, Ch. 6, Sec. 6.1-6.3] | - |
| 18 | 27/11 | Temporal Difference Methods: Sarsa, Q-Learning, Expected Sarsa | [RLI, Ch. 6, Sec. 6.4-6.6] | - |
| 19 | 28/11 | Temporal Difference Methods: Gambler's problem. Gymnasium: Frozen Lake environment | [RLI, Ch. 4, Example 4.3] | 21-Python-TD.zip |
| 20 | 04/12 | n-step Bootstrapping Methods: Prediction | [RLI, Ch. 7, Sec. 7.1] | 22-Random-walk-prediction.zip |
| 21 | 05/12 | n-step Bootstrapping Methods: Control | [RLI, Ch. 7, Sec. 7.2, 7.3, 7.5] | 23-Random-walk-control.zip |
| 22 | 10/12 | Approximate methods: Prediction and Control | [RLI, Ch. 9, Sec. 9.1-9.3, 9.7, Ch. 10, Sec. 10.1, 10.2] | - |
| 23 | 11/12 | Policy gradient methods: REINFORCE | [RLI, Ch. 13, Sec. 13.1-13.3] | - |
| 24 | 12/12 | Policy gradient methods: Actor-Critic, Continuous actions | [RLI, Ch. 13, Sec. 13.5, 13.7] | - |