Dynamic Programming and Reinforcement Learning (A.Y. 2023/24)


Lecture | Date (dd/mm) | Topics | References | Additional material (private)
01 | 03/10 | Intro to the course | this website, [RLOC, Preface], [RLI, Ch. 1] | RLOC-Preface.pdf
02 | 05/10 | Introduction to Markov Decision Processes | [RLOC, Ch. 1] | 02-MDP-Intro.pdf
03 | 10/10 | Review of Probability | [PSES, Ch. 3-4] | 03-Probability-Basics.pdf
04 | 12/10 | Tools for the Simulation of Stochastic Systems | - | 04-Simulation-Tools.pdf, 04b-Homework.pdf
05 | 17/10 | Intro to Python, Notebooks, NumPy | - | 05-Homework.zip
06 | 19/10 | Basics of Discrete-Time Markov Chains | - | 06-Markov-Chains-Basics.pdf, 06b-Homework.pdf
07 | 24/10 | Formulation of Markov Decision Problems | [RLOC, Ch. 1] | 07-MDP-Formulation.pdf
08 | 26/10 | Deterministic Dynamic Programming | [RLOC, Ch. 1] | 08-Deterministic-DP.pdf, 08b-HEMS.pdf, 08c-Shortest-Path.pdf
09 | 31/10 | Stochastic Dynamic Programming | [RLOC, Ch. 1] | 09-Stochastic-DP.pdf, 09b-IC.pdf, 09b-Python-IC.zip
10 | 02/11 | Optimal Stopping Problem | - | 10-Optimal-Stopping.pdf, 10-Python-Optimal-Stopping.zip, 10b-Gambling.pdf
11 | 07/11 | DP over Infinite Time Horizon | [RLOC, Ch. 4] | 11-Infinite-DP.pdf
12 | 09/11 | Component Replacement, Minimum-Time Problem | - | 12-Component-Replacement.pdf, 12b-Spider-Fly.pdf
13 | 14/11 | Towards Reinforcement Learning | [RLI, Ch. 1] | 13-DP-Summary.pdf, RLOC-Terminology.pdf
14 | 16/11 | Intuitive Reinforcement Learning | [RLI, Ch. 2] | -
15 | 21/11 | MDP and DP Revisited | [RLI, Ch. 3, 4] | -
16 | 23/11 | Monte Carlo Methods I (on-policy) | [RLI, Ch. 5, Sec. 5.1-5.4] | -
17 | 28/11 | Monte Carlo Methods II (off-policy) | [RLI, Ch. 5, Sec. 5.5-5.7] | -
18 | 30/11 | Monte Carlo Methods: Gambler's problem | [RLI, Ch. 4, Example 4.3] | 18-Python-MC.zip
19 | 05/12 | Temporal Difference Methods: Prediction | [RLI, Ch. 6, Sec. 6.1-6.3] | -
20 | 07/12 | Temporal Difference Methods: Sarsa, Q-Learning, Expected Sarsa | [RLI, Ch. 6, Sec. 6.4-6.6] | -
21 | 12/12 | Temporal Difference Methods: Gambler's problem. Gymnasium: Frozen Lake environment | [RLI, Ch. 4, Example 4.3] | 21-Python-TD.zip
22 | 14/12 | n-step Bootstrapping Methods: Prediction | [RLI, Ch. 7, Sec. 7.1] | 22-Random-walk.zip
23 | 19/12 | n-step Bootstrapping Methods: Control | [RLI, Ch. 7, Sec. 7.2, 7.3, 7.5] | 22-Random-walk-control.zip
24 | 21/12 | RL Conclusions. Further topics: Approximate Methods, Eligibility Traces, Policy Gradient | [RLI, Ch. 9, 10, 12, 13] | -