Lecture | Date | Topics | References | Additional material (private)
------- | ---- | ------ | ---------- | ------------------------------
01 | 03/10 | Intro to the course | this website, [RLOC, Preface], [RLI, Ch. 1] | RLOC-Preface.pdf |
02 | 05/10 | Introduction to Markov Decision Processes | [RLOC, Ch. 1] | 02-MDP-Intro.pdf |
03 | 10/10 | Review of Probability | [PSES, Ch. 3-4] | 03-Probability-Basics.pdf |
04 | 12/10 | Tools for the Simulation of Stochastic Systems | - | 04-Simulation-Tools.pdf, 04b-Homework.pdf |
05 | 17/10 | Intro to Python, Notebooks, Numpy | - | 05-Homework.zip |
06 | 19/10 | Basics of Discrete-Time Markov Chains | - | 06-Markov-Chains-Basics.pdf, 06b-Homework.pdf |
07 | 24/10 | Formulation of Markov Decision Problems | [RLOC, Ch. 1] | 07-MDP-Formulation.pdf |
08 | 26/10 | Deterministic Dynamic Programming | [RLOC, Ch. 1] | 08-Deterministic-DP.pdf, 08b-HEMS.pdf, 08c-Shortest-Path.pdf |
09 | 31/10 | Stochastic Dynamic Programming | [RLOC, Ch. 1] | 09-Stochastic-DP.pdf, 09b-IC.pdf, 09b-Python-IC.zip |
10 | 02/11 | Optimal Stopping Problem | - | 10-Optimal-Stopping.pdf, 10-Python-Optimal-Stopping.zip, 10b-Gambling.pdf |
11 | 07/11 | DP over Infinite Time Horizon | [RLOC, Ch. 4] | 11-Infinite-DP.pdf |
12 | 09/11 | Component replacement, Minimum-time problem | - | 12-Component-Replacement.pdf, 12b-Spider-Fly.pdf |
13 | 14/11 | Towards Reinforcement Learning | [RLI, Ch. 1] | 13-DP-Summary.pdf, RLOC-Terminology.pdf |
14 | 16/11 | Intuitive Reinforcement Learning | [RLI, Ch. 2] | - |
15 | 21/11 | MDP and DP Revisited | [RLI, Ch. 3, 4] | - |
16 | 23/11 | Monte Carlo Methods I (on-policy) | [RLI, Ch. 5, Sec. 5.1-5.4] | - |
17 | 28/11 | Monte Carlo Methods II (off-policy) | [RLI, Ch. 5, Sec. 5.5-5.7] | - |
18 | 30/11 | Monte Carlo Methods: Gambler's problem | [RLI, Ch. 4, Example 4.3] | 18-Python-MC.zip |
19 | 05/12 | Temporal Difference Methods: Prediction | [RLI, Ch. 6, Sec. 6.1-6.3] | - |
20 | 07/12 | Temporal Difference Methods: Sarsa, Q-Learning, Expected Sarsa | [RLI, Ch. 6, Sec. 6.4-6.6] | - |
21 | 12/12 | Temporal Difference Methods: Gambler's problem. Gymnasium: Frozen Lake environment | [RLI, Ch. 4, Example 4.3] | 21-Python-TD.zip |
22 | 14/12 | n-step Bootstrapping Methods: Prediction | [RLI, Ch. 7, Sec. 7.1] | 22-Random-walk.zip |
23 | 19/12 | n-step Bootstrapping Methods: Control | [RLI, Ch. 7, Sec. 7.2, 7.3, 7.5] | 22-Random-walk-control.zip |
24 | 21/12 | RL Conclusions. Further topics: Approximate methods, Eligibility Traces, Policy gradient | [RLI, Ch. 9, 10, 12, 13] | -