Lecture | Date | Topics | References | Additional material (private)
--- | --- | --- | --- | ---
01 | 02/10 | Intro to the Course | this website, [RLOC, Preface], [RLI, Ch. 1] | RLOC-Preface.pdf, RLI-Intro.pdf
02 | 03/10 | Introduction to Markov Decision Processes - Review of Probability | [RLOC, Ch. 1], [PSES, Ch. 3-4] | 02-MDP-Intro.pdf, 03-Probability-Basics.pdf
03 | 09/10 | Tools for the Simulation of Stochastic Systems | - | 04-Simulation-Tools.pdf, 04b-Homework.pdf, 04b-Homework-Solution.zip
04 | 10/10 | Intro to Python, Notebooks, NumPy | - | -
05 | 16/10 | Basics of Discrete-Time Markov Chains | - | 06-Markov-Chains-Basics.pdf, 06b-Homework.pdf |
06 | 17/10 | Formulation of Markov Decision Problems | [RLOC, Ch. 1] | 07-MDP-Formulation.pdf |
07 | 23/10 | Deterministic Dynamic Programming | [RLOC, Ch. 1] | 08-Deterministic-DP.pdf, 08b-HEMS.pdf, 08c-Shortest-Path.pdf |
08 | 24/10 | Stochastic Dynamic Programming | [RLOC, Ch. 1] | 09-Stochastic-DP.pdf, 09b-IC.pdf, 09b-Python-IC.zip |
09 | 30/10 | Optimal Stopping Problem | - | 10-Optimal-Stopping.pdf, 10-Python-Optimal-Stopping.zip, 10b-Gambling.pdf |
10 | 31/10 | DP over Infinite Time Horizon | [RLOC, Ch. 4] | 11-Infinite-DP.pdf |
11 | 06/11 | Component Replacement, Minimum-Time Problem | - | 12-Component-Replacement.pdf, 12b-Spider-Fly.pdf
12 | 07/11 | Towards Reinforcement Learning | [RLI, Ch. 1] | 13-DP-Summary.pdf, RLOC-Terminology.pdf |
13 | 13/11 | Intuitive Reinforcement Learning | [RLI, Ch. 2] | - |
14 | 14/11 | MDP and DP Revisited | [RLI, Ch. 3, 4] | - |
15 | 20/11 | Monte Carlo Methods I (on-policy) | [RLI, Ch. 5, Sec. 5.1-5.4] | - |
16 | 21/11 | Monte Carlo Methods II (off-policy). Example: Gambler's problem | [RLI, Ch. 5, Sec. 5.5-5.7, Ch. 4, Example 4.3] | 18-Python-MC.zip |
17 | 26/11 | Temporal Difference Methods: Prediction | [RLI, Ch. 6, Sec. 6.1-6.3] | - |
18 | 27/11 | Temporal Difference Methods: Sarsa, Q-learning, Expected Sarsa | [RLI, Ch. 6, Sec. 6.4-6.6] | -
19 | 28/11 | Temporal Difference Methods: Gambler's problem. Gymnasium: Frozen Lake environment | [RLI, Ch. 4, Example 4.3] | 21-Python-TD.zip |
20 | 04/12 | n-step Bootstrapping Methods: Prediction | [RLI, Ch. 7, Sec. 7.1] | 22-Random-walk-prediction.zip |
21 | 05/12 | n-step Bootstrapping Methods: Control | [RLI, Ch. 7, Sec. 7.2, 7.3, 7.5] | 23-Random-walk-control.zip |
22 | 10/12 | Approximate Methods: Prediction and Control | [RLI, Ch. 9, Sec. 9.1-9.3, 9.7, Ch. 10, Sec. 10.1, 10.2] | -
23 | 11/12 | Policy Gradient Methods: REINFORCE | [RLI, Ch. 13, Sec. 13.1-13.3] | -
24 | 12/12 | Policy Gradient Methods: Actor-Critic, Continuous Actions | [RLI, Ch. 13, Sec. 13.5, 13.7] | -