Lecture | Date | Topics | References | Additional material (private)
--- | --- | --- | --- | ---
01 | 02/10 | Intro to the Course | this website, [RLOC, Preface], [RLI, Ch. 1] | RLOC-Preface.pdf, RLI-Intro.pdf
02 | 03/10 | Introduction to Markov Decision Processes - Review of Probability | [RLOC, Ch. 1], [PSES, Ch. 3-4] | 02-MDP-Intro.pdf, 03-Probability-Basics.pdf
03 | 09/10 | Tools for the Simulation of Stochastic Systems | - | 04-Simulation-Tools.pdf, 04b-Homework.pdf, 04b-Homework-Solution.zip
04 | 10/10 | Intro to Python, Notebooks, NumPy | - | -
05 | 16/10 | Basics of Discrete-Time Markov Chains | - | 06-Markov-Chains-Basics.pdf, 06b-Homework.pdf |
06 | 17/10 | Formulation of Markov Decision Problems | [RLOC, Ch. 1] | 07-MDP-Formulation.pdf |
07 | 23/10 | Deterministic Dynamic Programming | [RLOC, Ch. 1] | 08-Deterministic-DP.pdf, 08b-HEMS.pdf, 08c-Shortest-Path.pdf |
08 | 24/10 | Stochastic Dynamic Programming | [RLOC, Ch. 1] | 09-Stochastic-DP.pdf, 09b-IC.pdf, 09b-Python-IC.zip |
09 | 30/10 | Optimal Stopping Problem | - | 10-Optimal-Stopping.pdf, 10-Python-Optimal-Stopping.zip, 10b-Gambling.pdf |
10 | 31/10 | DP over Infinite Time Horizon | [RLOC, Ch. 4] | 11-Infinite-DP.pdf |
11 | 06/11 | Component Replacement, Minimum-Time Problem | - | 12-Component-Replacement.pdf, 12b-Spider-Fly.pdf
12 | 07/11 | Towards Reinforcement Learning | [RLI, Ch. 1] | 13-DP-Summary.pdf, RLOC-Terminology.pdf |
13 | 13/11 | Intuitive Reinforcement Learning | [RLI, Ch. 2] | - |
14 | 14/11 | MDP and DP Revisited | [RLI, Ch. 3, 4] | - |
15 | 20/11 | Monte Carlo Methods I (on-policy) | [RLI, Ch. 5, Sec. 5.1-5.4] | - |
16 | 21/11 | Monte Carlo Methods II (off-policy). Example: Gambler's problem | [RLI, Ch. 5, Sec. 5.5-5.7, Ch. 4, Example 4.3] | 18-Python-MC.zip |
17 | 26/11 | Temporal Difference Methods: Prediction | [RLI, Ch. 6, Sec. 6.1-6.3] | - |
18 | 27/11 | Temporal Difference Methods: Sarsa, Q-learning, Expected Sarsa | [RLI, Ch. 6, Sec. 6.4-6.6] | -
19 | 28/11 | Temporal Difference Methods: Gambler's problem. Gymnasium: Frozen Lake environment | [RLI, Ch. 4, Example 4.3] | 21-Python-TD.zip |
20 | 04/12 | n-step Bootstrapping Methods: Prediction | [RLI, Ch. 7, Sec. 7.1] | 22-Random-walk-prediction.zip |
21 | 05/12 | n-step Bootstrapping Methods: Control | [RLI, Ch. 7, Sec. 7.2, 7.3, 7.5] | 23-Random-walk-control.zip |
22 | 10/12 | Approximate Methods: Prediction and Control | [RLI, Ch. 9, Sec. 9.1-9.3, 9.7, Ch. 10, Sec. 10.1, 10.2] | -
23 | 11/12 | Policy Gradient Methods: REINFORCE | [RLI, Ch. 13, Sec. 13.1-13.3] | -
24 | 12/12 | Policy Gradient Methods: Actor-Critic, Continuous Actions | [RLI, Ch. 13, Sec. 13.5, 13.7] | -