ST455 Reinforcement Learning


This information is for the 2022/23 session.

Teacher responsible

Mr Chengchun Shi COL5.11

Availability

This course is available on the MSc in Applicable Mathematics, MSc in Applied Social Data Science, MSc in Data Science, MSc in Geographic Data Science, MSc in Health Data Science, MSc in Management of Information Systems and Digital Innovation, MSc in Operations Research & Analytics, MSc in Quantitative Methods for Risk Management, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (LSE and Fudan), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.

MSc in Data Science students will be given priority for enrolment on this course.

Pre-requisites

The course requires some mathematics, in particular some use of vectors and some calculus. Basic knowledge of computer programming is expected. Knowledge of Python is useful.

Course content

This course is about reinforcement learning, covering the fundamental concepts of the reinforcement learning framework and its solution methods. The focus is on the underlying methodology as well as practical implementation and evaluation in code. The course will cover the following topics:

Introduction – course overview
Foundations of reinforcement learning – Markov decision process, Bellman optimality equation, the existence of optimal stationary policy
Dynamic programming and Monte Carlo methods – policy evaluation, policy improvement, policy iteration and value iteration based on dynamic programming, and Monte Carlo methods for reinforcement learning, including Monte Carlo estimation and Monte Carlo control
Temporal difference learning – temporal difference prediction, Sarsa, Q-learning, n-step temporal difference prediction, TD(lambda)
On-policy prediction and control with approximation – types of function approximators (value and action-value function approximators), gradient-based methods for value function prediction, convergence guarantees with linear function approximators, and semi-gradient n-step Sarsa
Q-learning type algorithms with function approximation – Q-learning with linear function approximators, fitted Q-iteration, deep Q-network, double deep Q-learning, convergence analysis
Policy gradient methods – policy approximation, REINFORCE, actor-critic methods that combine policy function approximation with action-value function approximation
Trust-region policy optimization – monotonic improvement guarantee, trust-region policy optimization
Batch off-policy evaluation – importance sampling-based method, doubly robust method, marginalized importance sampling, double reinforcement learning
Batch policy optimisation – recent advances in offline reinforcement learning algorithms

Teaching

20 hours of lectures and 15 hours of classes in the LT.

This course will be delivered through a combination of classes and lectures totalling a minimum of 35 hours in Lent Term. This course includes a reading week in Week 6 of Lent Term.

Formative coursework

Students will be expected to produce 8 problem sets in the LT.

Indicative reading

Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons. https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316887
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press. http://incompleteideas.net/book/RLbook2020.pdf
OpenAI Gym, https://gym.openai.com/

Assessment

Project (80%), continuous assessment (10%) and continuous assessment (10%) in the LT.

Two of the weekly problem sets submitted by students will be assessed (20% in total). Each assessed problem set is worth 10% of the final mark, and submission will be required in LT Weeks 4 and 7. In addition, there will be a take-home exam (80%) in the form of a group project, in which students will demonstrate the ability to apply and evaluate different reinforcement learning algorithms.

Key facts

Department: Statistics

Total students 2021/22: 26

Average class size 2021/22: 12

Controlled access 2021/22: Yes

Lecture capture used 2021/22: Yes (LT)

Value: Half Unit

Course selection videos

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

Personal development skills

Self-management
Problem solving
Application of information skills
Communication
Application of numeracy skills
Commercial awareness
Specialist skills
