A 3-ECTS hybrid course covering Reinforcement Learning from the basics to state-of-the-art algorithms. Students learn to formulate and solve real-world problems using classical and modern techniques, from Q-learning to PPO and SAC, with a practical focus on algorithm selection and reward learning for contemporary applications such as LLM training.
Reinforcement Learning

Training details
Location
Campus Nord, Barcelona, Spain
Date
22/06/2026
Target Audience
Student-Focused
Teaching language(s)
English / Catalan / Spanish
Organizing institution
Universitat Politècnica de Catalunya
Delivery mode
Hybrid
Level
Intermediate
Format
Case Study Session, Hands-on Session, Lecture
Capacity or seats limit
30
Topics / Keywords
Reinforcement Learning
What You Will Learn
Students will understand the goals of Reinforcement Learning (RL) and the fundamental differences between RL and Supervised Learning. They will recognize the characteristics of problems where RL can be effectively applied and learn to formulate real-world problems as RL problems.
The course covers basic RL methods including Q-learning and its variants, providing a foundation for more advanced techniques. Students will learn off-policy Deep RL algorithms for discrete action spaces, specifically Deep Q-Networks (DQN). For continuous action spaces, the curriculum includes on-policy Deep RL algorithms such as REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO), as well as off-policy algorithms including Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).
By the end of the course, students will be able to select appropriate algorithms for specific problem types and apply RL techniques in scenarios where the reward function is unknown or must be learned.
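As a flavour of the hands-on sessions, the following sketch shows the tabular Q-learning update that the basic-methods module builds on. The gridworld size, hyperparameters, and helper names are assumptions made purely for illustration, not course material.

    # A sketch of the tabular Q-learning update rule. The gridworld size
    # and hyperparameters below are illustrative assumptions.
    import numpy as np

    n_states, n_actions = 16, 4              # assumed small gridworld
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # assumed hyperparameters
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)   # explore
        return int(np.argmax(Q[s]))               # exploit

    def q_learning_step(s, a, r, s_next):
        # Off-policy TD target: bootstrap from the greedy action in s_next
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])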
Learning objectives:
- Understand the goals of Reinforcement Learning (RL) and the differences from Supervised Learning
- Understand characteristics of problems where RL can be applied and shine
- Learn to formulate a problem as a RL problem
- Learn basic methods for RL: Q-learning and variants
- Learn off-policy Deep RL algorithms for discrete action spaces: DQN
- Learn on-policy Deep RL algorithms for continuous action spaces: REINFORCE, Actor-Critic, and PPO
- Learn off-policy Deep RL algorithms for continuous action spaces: DDPG, TD3, and SAC
- Learn which algorithm to apply for a given problem
- Learn to apply RL when the reward function is unknown
Agenda
6 weeks
- Week 1 – Introduction to RL
- Week 2 – Basic RL algorithms
- Week 3 – Deep RL
- Week 4 – Actor-Critic approaches
- Week 5 – Practical RL
- Week 6 – Learning the reward function
Instructor name(s)
Mario Martin
Instructor’s biography

Mario Martin is an Associate Professor at the Polytechnic University of Catalonia (UPC). He is a member of the IDEAI-UPC Research Center and has been a visiting professor at the Barcelona Supercomputing Centre (BSC) since 2018. His specialization is in reinforcement learning, machine learning, and deep learning, topics on which he has published more than 70 papers.
He is currently applying Multi-Agent RL techniques to marine animal tracking in collaboration with the Institute of Marine Sciences (Barcelona) and Monterey Bay Aquarium Research Institute. He is also applying RL to adaptive optics systems for ground-based Extremely Large Telescopes in collaboration with the Paris Observatory, the Australian National University, and the University of Hawaii, with technology currently deployed on the Subaru telescope in Hawaii. His recent work also includes applications of RL to robot learning.
Course Description
This course provides a comprehensive introduction to Reinforcement Learning (RL), covering both foundational concepts and state-of-the-art deep learning techniques. Students will begin by understanding the core principles of RL and how it differs fundamentally from supervised learning approaches, learning to identify problems where RL methods excel and how to properly formulate real-world challenges as RL problems.
The course progresses from classical tabular methods, including Q-learning and its variants, to modern deep reinforcement learning algorithms. Students will master off-policy methods for discrete action spaces through Deep Q-Networks (DQN), before advancing to continuous action space problems using both on-policy algorithms (REINFORCE, Actor-Critic, and Proximal Policy Optimization) and off-policy approaches (Deep Deterministic Policy Gradient, Twin Delayed DDPG, and Soft Actor-Critic).
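To make the on-policy family concrete, here is a minimal REINFORCE sketch in PyTorch. The network shape, learning rate, and return computation are assumptions made for this example, not the course's reference implementation.

    # A minimal REINFORCE update. Network size, optimizer settings, and
    # the episode interface are illustrative assumptions.
    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    gamma = 0.99

    def reinforce_update(states, actions, rewards):
        # states: list of state tensors; actions: list of ints;
        # rewards: list of floats, all from one completed episode.
        returns, G = [], 0.0
        for r in reversed(rewards):          # Monte Carlo returns G_t
            G = r + gamma * G
            returns.insert(0, G)
        returns = torch.tensor(returns)
        log_probs = torch.log_softmax(policy(torch.stack(states)), dim=-1)
        idx = torch.arange(len(actions))
        chosen = log_probs[idx, torch.tensor(actions)]
        loss = -(chosen * returns).mean()    # ascend the expected return
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()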
Beyond algorithmic knowledge, the course emphasizes practical decision-making skills, teaching students how to select appropriate algorithms for specific problem characteristics. The course culminates with inverse reinforcement learning and reward learning, addressing scenarios where reward functions are unknown or difficult to specify (for instance, in the training of Large Language Models).
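For the reward-learning topic, a common formulation (used, for instance, in RLHF for LLM training) fits a reward model to pairwise preferences with the Bradley-Terry objective. The sketch below uses an assumed feature dimension and random data purely for illustration.

    # A hedged sketch of reward learning from pairwise preferences via the
    # Bradley-Terry objective (the mechanism behind RLHF reward models).
    # The feature dimension and the random batch are assumptions.
    import torch
    import torch.nn as nn

    reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

    def preference_loss(preferred, rejected):
        # P(preferred > rejected) = sigmoid(r_w - r_l); minimize the
        # negative log-likelihood of the observed preferences.
        r_w = reward_model(preferred).squeeze(-1)
        r_l = reward_model(rejected).squeeze(-1)
        return -torch.nn.functional.logsigmoid(r_w - r_l).mean()

    # One illustrative update on a random batch of trajectory features
    loss = preference_loss(torch.randn(32, 8), torch.randn(32, 8))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()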
Prerequisites
Basic knowledge of Machine Learning and Deep Learning, Python, and statistics/probability
Certificate/badge details
Certificate of Achievement
Technical setup
Laptop with internet access; access to open-source tools; access to Google Colab.
