A 3-ECTS hybrid course covering Reinforcement Learning from the basics to state-of-the-art algorithms. Students learn to formulate and solve real-world problems using classical and modern techniques, from Q-learning to PPO and SAC, with a practical focus on algorithm selection and reward learning for contemporary applications such as LLM training.
Reinforcement Learning

Training details
Location
Campus Nord, Barcelona, Spain
Date
22/06/2026
Target Audience
Student-Focused
Teaching language(s)
English / Catalan / Spanish
Organizing institution
Universitat Politècnica de Catalunya
Delivery mode
Hybrid
Level
Intermediate
Format
Case Study Session, Hands-on Session, Lecture
Capacity or seats limit
30
Topics / Keywords
Reinforcement Learning
What You Will Learn
Students will understand the goals of Reinforcement Learning (RL) and the fundamental differences between RL and Supervised Learning. They will recognize the characteristics of problems where RL can be effectively applied and learn to formulate real-world problems as RL problems.
The course covers basic RL methods including Q-learning and its variants, providing a foundation for more advanced techniques. Students will learn off-policy Deep RL algorithms for discrete action spaces, specifically Deep Q-Networks (DQN). For continuous action spaces, the curriculum includes on-policy Deep RL algorithms such as REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO), as well as off-policy algorithms including Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC).
By the end of the course, students will be able to select appropriate algorithms for specific problem types and apply RL techniques in scenarios where the reward function is unknown or must be learned.
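As a flavour of the hands-on sessions, the following sketch shows the tabular Q-learning update that the basic-methods module builds on. The gridworld size, hyperparameters, and helper names are assumptions made purely for illustration, not course material.

    # A sketch of the tabular Q-learning update rule. The gridworld size
    # and hyperparameters below are illustrative assumptions.
    import numpy as np

    n_states, n_actions = 16, 4              # assumed small gridworld
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # assumed hyperparameters
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)   # explore
        return int(np.argmax(Q[s]))               # exploit

    def q_learning_step(s, a, r, s_next):
        # Off-policy TD target: bootstrap from the greedy action in s_next
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])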
Learning objectives:
- Understand the goals of Reinforcement Learning (RL) and the differences from Supervised Learning
- Understand characteristics of problems where RL can be applied and shine
- Learn to formulate a problem as a RL problem
- Learn basic methods for RL: Q-learning and variants
- Learn off-policy Deep RL algorithms for discrete action spaces: DQN
- Learn on-policy Deep RL algorithms for continuous action spaces: REINFORCE, Actor-Critic, and PPO
- Learn off-policy Deep RL algorithms for continuous action spaces: DDPG, TD3, and SAC
- Learn which algorithm to apply for a given problem
- Learn to apply RL when the reward function is unknown
Agenda
6 weeks
- Week 1 – Introduction to RL
- Week 2 – Basic RL algorithms
- Week 3 – Deep RL
- Week 4 – Actor-Critic approaches
- Week 5 – Practical RL
- Week 6 – Learning the reward function
Instructor name(s)
Mario Martin
Instructor’s biography

Mario Martin is an Associate Professor at the Polytechnic University of Catalonia (UPC). He is a member of the IDEAI-UPC Research Center and has been a visiting professor at the Barcelona Supercomputing Centre (BSC) since 2018. His specialization is in reinforcement learning, machine learning, and deep learning, topics on which he has published more than 70 papers.
He is currently applying Multi-Agent RL techniques to marine animal tracking in collaboration with the Institute of Marine Sciences (Barcelona) and Monterey Bay Aquarium Research Institute. He is also applying RL to adaptive optics systems for ground-based Extremely Large Telescopes in collaboration with the Paris Observatory, the Australian National University, and the University of Hawaii, with technology currently deployed on the Subaru telescope in Hawaii. His recent work also includes applications of RL to robot learning.
Course Description
This course provides a comprehensive introduction to Reinforcement Learning (RL), covering both foundational concepts and state-of-the-art deep learning techniques. Students will begin by understanding the core principles of RL and how it differs fundamentally from supervised learning approaches, learning to identify problems where RL methods excel and how to properly formulate real-world challenges as RL problems.
The course progresses from classical tabular methods, including Q-learning and its variants, to modern deep reinforcement learning algorithms. Students will master off-policy methods for discrete action spaces through Deep Q-Networks (DQN), before advancing to continuous action space problems using both on-policy algorithms (REINFORCE, Actor-Critic, and Proximal Policy Optimization) and off-policy approaches (Deep Deterministic Policy Gradient, Twin Delayed DDPG, and Soft Actor-Critic).
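To make the on-policy family concrete, here is a minimal REINFORCE sketch in PyTorch. The network shape, learning rate, and return computation are assumptions made for this example, not the course's reference implementation.

    # A minimal REINFORCE update. Network size, optimizer settings, and
    # the episode interface are illustrative assumptions.
    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    gamma = 0.99

    def reinforce_update(states, actions, rewards):
        # states: list of state tensors; actions: list of ints;
        # rewards: list of floats, all from one completed episode.
        returns, G = [], 0.0
        for r in reversed(rewards):          # Monte Carlo returns G_t
            G = r + gamma * G
            returns.insert(0, G)
        returns = torch.tensor(returns)
        log_probs = torch.log_softmax(policy(torch.stack(states)), dim=-1)
        idx = torch.arange(len(actions))
        chosen = log_probs[idx, torch.tensor(actions)]
        loss = -(chosen * returns).mean()    # ascend the expected return
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()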
Beyond algorithmic knowledge, the course emphasizes practical decision-making skills, teaching students how to select appropriate algorithms for specific problem characteristics. The course culminates with inverse reinforcement learning and reward learning, addressing scenarios where reward functions are unknown or difficult to specify (for instance, in the training of Large Language Models).
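For the reward-learning topic, a common formulation (used, for instance, in RLHF for LLM training) fits a reward model to pairwise preferences with the Bradley-Terry objective. The sketch below uses an assumed feature dimension and random data purely for illustration.

    # A hedged sketch of reward learning from pairwise preferences via the
    # Bradley-Terry objective (the mechanism behind RLHF reward models).
    # The feature dimension and the random batch are assumptions.
    import torch
    import torch.nn as nn

    reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

    def preference_loss(preferred, rejected):
        # P(preferred > rejected) = sigmoid(r_w - r_l); minimize the
        # negative log-likelihood of the observed preferences.
        r_w = reward_model(preferred).squeeze(-1)
        r_l = reward_model(rejected).squeeze(-1)
        return -torch.nn.functional.logsigmoid(r_w - r_l).mean()

    # One illustrative update on a random batch of trajectory features
    loss = preference_loss(torch.randn(32, 8), torch.randn(32, 8))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()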
Prerequisites
Basic knowledge of Machine Learning and Deep Learning, Python, and statistics/probability
Certificate/badge details
Certificate of Achievement
Technical setup
Laptop with internet access; access to open-source tools; access to Google Colab.
