Deep Learning for Computer Vision
This 3 ECTS course provides an introduction to deep learning methods for computer vision. It covers the fundamentals of digital images and visual tasks, followed by convolutional neural networks and their application to image classification, object detection, semantic segmentation, and instance segmentation. The course concludes with an outlook on attention-based models in computer vision.

Training details
Location
UPC North Campus
Start Date
07/09/2026
End Date
16/10/2026
Target Audience
Student-Focused
Teaching language(s)
English / Catalan / Spanish
Organizing institution
Universitat Politècnica de Catalunya
Delivery mode
Hybrid
Level
Intermediate
Topics / Keywords
Computer Vision, Deep Learning, Convolutional Neural Networks, Image Classification, Object Detection, Semantic Segmentation, Instance Segmentation, Transfer Learning, Vision Transformers
What You Will Learn
Learning objectives:
The main objectives of this course are to:
- Understand the fundamentals of digital images and core computer vision tasks
- Understand the principles of deep learning and representation learning for visual data
- Design, train, and evaluate convolutional neural networks for image analysis
- Apply modern CNN architectures using transfer learning and fine-tuning
- Understand object detection, semantic segmentation, and instance segmentation pipelines
- Become familiar with emerging attention-based approaches in computer vision
Learning outcomes:
By the end of this course, participants will be able to:
- Explain how digital images are represented and processed by neural networks
- Distinguish between image classification, object detection, semantic segmentation, and instance segmentation
- Train and evaluate convolutional neural networks for image classification tasks
- Apply transfer learning using modern deep learning architectures
- Understand detection and segmentation pipelines and their evaluation metrics
- Recognize current trends beyond convolution-based models in computer vision
Agenda
Week 1. Introduction to Computer Vision and Digital Images
- What is computer vision?
- Digital image representation
- Core computer vision tasks: classification, detection, semantic segmentation, instance segmentation
- From handcrafted features to deep learning
Week 2. Deep Learning Foundations
- From machine learning to deep learning
- Neural networks and nonlinear representation learning
- Loss functions and optimization
- Overfitting, regularization, and generalization
- Practical training strategies
Week 3. Convolutional Neural Networks and Architectures
- Convolution, pooling, activation, and normalization layers
- CNN architectures
- Transfer learning and fine-tuning
- Evaluation metrics for image classification
Week 4. Object Detection
- Problem formulation
- Two-stage and one-stage detectors (e.g. Faster R-CNN, YOLO)
- Evaluation metrics
Week 5. Semantic and Instance Segmentation
- Problem formulation
- Fully convolutional networks and U-Net-based architectures
- From detection to instance-level segmentation
- Instance segmentation architectures (e.g. Mask R-CNN)
- Evaluation metrics
Week 6. Attention Mechanisms and Vision Transformers
- Limitations of convolution-based approaches
- Attention mechanisms: intuition and motivation
- Vision Transformers (ViT): high-level overview
- CNNs vs Vision Transformers: strengths and limitations
Instructor name(s)
- Verónica Vilaplana
- Maria Ysern
- Sigrid Vila
Instructor's biography
Verónica Vilaplana is an Associate Professor at the Universitat Politècnica de Catalunya (UPC) and a member of the Intelligent Data Science and Artificial Intelligence Research Center (IDEAI). Her research focuses on deep learning and computer vision, with a strong emphasis on image segmentation, representation learning, and medical image analysis. She has extensive experience leading and participating in national and European research projects, as well as in teaching and supervising undergraduate, master’s, and doctoral students.
Course Description
This course is designed to provide a solid and self-contained foundation in deep learning for computer vision. Starting from the representation of digital images and the definition of core visual tasks, the course introduces the principles of deep learning and convolutional neural networks as the dominant paradigm for visual understanding.
Participants will learn how modern CNN architectures are designed, trained, and evaluated, and how they are applied to key computer vision problems such as image classification, object detection, semantic segmentation, and instance segmentation. Practical considerations such as transfer learning, data augmentation, and evaluation metrics are emphasized throughout the course.
The course concludes with an overview of attention-based models and Vision Transformers, providing context on current trends in computer vision without assuming prior exposure to transformer architectures.
Prerequisites
- Basic programming skills in Python
- Introductory knowledge of machine learning
Certificate/badge details
Certificate of Achievement
Technical setup
A computer with internet access, plus access to open-source tools and Google Colab.

