Deep Learning for Computer Vision

Training details

Location

Online

Start Date

14/09/2026

End Date

23/10/2026

Target Audience

Student-Focused

Teaching language(s)

English / Catalan / Spanish

Organizing institution

Universitat Politècnica de Catalunya

Delivery mode

Online

Level

Intermediate

Format

Case Study Session, Self-paced Module

Topics / Keywords

Computer Vision, Deep Learning, Convolutional Neural Networks, Image Classification, Object Detection, Semantic Segmentation, Instance Segmentation, Transfer Learning, Vision Transformers

This 3 ECTS course provides an introduction to deep learning methods for computer vision. It covers the fundamentals of digital images and visual tasks, followed by convolutional neural networks and their application to image classification, object detection, semantic segmentation, and instance segmentation. The course concludes with an outlook on attention-based models in computer vision.

What You Will Learn

Learning objectives:

The main objectives of this course are to:

Understand the fundamentals of digital images and core computer vision tasks
Understand the principles of deep learning and representation learning for visual data
Design, train, and evaluate convolutional neural networks for image analysis
Apply modern CNN architectures using transfer learning and fine-tuning
Understand object detection, semantic segmentation, and instance segmentation pipelines
Become familiar with emerging attention-based approaches in computer vision

Learning outcomes:

By the end of this course, participants will be able to:

Explain how digital images are represented and processed by neural networks
Distinguish between image classification, object detection, semantic segmentation, and instance segmentation
Train and evaluate convolutional neural networks for image classification tasks
Apply transfer learning using modern deep learning architectures
Understand detection and segmentation pipelines and their evaluation metrics
Recognize current trends beyond convolution-based models in computer vision

Agenda

Week 1. Introduction to Computer Vision and Digital Images

What is computer vision?
Digital image representation
Core computer vision tasks: classification, detection, semantic segmentation, instance segmentation
From handcrafted features to deep learning
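
As an illustration of the "Digital image representation" topic, the sketch below shows one common way an RGB image can be held in memory as a numeric array. The 2x2 pixel values and the variable names are made up for the example, and NumPy is an assumption here, not a tool named by the course.

```python
import numpy as np

# Hypothetical 2x2 RGB image: height x width x channels, 8-bit values in [0, 255].
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],
        [[0, 0, 255], [255, 255, 255]],
    ],
    dtype=np.uint8,
)

# Scale to floating point in [0, 1], a common preprocessing step before training.
normalized = image.astype(np.float32) / 255.0

# Reorder to channels-first (C, H, W), the layout some deep learning frameworks expect.
channels_first = np.transpose(normalized, (2, 0, 1))

print(image.shape)           # (2, 2, 3)
print(channels_first.shape)  # (3, 2, 2)
```

The channels-last versus channels-first choice depends on the framework in use; both layouts hold the same pixel values.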

Week 2. Deep Learning Foundations

From machine learning to deep learning
Neural networks and nonlinear representation learning
Loss functions and optimization
Overfitting, regularization, and generalization
Practical training strategies

Week 3. Convolutional Neural Networks and Architectures

Convolution, pooling, activation, and normalization layers
CNN architectures
Transfer learning and fine-tuning
Evaluation metrics for image classification

Week 4. Object Detection

Problem formulation
Two-stage and one-stage detectors (e.g. Faster R-CNN, YOLO)
Evaluation metrics
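
Detection metrics are commonly built on intersection over union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch of that quantity, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name `box_iou` and the example boxes are hypothetical, and the course may present the metric differently.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)      # hypothetical predicted box
ground_truth = (1, 1, 3, 3)   # hypothetical annotated box
print(box_iou(predicted, ground_truth))  # 1/7, roughly 0.143
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, which is how IoU feeds into precision-based detection scores.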

Week 5. Semantic and Instance Segmentation

Problem formulation
Fully convolutional networks and U-Net-based architectures
From detection to instance-level segmentation
Instance segmentation architectures (e.g. Mask R-CNN)
Evaluation metrics
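
For segmentation, overlap is usually measured per pixel instead of per box. The sketch below computes IoU and the Dice coefficient for two small binary masks; the mask values and function names are invented for illustration, and NumPy is an assumption, not a stated course requirement.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two boolean masks of equal shape."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection) / float(union) if union > 0 else 0.0


def dice_coefficient(pred, target):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2.0 * float(intersection) / float(total) if total > 0 else 0.0


# Hypothetical 2x3 binary masks (True marks pixels assigned to the object).
pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)

iou = mask_iou(pred, target)            # 2 shared pixels / 4 in the union = 0.5
dice = dice_coefficient(pred, target)   # 2 * 2 / (3 + 3), roughly 0.667
print(iou, dice)
```

Both scores range from 0 (no overlap) to 1 (identical masks); Dice weights the intersection more heavily, so it is never lower than IoU for the same pair of masks.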

Week 6. Attention Mechanisms and Vision Transformers

Limitations of convolution-based approaches
Attention mechanisms: intuition and motivation
Vision Transformers (ViT): high-level overview
CNNs vs Vision Transformers: strengths and limitations

Instructor name(s)

Verónica Vilaplana
Maria Ysern
Sigrid Vila

Instructor's biography

Verónica Vilaplana is an Associate Professor at the Universitat Politècnica de Catalunya (UPC) and a member of the Intelligent Data Science and Artificial Intelligence Research Center (IDEAI). Her research focuses on deep learning and computer vision, with a strong emphasis on image segmentation, representation learning, and medical image analysis. She has extensive experience leading and participating in national and European research projects, as well as in teaching and supervising undergraduate, master’s, and doctoral students.

Course Description

This course is designed to provide a solid and self-contained foundation in deep learning for computer vision. Starting from the representation of digital images and the definition of core visual tasks, the course introduces the principles of deep learning and convolutional neural networks as the dominant paradigm for visual understanding.

Participants will learn how modern CNN architectures are designed, trained, and evaluated, and how they are applied to key computer vision problems such as image classification, object detection, semantic segmentation, and instance segmentation. Practical considerations such as transfer learning, data augmentation, and evaluation metrics are emphasized throughout the course.

The course concludes with an overview of attention-based models and Vision Transformers, providing context on current trends in computer vision without assuming prior exposure to transformer architectures.

Prerequisites

Basic programming skills in Python
Introductory knowledge of machine learning

Certificate/badge details

Certificate of Achievement

Technical setup

Computer with internet access; access to open-source tools; access to Google Colab.