Deep Learning for Computer Vision

Training details

Location

UPC North Campus

Start Date

07/09/2026

End Date

16/10/2026

Target Audience

Student-Focused

Teaching language(s)

English / Catalan / Spanish

Organizing institution

Universitat Politècnica de Catalunya

Delivery mode

Hybrid

Level

Intermediate

Topics / Keywords

Computer Vision, Deep Learning, Convolutional Neural Networks, Image Classification, Object Detection, Semantic Segmentation, Instance Segmentation, Transfer Learning, Vision Transformers

This 3 ECTS course provides an introduction to deep learning methods for computer vision. It covers the fundamentals of digital images and visual tasks, followed by convolutional neural networks and their application to image classification, object detection, semantic segmentation, and instance segmentation. The course concludes with an outlook on attention-based models in computer vision.

What You Will Learn

Learning objectives:

The main objectives of this course are to:

  • Understand the fundamentals of digital images and core computer vision tasks
  • Understand the principles of deep learning and representation learning for visual data
  • Design, train, and evaluate convolutional neural networks for image analysis
  • Apply modern CNN architectures using transfer learning and fine-tuning
  • Understand object detection, semantic segmentation, and instance segmentation pipelines
  • Become familiar with emerging attention-based approaches in computer vision

Learning outcomes:

By the end of this course, participants will be able to:

  • Explain how digital images are represented and processed by neural networks
  • Distinguish between image classification, object detection, semantic segmentation, and instance segmentation
  • Train and evaluate convolutional neural networks for image classification tasks
  • Apply transfer learning using modern deep learning architectures
  • Understand detection and segmentation pipelines and their evaluation metrics
  • Recognize current trends beyond convolution-based models in computer vision

Agenda

Week 1. Introduction to Computer Vision and Digital Images

  • What is computer vision?
  • Digital image representation
  • Core computer vision tasks: classification, detection, semantic segmentation, instance segmentation
  • From handcrafted features to deep learning
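
To illustrate the digital image representation topic, here is a minimal sketch in plain Python. It assumes an RGB image stored as rows of (R, G, B) tuples with 8-bit values. The function names are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma coefficients. Course materials may well use NumPy arrays or PyTorch tensors instead.

```python
# Illustrative sketch only: a tiny RGB image as nested Python lists.
# Assumed layout: image[row][col] = (R, G, B), each value in 0..255.

def image_shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))


def to_grayscale(image):
    """Convert RGB pixels to grayscale using BT.601 luma weights."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in image
    ]


def normalize(gray):
    """Scale 8-bit intensities to the range [0, 1], a common network input range."""
    return [[value / 255.0 for value in row] for row in gray]


# A 2x2 image: red, green, blue, and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

gray = to_grayscale(image)
scaled = normalize(gray)
```

Here `image_shape(image)` gives the height, width, and channel count. This is the same information carried by the shape of an image tensor, although the axis order differs between libraries.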

Week 2. Deep Learning Foundations

  • From machine learning to deep learning
  • Neural networks and nonlinear representation learning
  • Loss functions and optimization
  • Overfitting, regularization, and generalization
  • Practical training strategies

Week 3. Convolutional Neural Networks and Architectures

  • Convolution, pooling, activation, and normalization layers
  • CNN architectures
  • Transfer learning and fine-tuning
  • Evaluation metrics for image classification

Week 4. Object Detection

  • Problem formulation
  • Two-stage and one-stage detectors (e.g. Faster R-CNN, YOLO)
  • Evaluation metrics
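
Detection metrics are typically built on intersection over union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch of that computation, assuming axis-aligned boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2. The function name and box format are illustrative assumptions, not taken from the course materials.

```python
# Illustrative sketch only: IoU for two axis-aligned bounding boxes.
# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.

def box_iou(box_a, box_b):
    """Return intersection over union of two boxes, a value in [0, 1]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return intersection / union if union > 0 else 0.0
```

A prediction is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold. 0.5 is a common choice.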

Week 5. Semantic and Instance Segmentation

  • Problem formulation
  • Fully convolutional networks and U-Net-based architectures
  • From detection to instance-level segmentation
  • Instance segmentation architectures (e.g. Mask R-CNN)
  • Evaluation metrics

Week 6. Attention Mechanisms and Vision Transformers

  • Limitations of convolution-based approaches
  • Attention mechanisms: intuition and motivation
  • Vision Transformers (ViT): high-level overview
  • CNNs vs Vision Transformers: strengths and limitations
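
As an aid to intuition for attention mechanisms, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes queries, keys, and values are given as lists of equal-length vectors. The function names are hypothetical. Real Vision Transformer implementations add learned projections, multiple heads, and batched tensor operations, none of which appear here.

```python
import math

# Illustrative sketch only: single-head scaled dot-product attention.
# queries, keys, values are lists of vectors (lists of floats).

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(key dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs
```

In a Vision Transformer the vectors correspond to image patches, so each patch can draw information from every other patch in a single step. A convolution, by contrast, sees only a local neighborhood.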

Instructor name(s)

  • Verónica Vilaplana
  • Maria Ysern
  • Sigrid Vila

Instructor's biography

Verónica Vilaplana is an Associate Professor at the Universitat Politècnica de Catalunya (UPC) and a member of the Intelligent Data Science and Artificial Intelligence Research Center (IDEAI). Her research focuses on deep learning and computer vision, with a strong emphasis on image segmentation, representation learning, and medical image analysis. She has extensive experience leading and participating in national and European research projects, as well as in teaching and supervising undergraduate, master’s, and doctoral students.

Course Description

This course is designed to provide a solid and self-contained foundation in deep learning for computer vision. Starting from the representation of digital images and the definition of core visual tasks, the course introduces the principles of deep learning and convolutional neural networks as the dominant paradigm for visual understanding.

Participants will learn how modern CNN architectures are designed, trained, and evaluated, and how they are applied to key computer vision problems such as image classification, object detection, semantic segmentation, and instance segmentation. Practical considerations such as transfer learning, data augmentation, and evaluation metrics are emphasized throughout the course.

The course concludes with an overview of attention-based models and Vision Transformers, providing context on current trends in computer vision without assuming prior exposure to transformer architectures.

Prerequisites

  • Basic programming skills in Python
  • Introductory knowledge of machine learning

Certificate/badge details

Certificate of Achievement

Technical setup

A computer with internet access, plus access to open-source tools and Google Colab.