The number of AI-for-science use cases is rising, and more and more scientists and engineers are training their own deep learning models. This increases the demand for HPC resources and computing clusters, since training better models requires more, and distributed, resources, either to reduce training time or to fit larger models. But are our training workloads performant? Are we using our resources efficiently? A one-size-fits-all solution is not a good strategy when datasets, model architectures, and AI libraries are so diverse and so tailored to each case, and new solutions appear every month.
The purpose of this training is to provide the skills and knowledge needed to understand how the training of a deep neural network uses the resources of a computer. The training will guide students in gaining insight into the execution using HPC performance tools and, by making informed decisions, applying optimizations and adopting more advanced libraries.


Marc Clascà has been a research engineer at the Barcelona Supercomputing Center since 2020. His research interests include programming models, performance tools, performance analysis, and specialized hardware and accelerators. He works on HPC parallel performance analysis in the Best Practices for Performance and Programmability (BePPP) group, which aims to provide the scientific community with best practices for programming portable and performant codes. His current research focuses on analyzing the performance of scientific applications and AI workloads that use GPUs, with the aim of deriving new efficiency metrics and analysis methodologies. This includes exploring the potential of GPU-specific tracing and visualization tools, and understanding the new programming models and communication patterns used in LLM training and inference.