This 3-ECTS course provides a comprehensive overview of modern neural models for natural language processing, speech, and multimodal AI. Starting from foundational concepts such as word embeddings and recurrent neural networks, the course progresses through attention mechanisms and transformer architectures, and culminates in large language models, self-supervised speech representations, and multimodal systems.
Through a combination of theory, practical examples, and model analysis, participants will gain a solid understanding of how state-of-the-art NLP and speech systems are built, trained, and applied. The course emphasizes conceptual clarity, architectural understanding, and informed model selection rather than low-level implementation from scratch.
The course is delivered in a hybrid format and concludes with projects focused on analyzing or applying modern pretrained models to realistic tasks.
Neural models for language and speech have evolved rapidly, culminating in large-scale pretrained and multimodal systems that underpin many modern AI applications. This course gives participants a structured understanding of that evolution, focusing on the architectural and conceptual breakthroughs that enabled current state-of-the-art models.
Starting from classic word representations and recurrent neural networks, the course introduces attention mechanisms and transformer architectures, which form the backbone of modern NLP and speech systems. Participants will explore large language models, multilingual and pretrained approaches, and recent advances in speech representation learning, automatic speech recognition, and text-to-speech synthesis.
The final part of the course addresses multimodal models that integrate language, speech, and other modalities, highlighting current research directions and practical considerations. By the end of the course, participants will be equipped to understand, analyze, and responsibly apply modern neural language and speech models in research and industry contexts.