Foundational Speech Models and their Efficient Training with NVIDIA NeMo {AI Talks with Tea/Coffee #30}

#WIE #current #speech #tools #webinars #training #technology #engineering #software #society #computer #architecture


https://landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025

The intersection of speech and language models offers unique opportunities and challenges. This talk provides a comprehensive walkthrough of speech-language model research from NVIDIA NeMo. We cover several types of models, such as the attention-encoder-decoder Canary-1B and LLM-based architectures such as SALM and BESTOW. In particular, we highlight the challenges in training and inference efficiency of such models and propose robust solutions via 2D bucketing and the batch size OOMptimizer. Finally, we highlight the difficulty of preserving text-domain capabilities in speech-augmented training and present several possible solutions: EMMeTT, VoiceTextBlender, and Canary-Qwen-2.5B.
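To give a flavor of the 2D bucketing idea mentioned above: speech-to-text samples vary in both input audio duration and output token length, so grouping by duration alone can still batch a short-transcript sample with a long-transcript one and waste compute on padding. The sketch below is a hypothetical, minimal pure-Python illustration of the concept, not the NeMo/Lhotse API; the function and bucket-edge names are invented for this example.

```python
# Minimal sketch of 2D bucketing: assign each (duration, num_tokens) sample
# to a bucket indexed by BOTH its input-duration bin and its output-length
# bin, so every mini-batch drawn from one bucket has similar shapes on both
# axes and needs little padding. Hypothetical illustration only.
from collections import defaultdict

def bucket_2d(samples, dur_edges, tok_edges):
    """Group (duration_sec, num_tokens) pairs into 2D buckets.

    dur_edges / tok_edges are ascending upper bounds for each bin;
    values above the last edge fall into an overflow bin.
    """
    def bin_index(value, edges):
        for i, edge in enumerate(edges):
            if value <= edge:
                return i
        return len(edges)  # overflow bin

    buckets = defaultdict(list)
    for dur, toks in samples:
        key = (bin_index(dur, dur_edges), bin_index(toks, tok_edges))
        buckets[key].append((dur, toks))
    return dict(buckets)

samples = [(3.2, 10), (3.5, 48), (14.0, 60), (29.0, 200)]
buckets = bucket_2d(samples, dur_edges=[5, 15, 30], tok_edges=[16, 64, 256])
# (3.2, 10) and (3.5, 48) have nearly identical durations but very
# different output lengths, so 1D duration bucketing would batch them
# together and pad the token axis; 2D bucketing keeps them apart.
```

A real implementation would additionally tune a per-bucket batch size (the role the abstract assigns to the OOMptimizer), since short-audio/short-text buckets can fit far larger batches than long ones.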

About the Presenter:

Piotr Żelasko received the B.S. and M.Sc. degrees in acoustic engineering and the Ph.D. degree in electronic engineering from AGH University of Krakow, Poland, in 2013, 2014, and 2019, respectively.

He is currently a research scientist at NVIDIA NeMo, building multitask and multimodal models and efficient training infrastructure. He previously held a research scientist position at JHU's CLSP and developed speech technology at several companies (Techmo, Avaya, Meaning.Team).

Dr. Żelasko is a co-author of the next-generation Kaldi toolkit (k2) and the maintainer of Lhotse.





  • Starts 31 July 2025 04:00 AM UTC
  • Ends 27 August 2025 04:00 AM UTC
  • No Admission Charge


  Speakers

Piotr Żelasko

Topic:

Foundational Speech Models and their Efficient Training with NVIDIA NeMo







Agenda

https://landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025

Please register at the link above and on vTools.
