BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
DTSTART:20250309T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20251102T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250830T233640Z
UID:A1BFEBEC-94C6-4823-A8A8-65DBD8F56390
DTSTART;TZID=America/New_York:20250827T090000
DTEND;TZID=America/New_York:20250827T103000
DESCRIPTION:https://landing.signalprocessingsociety.org/ieee-sps-webinars-2
 7-aug-2025\n\nThe intersection of speech and language models offers uniq
 ue opportunities and challenges. This talk provides a comprehensive walk
 through of speech-language model research from NVIDIA NeMo. We cover sev
 eral types of models\, such as the attention-encoder-decoder Canary-1
 B\, and LLM-based architectures such as SALM or BESTOW. In particula
 r\, we highlight the challenges in training and inference efficiency o
 f such models and propose robust solutions via 2D bucketing and batch s
 ize OOMptimizer. Finally\, we highlight the difficulty of preserving te
 xt-domain capabilities in speech-augmented training and present severa
 l possible solutions: EMMeTT\, VoiceTextBlender\, and Canary-Qwen-2.5B.
 \n\nAbout the Presenter:\n\nPiotr Żelasko received the B.S. and M.Sc. d
 egrees in acoustic engineering\, and the Ph.D. in electronic engineerin
 g from AGH-University Krakow\, Poland\, in 2013\, 2014\, and 2019\, res
 pectively.\n\nHe is currently a research scientist at NVIDIA NeMo build
 ing multitask and multimodal models and efficient training infrastructu
 re. He held a research scientist position at JHU’s CLSP and developed s
 peech technology at different companies (Techmo\, Avaya\, Meaning.Team)
 .\n\nDr. Żelasko is a co-author of the next-generation Kaldi toolkit (k
 2) and the maintainer of Lhotse.\n\nAgenda:\nhttps://landing.signalproc
 essingsociety.org/ieee-sps-webinars-27-aug-2025\n\nPlease register her
 e and on vTools.\n\nVirtual: https://events.vtools.ieee.org/m/495161
LOCATION:Virtual: https://events.vtools.ieee.org/m/495161
ORGANIZER:mailto:baris.kazar@oracle.com
SEQUENCE:13
SUMMARY:Foundational Speech Models and their Efficient Training with NVIDIA
  NeMo {AI Talks with Coffee/Tea #30}
URL;VALUE=URI:https://events.vtools.ieee.org/m/495161
X-ALT-DESC;FMTTYPE=text/html:<p><a href="https://landing.signalprocessings
 ociety.org/ieee-sps-webinars-27-aug-2025">https://landing.signalprocess
 ingsociety.org/ieee-sps-webinars-27-aug-2025</a></p>\n<p>The intersecti
 on of speech and language models offers unique opportunities and challe
 nges. This talk provides a comprehensive walkthrough of speech-langua
 ge model research from NVIDIA NeMo. We cover several types of model
 s\, such as the attention-encoder-decoder Canary-1B\, and LLM-based arc
 hitectures such as SALM or BESTOW. In particular\, we highlight the cha
 llenges in training and inference efficiency of such models and propos
 e robust solutions via 2D bucketing and batch size OOMptimizer. Finall
 y\, we highlight the difficulty of preserving text-domain capabiliti
 es in speech-augmented training and present several possible solution
 s: EMMeTT\, VoiceTextBlender\, and Canary-Qwen-2.5B.</p>\n<h2>About th
 e Presenter:</h2>\n<p>Piotr Żelasko received the B.S. and M.Sc. degree
 s in acoustic engineering\, and the Ph.D. in electronic engineering fro
 m AGH-University Krakow\, Poland\, in 2013\, 2014\, and 2019\, respecti
 vely.<br><br>He is currently a research scientist at NVIDIA NeMo buildi
 ng multitask and multimodal models and efficient training infrastructur
 e. He held a research scientist position at JHU’s CLSP and developed sp
 eech technology at different companies (Techmo\, Avaya\, Meaning.Team).
 <br><br>Dr. Żelasko is a co-author of the next-generation Kaldi toolki
 t (k2) and the maintainer of Lhotse.</p>\n<p>Agenda:<br><a href="https:
 //landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025">ht
 tps://landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025
 </a></p>\n<p>Please register here and on vTools.</p>
END:VEVENT
END:VCALENDAR