BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
DTSTART:20250309T020000
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:PDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20251102T020000
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:PST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20251016T014814Z
UID:A0F6F567-AE3C-4924-92DC-4FF0ED434F88
DTSTART;TZID=America/Los_Angeles:20251015T130000
DTEND;TZID=America/Los_Angeles:20251015T140000
DESCRIPTION:Abstract:\n\nState-of-the-art ASR systems excel on close-talk b
 enchmarks but struggle with far-field conversational speech\, where error 
 rates remain above 20%. Current benchmark datasets inadequately assess gen
 eralization across domains and real-world conditions\, often relying on or
 acle segmentation that yields overly optimistic results. Distant ASR (DASR
 ) faces unique challenges including overlapping speech\, varied recording 
 setups\, and dynamic speaker interactions that significantly complicate sy
 stem development. Despite these difficulties\, spontaneous conversational 
 speech represents the next frontier for developing more human-like AI agen
 ts capable of natural multi-party communication. This talk presents recent
  advances in DASR through three interconnected efforts: (1) the CHiME-7 an
 d CHiME-8 DASR challenges\, which established rigorous benchmarks for gene
 ralizable robust meeting transcription\, (2) end-to-end joint modeling tha
 t unifies speaker diarization and speech recognition into a single framewo
 rk\, moving beyond traditional pipeline approaches\, and (3) synthetic dat
 a generation leveraging large language models and text-to-speech systems t
 o create realistic multi-speaker training data at scale.\n\nSpeaker(s): Sa
 muele \, \n\nVirtual: https://events.vtools.ieee.org/m/505533
LOCATION:Virtual: https://events.vtools.ieee.org/m/505533
ORGANIZER:mailto:sunit.sivasankaran@gmail.com
SEQUENCE:8
SUMMARY:Distant conversational speech recognition: Challenges and Opportuni
 ties
URL;VALUE=URI:https://events.vtools.ieee.org/m/505533
X-ALT-DESC:Description: &lt;br /&gt;&lt;div&gt;&lt;strong&gt;&lt;u&gt;
 Abstract&lt;/u&gt;&lt;/strong&gt;:&lt;/div&gt;\n&lt;div&gt;&amp;nbsp\;
 &lt;/div&gt;\n&lt;div&gt;State-of-
 the-art ASR systems excel on close-talk benchmarks but struggle with far-f
 ield conversational speech\, where error rates remain above 20%. Current b
 enchmark datasets inadequately assess generalization across domains and re
 al-world conditions\, often relying on oracle segmentation that yields ove
 rly optimistic results. Distant ASR (DASR) faces unique challenges includi
 ng overlapping speech\, varied recording setups\, and dynamic speaker inte
 ractions that significantly complicate system development. Despite these d
 ifficulties\, spontaneous conversational speech represents the next fronti
 er for developing more human-like AI agents capable of natural multi-party
  communication. This talk presents recent advances in DASR through three
  interconnected efforts: (1) the CHiME-7 and CHiME-8 DASR challenges\
 , which established rigorous benchmarks for generalizable robust meeting t
 ranscription\, (2) end-to-end joint modeling that unifies speaker diarizat
 ion and speech recognition into a single framework\, moving beyond traditi
 onal pipeline approaches\, and (3) synthetic data generation leveraging la
 rge language models and text-to-speech systems to create realistic multi-s
 peaker training data at scale.&lt;/div&gt;
END:VEVENT
END:VCALENDAR

