Distant conversational speech recognition: Challenges and Opportunities
Abstract:
State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation that yields overly optimistic results. Distant ASR (DASR) faces unique challenges including overlapping speech, varied recording setups, and dynamic speaker interactions that significantly complicate system development. Despite these difficulties, spontaneous conversational speech represents the next frontier for developing more human-like AI agents capable of natural multi-party communication. This talk presents recent advances in DASR through three interconnected efforts: (1) the CHiME-7 and CHiME-8 DASR challenges, which established rigorous benchmarks for generalizable robust meeting transcription, (2) end-to-end joint modeling that unifies speaker diarization and speech recognition into a single framework, moving beyond traditional pipeline approaches, and (3) synthetic data generation leveraging large language models and text-to-speech systems to create realistic multi-speaker training data at scale.
Date and Time
- Starts: 07 October 2025 07:00 AM UTC
- Ends: 15 October 2025 07:00 AM UTC
- No Admission Charge
Speakers
Biography:
Samuele Cornell is a postdoctoral research associate at Carnegie Mellon University's Language Technologies Institute, in Prof. Shinji Watanabe's research group (WAVLab). He received a Master's degree in electronic engineering (summa cum laude) from Università Politecnica delle Marche in 2019 and a doctoral degree in Information Engineering from the same institution in 2023. His research interests lie mainly in robust speech processing (speech enhancement, speech separation, diarization, automatic speech recognition) for distant multi-talker conversational scenarios, as well as the broader field of machine listening (sound event detection and classification), with over 50 publications in these areas.
He has also authored and contributed significantly to several popular open-source speech-processing toolkits (e.g., SpeechBrain, ESPnet, Asteroid for source separation). He has organized and co-organized popular audio-processing challenges in sound event detection, robust speech processing, and speech enhancement, including DCASE Task 4 (2021, 2022, 2024), CHiME (lead organizer of the CHiME-7/8 DASR challenges), and URGENT (2024 and 2025). More recently, he co-led the JSALT 2025 EMMA team on end-to-end multi-channel multi-talker ASR.