Invited talks on multimodal unsupervised speech enhancement and speech deepfakes
"Multimodal Unsupervised Speech Enhancement via Probabilistic Speech Priors"
by
Xavier Alameda-Pineda
Inria, University Grenoble Alpes, France
Abstract: In this talk, we will explore different probabilistic speech priors that exploit audio-visual data for the task of unsupervised speech enhancement. In short, unsupervised speech enhancement is the task of improving the quality of a speech signal without access to noise samples during training; training is therefore done exclusively with clean speech signals. At test time, the method must address the double task of estimating both the noise parameters and the clean speech signal from a noisy observation. We will discuss the role of the visual modality in building a speech prior that is more robust to noise, allowing for better noise parameter estimates and, overall, improved speech enhancement performance.
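For readers unfamiliar with this setup, the following is a minimal mathematical sketch (in LaTeX) of one common probabilistic formulation of unsupervised speech enhancement, assuming a variational-autoencoder (VAE) speech prior and a nonnegative matrix factorization (NMF) noise model. This particular instantiation and its notation are illustrative assumptions, not necessarily the models covered in the talk.

% Minimal sketch of unsupervised speech enhancement with a
% probabilistic speech prior (assumed VAE + NMF instantiation;
% notation is illustrative, not necessarily the talk's model).
\documentclass{article}
\usepackage{amsmath}
\begin{document}
In the short-time Fourier transform domain, the noisy mixture is
\[
  x_{ft} = s_{ft} + n_{ft},
\]
where $s_{ft}$ is clean speech and $n_{ft}$ is noise. A speech prior is
trained on clean speech only, e.g.\ a variational autoencoder with
complex Gaussian likelihood
\[
  s_{ft} \mid z_t \sim \mathcal{N}_c\!\bigl(0, \sigma^2_f(z_t)\bigr),
  \qquad z_t \sim \mathcal{N}(0, I),
\]
while the noise is modeled only at test time, e.g.\ with NMF,
\[
  n_{ft} \sim \mathcal{N}_c\!\bigl(0, (WH)_{ft}\bigr).
\]
Enhancement then alternates between (i) estimating the noise
parameters $(W, H)$ from the noisy observation and (ii) inferring the
clean-speech posterior $p(s \mid x; W, H)$, e.g.\ via its Wiener-like
posterior mean. In the audio-visual case, the visual modality can
enter through the prior, conditioning $\sigma^2_f$ on lip images so
that the prior remains informative under noise.
\end{document}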
and
“Recent Advances in Speech Deepfakes: From Detection to Source Tracing and Spoofing-Robust Speaker Verification”
by
Tomi Kinnunen
University of Eastern Finland, Finland
Abstract: Recent years have seen a surge in voice cloning services that anyone can use to craft synthetic voices. Motivated by the security of calls, teleconferencing, and audio on social media, as well as by the need to protect the integrity of voice biometrics (automatic speaker verification), the research community has worked for over a decade on novel solutions for detecting deepfakes. Recently, there has also been increasing interest in determining the origin of a deepfake (e.g., a particular speech synthesis system). In this talk, I provide a selective summary of the background of this emerging field, along with a brief summary of recent advances, including findings from the ASVspoof 5 challenge.
Date and Time: Thursday, February 19, 2026, at 09:00
Location: Aalborg University, Fredrik Bajers Vej 7A4-108