IEEE UK & Ireland Signal Processing Society Seminar "Large Language-Audio Models and Applications", Professor Wenwu Wang (Event SPS-2025-01)


Please register for the event using the following link: Registration Link


The IEEE UK&I Signal Processing Society Chapter is kicking off a new invited-speaker seminar series, in which talks will not only focus on a specific technical area but will also highlight the portfolio of activities and opportunities in centres of excellence, research institutes, and research groups. The seminar will also give brief updates on future IEEE UK&I SPS Chapter events. We hope these events will contribute to refreshing and building a UK signal processing research map.

In this first talk, we are delighted to welcome Professor Wenwu Wang, Professor in Signal Processing and Machine Learning at the School of Computer Science and Electronic Engineering, University of Surrey, UK, who will speak about Large Language-Audio Models and their Applications. Professor Wang is a recognised world expert in the application of AI to audio processing, and his talk will cover the interface between statistical signal processing techniques and the latest AI technologies.

 

Presentation abstract: Large Language Models (LLMs) are being explored in audio processing to interpret and generate meaningful patterns from complex sound data, such as speech, music, environmental noise, sound effects, and other non-verbal audio. Combined with acoustic models, LLMs offer great potential for addressing a variety of problems in audio processing, such as audio captioning, audio generation, source separation, and audio coding. This talk will cover recent advancements in using LLMs to address audio-related challenges. Topics will include language-audio models for mapping and aligning audio with textual data, their applications across various audio tasks, the creation of language-audio datasets, and potential future directions in language-audio learning. We will demonstrate our recent work in this area, including AudioLDM, AudioLDM2, and WavJourney for audio generation and storytelling; AudioSep for audio source separation; ACTUAL for audio captioning; SemantiCodec for audio coding; WavCraft for content creation and editing; APT-LLMs for audio reasoning; and the datasets WavCaps, Sound-VECaps, and AudioSetCaps for training and evaluating large language-audio models.

 

This will be a hybrid event, with both in-person and online attendance options. If you intend to participate in person, we kindly encourage you to register as soon as possible, as in-person capacity is limited. Light refreshments, including coffee and tea, will be available for in-person attendees.

 



  Date and Time

  Location

  Hosts

  Registration



  • University of Surrey, Stag Hill Campus
  • Guildford, England
  • United Kingdom GU2 7XH
  • Building: Arthur C. Clarke Building (Block BA)
  • Room Number: CVSSP Seminar Room, Ground level, Room 35



  Speakers

Wenwu Wang

Topic:

Large Language-Audio Models and Applications


Biography:

Wenwu Wang is a Professor in Signal Processing and Machine Learning and an Associate Head for External Engagement at the School of Computer Science and Electronic Engineering, University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People-Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He is a (co-)author or (co-)recipient of more than 15 accolades, including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, ICAUS 2021 Best Paper Award, DCASE 2020 and 2023 Judge's Award, DCASE 2019 and 2020 Reproducible System Award, and LVA/ICA 2018 Best Student Paper Award. He is an Associate Editor (2020-2025) for IEEE/ACM Transactions on Audio, Speech and Language Processing, and an Associate Editor (2024-2026) for IEEE Transactions on Multimedia. He was a Senior Area Editor (2019-2023) and Associate Editor (2014-2018) for IEEE Transactions on Signal Processing. He is the elected Chair (2023-2024) of the IEEE Signal Processing Society (SPS) Machine Learning for Signal Processing Technical Committee, a Board Member (2023-2024) of the IEEE SPS Technical Directions Board, the elected Chair (2025-2027) and Vice Chair (2022-2024) of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, and an elected Member (2021-2026) of the IEEE SPS Signal Processing Theory and Methods Technical Committee. He has served on the organising committees of INTERSPEECH 2022, IEEE ICASSP 2019 & 2024, IEEE MLSP 2013 & 2024, and SSP 2009, and is Technical Program Co-Chair of IEEE MLSP 2025. He has been an invited keynote or plenary speaker at more than 20 international conferences and workshops.

Address: University of Surrey, Stag Hill Campus, Guildford, Surrey, England, United Kingdom, GU2 7XH

James Hopgood

Topic:

Updates on future IEEE UK&I SPS Chapter events


Biography:

James R. Hopgood is a Professor of Statistical Signal Processing at the University of Edinburgh (UoE, Scotland) and Director/PI of the EPSRC and MoD Centre for Doctoral Training in Sensing, Processing, and AI for Defence and Security (SPADS). James is also Dean of Quality and Enhancement in the College of Science and Engineering at UoE. Between 2019 and 2024, James was Director of Electronic and Electrical Engineering in the School of Engineering at UoE, and also a member of the Institute for Imaging, Data, and Communications (IDCOM).

James’s research spans statistical signal processing and machine learning across a diverse range of applications, including multi-modal multi-sensor fusion and multi-target tracking. James has worked with multiple companies to translate underpinning statistical signal processing methods into applications such as improved electrophoresis analysis, data-driven tracking solutions for magnetometer sensing technology, model-based super-resolution imaging techniques for multibeam sonar systems, and the development of alternative future concepts of early warning systems through the “DASA: Look Out! Maritime Early Warning Innovations” programme.

Address: Institute for Imaging, Data and Communications, Alexander Graham Bell Building, King's Buildings, Thomas Bayes Road, Edinburgh, United Kingdom, EH9 3FG






  Media

Introduction slides from the Chapter Chair, James R. Hopgood (504.30 KiB)