IEEE SPS SBC Webinar: Detection and Localization of Sound Events (By Dr. Tuomas Virtanen)
With the emergence of advanced machine learning techniques and large-scale datasets, holistic analysis of realistic soundscapes is becoming increasingly appealing. For everyday soundscapes, this means not only recognizing which sounds are present in an acoustic scene, but also where they are located and when they occur. This talk will discuss the task of joint detection and localization of sound events, which addresses the above problem. State-of-the-art methods typically use spectral representations and deep neural networks based on convolutional, recurrent, and attention layers, sharing many similarities with neighboring fields. However, the task also poses several unique challenges that require specific solutions. We will give an overview of the task setup for training machine learning models, acoustic features for representing multichannel signals, topologies of deep neural networks, and loss functions for training systems. Since the performance of these methods depends heavily on the training data used, we will also discuss datasets that can be used for method development and how they are prepared. Finally, we will discuss the recent DCASE evaluation campaign tasks that addressed joint detection and localization of sound events.
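As a rough illustration of the "acoustic features for multichannel signals" mentioned above, the sketch below computes per-channel STFT magnitudes plus inter-channel phase differences, which carry the direction-of-arrival cues that localization relies on. This is a minimal, hypothetical feature layout for illustration only; real SELD systems often use log-mel energies and acoustic intensity vectors instead.

```python
import numpy as np

def stft_features(audio, n_fft=512, hop=256):
    """Hypothetical multichannel features: STFT magnitudes + phase differences.

    audio: (channels, samples) array of a multichannel recording.
    Returns an array of shape (2 * channels, frames, bins).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (audio.shape[1] - n_fft) // hop
    frames = np.stack(
        [np.fft.rfft(audio[:, i * hop:i * hop + n_fft] * window, axis=1)
         for i in range(n_frames)],
        axis=1)  # complex spectrogram, (channels, frames, bins)
    mags = np.abs(frames)
    # Phase of each channel relative to channel 0: inter-channel phase
    # differences encode time delays between microphones, i.e. spatial cues.
    ipd = np.angle(frames * np.conj(frames[0:1]))
    return np.concatenate([mags, ipd], axis=0)

feats = stft_features(np.random.randn(4, 48000))
print(feats.shape)  # (8, 186, 257)
```

A detection-and-localization network would consume such a feature tensor and predict, per time frame, both class activities and source directions.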
Date and Time
- Date: 18 Jun 2024
- Time: 12:30 PM to 01:30 PM
- All times are (UTC+05:30) Chennai
Speakers
Dr. Tuomas Virtanen
Biography:
Tuomas Virtanen is a Professor at Tampere University, Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from Tampere University of Technology in 2001 and 2006, respectively. He has also worked as a research associate at the Cambridge University Engineering Department, UK. He is known for his pioneering work on computational acoustic scene analysis and sound source separation. His research interests include machine listening, computational content analysis of audio, and machine learning for audio. He has authored more than 200 scientific publications on these topics, which have been cited more than 19,000 times. He has received IEEE Signal Processing Society best paper awards multiple times, as well as many other best paper awards. He is an IEEE Fellow, an IEEE Signal Processing Society Distinguished Lecturer for 2024-2025, a recipient of an ERC 2014 Starting Grant, and has been a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society.
Address: Tampere University, Finland