Attention-Guided Audio Compression for Multimodal LLMs : vTools Events


Attention-Guided Audio Compression for Multimodal LLMs

#audio-compression #technical

Audio compression is often proposed as a way to make multimodal large language models more efficient, but its impact on downstream task performance remains underexplored. This talk examines how semantic neural audio codecs behave under token-reduction constraints, using cross-modal attention as a signal for discarding frames with low semantic content. On audio question-answering benchmarks, attention-guided frame selection removes 10–30% of frames while matching baseline accuracy and answer consistency, and it identifies a critical compression threshold (keep ratio ~0.7) below which performance degrades sharply. The talk also discusses an "answer consistency paradox", in which models remain highly self-consistent (>98%) even as accuracy degrades, and what this decoupling of consistency from correctness means for evaluating compressed multimodal systems in low-resource deployments.
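
The abstract does not spell out the selection procedure or the metrics, so the following is only a minimal Python sketch of the general idea, not the speaker's method. It assumes each audio frame already has a single cross-modal attention score, keeps the top-scoring fraction of frames given by a keep ratio, and measures self-consistency as agreement with the most common answer across repeated runs. The function names (select_frames, accuracy, self_consistency) and both of those definitions are illustrative assumptions.

```python
from collections import Counter


def select_frames(attention, keep_ratio):
    """Return indices of the frames to keep, in temporal order.

    `attention` holds one score per audio frame (assumed to be attention
    mass already aggregated over heads and text tokens).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(keep_ratio * len(attention)))
    # Rank frames by score, highest first; ties go to the earlier frame.
    ranked = sorted(range(len(attention)), key=lambda i: (-attention[i], i))
    # Re-sort the survivors so the kept frames stay in temporal order.
    return sorted(ranked[:n_keep])


def accuracy(answers, truth):
    """Fraction of repeated runs whose answer matches the reference."""
    return sum(a == truth for a in answers) / len(answers)


def self_consistency(answers):
    """Fraction of repeated runs that agree with the most common answer
    (an assumed definition; the talk may define consistency differently)."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


# Hypothetical attention scores for ten frames.
scores = [0.02, 0.30, 0.01, 0.25, 0.05, 0.10, 0.03, 0.12, 0.04, 0.08]
kept = select_frames(scores, keep_ratio=0.7)
print(kept)  # [1, 3, 4, 5, 7, 8, 9]

# A model that repeats the same wrong answer is fully consistent but inaccurate.
runs = ["B", "B", "B", "B", "B"]
print(self_consistency(runs), accuracy(runs, truth="A"))  # 1.0 0.0
```

The last two lines show why consistency and correctness can come apart: a model that gives the same wrong answer on every run scores 1.0 on self-consistency and 0.0 on accuracy, so a consistency check alone would not flag the degradation.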

Starts 12 June 2026 07:00 AM UTC
Ends 26 June 2026 08:00 PM UTC
No Admission Charge

Speakers

Prerana

Topic:

Attention-Guided Audio Compression for Multimodal LLMs

Biography:

Prerana Rane is a researcher and engineer working at the intersection of speech and audio machine learning and multimodal AI. She holds an M.S. in Computer Engineering from Virginia Tech. She spent seven years at Intel's Next Generation and Standards group, developing PHY-layer systems and algorithms for 5G NR. She represented Intel as a 3GPP RAN1 delegate across Releases 16–18, with over 30 technical contributions, 17 patents and multiple proposals adopted into the 5G standards. She is an IEEE Senior Member and serves as Secretary of the IEEE Signal Processing Society Santa Clara Valley Chapter. Her broader research interests span signal processing, audio and speech machine learning, and efficient multimodal systems.

Address: California, United States