BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
DTSTART:20260308T020000
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:PDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20261101T020000
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:PST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260618T062248Z
UID:08F29111-1750-419D-AD81-45B3957D09BD
DTSTART;TZID=America/Los_Angeles:20260626T120000
DTEND;TZID=America/Los_Angeles:20260626T130000
DESCRIPTION:Audio compression is often proposed to improve the efficiency o
 f multimodal large language models\, but its impact on downstream task per
 formance remains underexplored. This talk examines how semantic neural aud
 io codecs behave under token reduction constraints\, using cross-modal att
 ention as a signal to discard frames with low semantic content. On audio q
 uestion-answering benchmarks\, attention-guided frame selection removes 10
 –30% of frames while matching baseline accuracy and answer consistency\,
  and identifies a critical compression threshold (keep ratio ~0.7) below w
 hich performance degrades sharply. The talk also discusses an "answer cons
 istency paradox" where models remain highly self-consistent (>98%) even as
  accuracy degrades\, and what this decoupling of consistency from correct
 ness means for evaluating compressed multimodal systems in low-resource d
 eployments.\n\nSpeaker(s): Prerana\n\nVirtual:
  https://events.vtools.ieee.org/m/563360
LOCATION:Virtual: https://events.vtools.ieee.org/m/563360
ORGANIZER:mailto:vicky.h.lu@gmail.com
SEQUENCE:26
SUMMARY:Attention-Guided Audio Compression for Multimodal LLMs
URL;VALUE=URI:https://events.vtools.ieee.org/m/563360
X-ALT-DESC:Description: <br /><p class="MsoNormal">Audio compression is oft
 en proposed to improve the efficiency of multimodal large language models\
 , but its impact on downstream task performance remains underexplored. Thi
 s talk examines how semantic neural audio codecs behave under token reduct
 ion constraints\, using cross-modal attention as a signal to discard frame
 s with low semantic content. On audio question-answering benchmarks\, atte
 ntion-guided frame selection removes 10&ndash\;30% of frames while matchin
 g baseline accuracy and answer consistency\, and identifies a critical com
 pression threshold (keep ratio ~0.7) below which performance degrades shar
 ply. The talk also discusses an "answer consistency paradox" where models
  remain highly self-consistent (&gt\;98%) even as accuracy degrades\, and
  what this decoupling of consistency from correctness means for evaluatin
 g compressed multimodal systems in low-resource deployments.</p>
END:VEVENT
END:VCALENDAR

