Where LLM Safety Breaks: Architecture, Interaction, and Agent Skills

#AI #safety #Security

Large language models (LLMs) are increasingly deployed as conversational assistants and autonomous agents, but their safety behavior can vary with how they are built, prompted, and deployed. In this talk, we examine where LLM safety breaks across three surfaces: the model architecture, the structure of user interaction, and the external skills available at deployment. We first present how sparse Mixture-of-Experts (MoE) architectures can expose unsafe routes, where altered routing decisions can turn otherwise safe responses into harmful ones. We then discuss task concurrency, a jailbreak setting in which adjacent words encode divergent intents and harmful requests are interleaved with benign ones, making guardrails less reliable. Finally, we study harmful agent skills in open skill ecosystems and show how pre-installed harmful skills can lower refusal rates in realistic agent contexts. Together, these studies suggest that LLM safety should be evaluated as a system-level property across architecture, interaction, and external knowledge.

No.28, West Xianning Road
Xi'an, Shaanxi
China 710049
Building: Hongli Building
Room Number: 4-7151

Starts 19 May 2026 04:00 AM UTC
Ends 19 May 2026 08:00 PM UTC
No Admission Charge