Where LLM Safety Breaks: Architecture, Interaction, and Agent Skills

#AI #safety #Security

Large language models (LLMs) are increasingly deployed as conversational assistants and autonomous agents, but their safety behavior can vary with how they are built, prompted, and deployed. In this talk, we examine where LLM safety breaks across three surfaces: the model architecture, the structure of user interaction, and the external skills available at deployment. We first show how sparse Mixture-of-Experts (MoE) architectures can expose unsafe routes, where altered routing decisions turn otherwise safe responses into harmful ones. We then discuss task concurrency, a jailbreak setting in which adjacent words encode divergent intents and harmful requests are interleaved with benign ones, making guardrails less reliable. Finally, we study harmful agent skills in open skill ecosystems and show how pre-installed harmful skills can lower refusal rates in realistic agent contexts. Together, these studies suggest that LLM safety should be evaluated as a system-level property across architecture, interaction, and external skills.



  Date and Time

  • Starts 19 May 2026 04:00 AM UTC
  • Ends 19 May 2026 08:00 PM UTC

  Location

  • No.28, West Xianning Road
  • Xi'an, Shaanxi
  • China 710049
  • Building: Hongli Building
  • Room Number: 4-7151

  Registration

  • No Admission Charge