Approaching System Reliability in the AI Era

#liquid-cooling #data-centers

Meeting Date: Thursday, June 26, 2025
Time: Check-in via WebEx at 11:50 AM; presentation at 12:00 noon (PDT)
Cost: none
Reservations: events.vtools.ieee.org/m/485845

Summary: Ensuring hardware system reliability is increasingly critical in the evolving AI landscape, particularly within data centers. Drawing on the speaker's extensive experience leading reliability initiatives for cutting-edge hardware, this presentation will outline a general methodology for designing reliable, complex AI systems. It will emphasize the necessity of a multidisciplinary approach that integrates model-based systems engineering, rigorous reliability testing, and continuous system improvement, as exemplified by advancements in liquid cooling and power delivery technologies for high-performance AI processors. The talk will focus on the reliability approach needed for resilience in complex, AI-driven environments.


Bio: Venkata Chivukula is a Senior System Technology Engineer at Microsoft, specializing in liquid cooling and data center power and cooling system innovation within the Cloud and AI organization. Prior to Microsoft, he was a Senior System Reliability Engineer at Google, leading the development of liquid-cooled TPUs (including v5e and Trillium), GPU systems (A100, H100, GB200), and advanced power delivery technologies over five years. His earlier experience includes technical roles at Qualcomm, Bosch, GlobalFoundries, and Intel, focusing on fingerprint sensors, MEMS microphones, RF modules, and CMOS process technology. He is an IEEE Senior Member with a PhD in Electrical Engineering from Rensselaer Polytechnic Institute and has authored over 30 journal papers and 10 conference publications on MEMS, acoustic sensors, and vertical power technology, earning multiple best paper awards. His awards include the Google Tech Impact Award, Feats of Engineering, Cloud Impact Awards, Qualcomm’s Qual Star Award, Bosch Quality Prize, and the Qorvo Innovation Award.



  • Date: 26 Jun 2025
  • Time: 07:00 PM UTC to 08:00 PM UTC
  • Co-sponsored by CS, CIS and Rel Chapters
  • Registration opens 03 June 2025 07:00 AM UTC
  • Registration closes 26 June 2025 07:00 AM UTC
  • No Admission Charge