Thermal Challenges and Opportunities for AI/ML Hardware: From Chip to Facility
-- roadmaps, chiplets, heterogeneous architecture, 2.5D, 3D, characterization, liquid cooling ...
Thermal management is becoming an increasingly critical challenge for AI chips as power density rises. Both chip-level and facility-level cooling solutions need to be developed and optimized to meet this demand. At the chip level, advanced packaging technologies, such as chiplet architectures and heterogeneous architectures like 2.5D, 3D, and 3.5D hybrid-bonded technologies, are becoming increasingly popular for driving performance and cost improvements in AI/ML hardware. However, these solutions also introduce additional complexity and thermal challenges. ASIC cooling technology development is therefore a key strategic enabler for the competitiveness and scalability of AI/ML hardware roadmaps, since it targets the high total power and increased power density of AI/ML systems. At the facility level, various cold plate designs and liquid cooling solutions are under development and need to mature before they can be deployed at large scale.
This presentation identifies areas for future thermal technology exploration, at both the ASIC and facility levels, that require investment to extend the cooling capabilities of future AI/ML roadmaps. These areas include:
• Thermal characterization of on-die thermal models
• Exploration of thermal interface materials
• Optimization of cold plate performance
• Evaluation of future embedded cooling solutions
• Air-assisted liquid cooling (AALC) and liquid cooling solutions at the rack level
Investing in these areas will help ensure the continued development of high-performance and scalable AI/ML hardware.
Date and Time
- Date: 12 Sep 2024
- Time: 12:00 PM to 01:00 PM
- All times are (UTC-07:00) Pacific Time (US & Canada)
Registration
- Starts 12 August 2024 12:00 AM
- Ends 12 September 2024 01:00 PM
- All times are (UTC-07:00) Pacific Time (US & Canada)
- No Admission Charge
Speakers
Yin Hang of Infra HW Team, Meta
Biography:
Yin Hang is a Technical Lead Manager for thermal engineering on Meta's Infra HW team. She leads the development of thermal designs and technologies for multiple platforms, including AI/ML, General Compute and Storage, as well as liquid cooling technologies such as AALC (air-assisted liquid cooling). Currently, she is focused on planning and pathfinding ASIC-level cooling solutions, including package-level and on-die-level solutions, new thermal interface materials, and associated cooling solutions such as cold plates and embedded silicon microchannel technologies. She holds a Ph.D. from Purdue University with a focus on heat transfer. She has contributed to technical papers, panel discussions, and talks at conferences such as ITherm and Open Compute Project (OCP). She also serves as the thermal lead for the NIC OCP group and as co-lead of the Heterogeneous Integration Roadmap (HIR) thermal chapter.
Agenda
See ‘Location’ for Webex coordinates.