Machine Learning Hardware Design for Efficiency, Flexibility and Scalability

#ML #hardware #design

Machine learning (ML) is the driving application for next-generation computational hardware. A key challenge is designing ML hardware that achieves high performance, efficiency, and flexibility to support fast-growing ML workloads. Besides dataflow-optimized systolic arrays and single instruction, multiple data (SIMD) engines, efficient ML accelerators have been designed to take advantage of static and dynamic data sparsity. To accommodate fast-evolving ML workloads, matrix engines can be integrated with an FPGA to combine the efficiency of kernel computation with the flexibility of control. To support increasing ML model complexity, modular chiplets can be tiled on a 2.5D interposer and stacked in a 3D package. We envision that a combination of these techniques will be required to address the needs of future ML applications.
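To make the sparsity point concrete, below is a minimal Python/NumPy sketch of the zero-skipping idea behind many sparsity-aware accelerators. It is our illustration, not a design from the talk; the function name zero_skipping_matmul and all parameters are hypothetical. Every multiply-accumulate (MAC) with a zero operand is gated off, and the result is unchanged.

```python
# Illustrative sketch only (not from the talk): model the arithmetic
# saving of a sparsity-aware matrix engine by gating off every
# multiply-accumulate (MAC) whose activation or weight operand is zero.
import numpy as np

def zero_skipping_matmul(A, B):
    """Compute C = A @ B with a scalar MAC loop, counting how many
    MACs run versus how many are skipped due to a zero operand."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    performed = skipped = 0
    for i in range(M):
        for j in range(N):
            for k in range(K):
                a, b = A[i, k], B[k, j]
                if a == 0.0 or b == 0.0:  # zero operand: skip the MAC
                    skipped += 1
                else:
                    C[i, j] += a * b
                    performed += 1
    return C, performed, skipped

# Roughly 70%-sparse activations against dense weights.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16)) * (rng.random((8, 16)) > 0.7)
B = rng.standard_normal((16, 8))
C, performed, skipped = zero_skipping_matmul(A, B)
assert np.allclose(C, A @ B)  # skipping zero-operand MACs changes nothing
print(f"MACs performed: {performed}, skipped: {skipped}")
```

With roughly 70% of the activations zeroed out, about 70% of the MACs are skipped while the result stays exact; a hardware accelerator turns the same observation into gated or skipped datapath operations for energy and throughput gains.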



  Date and Time

  • Date: 30 Apr 2024
  • Time: 06:00 PM to 07:30 PM
  • All times are (UTC-04:00) Eastern Time (US & Canada)

  Location

  • Virtual event

  Hosts

  • Contact Event Hosts

  Registration

  • Starts 20 February 2024 02:02 PM
  • Ends 28 March 2024 06:00 PM
  • All times are (UTC-04:00) Eastern Time (US & Canada)
  • No Admission Charge


  Speakers

Dr. Zhengya Zhang

Topic:

Machine Learning Hardware Design for Efficiency, Flexibility and Scalability


Biography:

Zhengya Zhang received the B.A.Sc. degree from the University of Waterloo, Canada, in 2003, and the M.S. and Ph.D. degrees from UC Berkeley in 2005 and 2009, respectively. He has been a faculty member at the University of Michigan, Ann Arbor, since 2009. His research focuses on low-power and high-performance integrated circuits and systems for computing, communications, and signal processing. Dr. Zhang is a recipient of the NSF CAREER Award, the Intel Early Career Faculty Award, the Neil Van Eenam Memorial Award from the University of Michigan, and the David J. Sakrison Memorial Prize from UC Berkeley. He has served on the program committees of the Symposia on VLSI Technology and Circuits and of CICC, and on the editorial board of the IEEE Transactions on VLSI Systems.

Address: University of Michigan, Ann Arbor, United States





Agenda

6:00 PM - 7:00 PM EDT: Talk

7:00 PM - 7:30 PM EDT: Q&A