Scalable Deep Neural Network Hardware with Multi-Chip Modules


Deep Neural Network (DNN) use cases have diverse performance and power targets. They are highly compute-intensive, and their demands grow as DNN models become larger and more complex. Package-level integration using multi-chip modules (MCMs) is a promising approach to building large-scale systems. While accelerators fabricated on a single monolithic chip are optimal for specific network sizes, an MCM-based architecture enables flexible scaling for efficient inference on a wide range of DNNs, from the mobile to the data-center domain. This talk explores the benefits of using MCMs with fine-grained chiplets to scale DNN inference and presents a 36-chiplet prototype MCM system for deep learning. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. Communication energy is minimized with large on-chip distributed weight storage and a hierarchical network-on-chip and network-on-package, and inference energy is minimized through extensive data reuse.
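To make the idea of mapping a DNN layer onto distributed compute and storage units concrete, the following is a minimal Python/NumPy sketch, not the prototype's actual programming model. It assumes a hypothetical Chiplet class and an output-channel partitioning of a fully connected layer: each chiplet keeps its weight slice locally and reuses it for every input, so only activations and outputs cross the package-level network.

import numpy as np

NUM_CHIPLETS = 36          # matches the 36-chiplet prototype mentioned in the talk
IN_FEATURES, OUT_FEATURES = 512, 1152

class Chiplet:
    """One compute/storage unit holding a slice of the layer's output neurons (illustrative)."""
    def __init__(self, weight_slice, bias_slice):
        self.weight_slice = weight_slice   # stored locally; reused for every input batch
        self.bias_slice = bias_slice

    def forward(self, activations):
        # Local multiply-accumulate over this chiplet's slice of output channels.
        return activations @ self.weight_slice.T + self.bias_slice

def partition_layer(weights, bias, num_chiplets):
    """Split output channels evenly across chiplets (one possible flexible mapping)."""
    w_slices = np.array_split(weights, num_chiplets, axis=0)
    b_slices = np.array_split(bias, num_chiplets)
    return [Chiplet(w, b) for w, b in zip(w_slices, b_slices)]

def mcm_forward(chiplets, activations):
    """Broadcast activations over the package network, then gather the output slices."""
    return np.concatenate([c.forward(activations) for c in chiplets], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((OUT_FEATURES, IN_FEATURES)).astype(np.float32)
    bias = rng.standard_normal(OUT_FEATURES).astype(np.float32)
    x = rng.standard_normal((8, IN_FEATURES)).astype(np.float32)   # a small input batch

    chiplets = partition_layer(weights, bias, NUM_CHIPLETS)
    out = mcm_forward(chiplets, x)

    # The distributed result matches the monolithic computation.
    assert np.allclose(out, x @ weights.T + bias, atol=1e-4)
    print(out.shape)   # (8, 1152)

Under this (assumed) mapping, weights never move after being loaded into each chiplet, which is the kind of data reuse the abstract credits for the low inference energy; other mappings, such as splitting input channels or whole layers across chiplets, trade off communication and storage differently.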



  Date and Time

  Location

  • Engineering Quadrangle
  • Olden Street
  • Princeton, New Jersey
  • United States 08544
  • Building: Department of Electrical & Computer Engineering
  • Room Number: B205

  Hosts

  Registration


  Speakers

Dr. Rangharajan Venkatesan of NVIDIA

Topic:

Scalable Deep Neural Network Hardware with Multi-Chip Modules


Biography:

Rangharajan Venkatesan is a Senior Research Scientist in the ASIC & VLSI Research group at NVIDIA. He received the B.Tech. degree in Electronics and Communication Engineering from the Indian Institute of Technology, Roorkee in 2009 and the Ph.D. degree in Electrical and Computer Engineering from Purdue University in August 2014. His research interests are in the areas of low-power VLSI design and computer architecture, with a particular focus on deep learning accelerators, high-level synthesis, and spintronic memories. He has received Best Paper Awards for his work on deep learning accelerators from the IEEE/ACM International Symposium on Microarchitecture (MICRO) and the IEEE Journal of Solid-State Circuits (JSSC). His work on spintronic memory design was recognized with the Best Paper Award at the International Symposium on Low Power Electronics and Design (ISLPED) and a Best Paper nomination at Design, Automation and Test in Europe (DATE). His paper "MACACO: Modeling and Analysis of Circuits for Approximate Computing" received the IEEE/ACM International Conference on Computer-Aided Design (ICCAD) Ten Year Retrospective Most Influential Paper Award in 2021. He is a member of the technical program committees of several leading IEEE/ACM conferences, including ISSCC, DAC, MICRO, and ISLPED.
