Training Large-scale Foundation Models on Emerging AI Accelerators

#STEM #GPT #Lehigh #CS

Foundation models such as GPT-4 have garnered significant interest from both academia and industry. A striking feature of such models is their so-called emergent capabilities, including multi-step reasoning, instruction following, and model calibration, across a wide range of application domains. Such capabilities were previously attainable only with specially designed ML models, such as those built on carefully constructed knowledge graphs, in specific domains. As the capabilities of foundation models have grown, so have their sizes, at a rate much faster than Moore's law. Training foundation models requires massive computing power. For instance, training a BERT model on a single state-of-the-art server with multiple A100 GPUs can take several days, while training a GPT-3-scale model on a large multi-instance GPU cluster can take several months to complete an estimated 3×10^23 FLOPs.
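To make these compute figures concrete, the sketch below converts an estimated training budget of 3×10^23 FLOPs into rough wall-clock time as a function of cluster size. The per-GPU peak throughput and the 40% sustained-utilization figure are illustrative assumptions, not numbers from the talk.

```python
# Back-of-envelope estimate of foundation-model training time.
# Assumed (illustrative) numbers: ~3e23 total training FLOPs for GPT-3,
# 312 TFLOP/s peak FP16/BF16 throughput per A100, ~40% sustained utilization.

TOTAL_FLOPS = 3e23            # estimated FLOPs to train GPT-3
PEAK_FLOPS_PER_GPU = 312e12   # A100 peak tensor-core throughput (FLOP/s)
UTILIZATION = 0.40            # fraction of peak sustained in practice (assumed)
SECONDS_PER_DAY = 86_400

def training_days(num_gpus: int) -> float:
    """Days to complete TOTAL_FLOPS on num_gpus accelerators."""
    sustained_flops = num_gpus * PEAK_FLOPS_PER_GPU * UTILIZATION
    return TOTAL_FLOPS / sustained_flops / SECONDS_PER_DAY

for n in (8, 256, 1024):
    print(f"{n:5d} GPUs -> ~{training_days(n):,.0f} days")
```

Under these assumptions, a single 8-GPU server would need roughly a decade, a 256-GPU cluster several months (consistent with the figure above), and a 1,024-GPU cluster about a month, which is why accelerator scale, not just model design, dominates foundation model training.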

This talk provides an overview of the latest progress in supporting foundation model training and inference with new AI accelerators. It reviews progress on the modeling side, with an emphasis on the transformer architecture, and presents the system architecture supporting training and serving foundation models.
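As a concrete anchor for the modeling discussion, the snippet below sketches scaled dot-product attention, the core operation of the transformer architecture the talk emphasizes; the shapes and names are illustrative only and are not drawn from the talk.

```python
# Minimal sketch of scaled dot-product attention, the building block of
# transformers: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention for query/key/value matrices of shape (seq, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)          # (seq, seq) similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True)) # numerically stable
    weights /= weights.sum(-1, keepdims=True)                # row-wise softmax
    return weights @ V                                       # weighted mix of values

# Toy usage: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

In a full transformer this operation runs across many heads and layers, and its quadratic cost in sequence length is part of why training these models demands so much compute.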

Explore the frontier of AI with us as we delve into the power and potential of foundation models like GPT-4. Discover how emergent capabilities are pushing the boundaries of what's possible, and learn about the groundbreaking AI accelerators making it all happen. Join our talk to uncover the future of AI training and applications!



  Date and Time

  • Date: 30 Nov 2023
  • Time: 07:00 PM to 08:00 PM
  • All times are (UTC-05:00) Eastern Time (US & Canada)

  Location

  • Virtual event (attendance info is available on the event page)

  Registration

  • Starts: 08 November 2023, 07:30 AM
  • Ends: 30 November 2023, 07:00 PM
  • No Admission Charge


  Speakers

Jun (Luke) Huan of Amazon AWS AI Labs

Topic:

Training Large-scale Foundation Models on Emerging AI Accelerators


Biography:

Dr. Jun (Luke) Huan is a Principal Scientist at AWS AI Labs, where he works on AI and data science. He has published more than 160 peer-reviewed papers in leading conferences and journals and has graduated eleven Ph.D. students. He received the NSF Faculty Early Career Development (CAREER) Award in 2009, and his group has won several best-paper awards at leading international conferences. Before joining AWS, he was a Distinguished Scientist at Baidu Research and head of the Baidu Big Data Laboratory. He founded StylingAI Inc., an AI start-up, and served as its CEO and Chief Scientist from 2019 to 2021. Before moving to industry, he was the Charles E. and Mary Jane Spahr Professor in the EECS Department at the University of Kansas. From 2015 to 2018, he served as a program director at the US NSF, in charge of its big data program.

Address: United States