Distilling LLMs: Training an In-House Model with a Mature One for Retrieval-Augmented Generation

#hamilton #technical #talk #power #STEM #wie #in-person #DEI-Events #transformers #dissolved-gas-analysis #testing

This presentation outlines a practical pipeline for distilling large language models into compact, in-house alternatives using mature teacher models. We focus on retrieval-augmented generation workflows, where distilled students must learn context grounding, citation discipline, and low-latency behavior. The methodology covers data curation from teacher-generated RAG trajectories, constrained supervised fine-tuning with context masking, and evaluation using faithfulness, robustness, and efficiency metrics.
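To make the context-masking step concrete, here is a minimal sketch of how a teacher-generated RAG trace could be turned into a training example whose loss covers only the answer tokens. The model id, prompt template, and field names are illustrative assumptions, not the speaker's exact pipeline.

```python
# Minimal sketch: context-masked SFT example construction.
# Assumption: each teacher trace carries (context, question, answer);
# the student checkpoint id below is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

def build_distillation_example(context: str, question: str, teacher_answer: str):
    # Prompt = retrieved context + question; completion = teacher's grounded answer.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer: "
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(teacher_answer + tokenizer.eos_token,
                           add_special_tokens=False)["input_ids"]
    # Label -100 is ignored by the cross-entropy loss, so the student is
    # trained only to reproduce the teacher's answer, not the retrieved
    # context or the question (the "context masking" described above).
    return {
        "input_ids": prompt_ids + answer_ids,
        "labels": [-100] * len(prompt_ids) + answer_ids,
    }

example = build_distillation_example(
    context="LoRA adapts a frozen model by training low-rank update matrices.",
    question="What does LoRA train?",
    teacher_answer="Only low-rank update matrices; the base weights stay frozen.",
)
```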

We demonstrate end-to-end integration with the Hugging Face platform, leveraging the Model Hub for teacher selection, Datasets for curated training data, trl and peft for parameter-efficient training, Spaces for interactive validation, and Leaderboards for community benchmarking. Comparative analysis shows that distillation retains 85–95% of teacher performance at roughly 20% of the teacher's inference cost, enabling deployable, privacy-compliant RAG systems.
The talk concludes with mitigation strategies for common pitfalls and future directions in agentic and multi-teacher distillation.
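As a concrete illustration of the trl/peft step mentioned above, the sketch below wires a small student checkpoint into SFTTrainer with a LoRA adapter. The dataset id, model id, and hyperparameters are placeholder assumptions (a recent trl version is also assumed), not the speaker's actual configuration.

```python
# Minimal sketch: parameter-efficient distillation with trl + peft.
# "your-org/rag-distillation-traces" is a hypothetical dataset of
# teacher-generated RAG traces rendered into a "text" field.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("your-org/rag-distillation-traces", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # compact student from the Model Hub
    train_dataset=train_ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="student-rag"),
)
trainer.train()
trainer.save_model("student-rag")         # LoRA adapters, ready to demo in a Space
```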





  Date and Time

  • Starts 28 April 2026 04:00 AM UTC
  • Ends 25 June 2026 04:00 AM UTC

  Location

  • 133 Rebecca St, Oakville, Ontario, Canada L6K 1J5
  • Building: Trafalgar Park Community Centre
  • Room Number: Multipurpose Room 3

  Hosts

  • eduardo.gomez.hennig@ieee.org
  • sneh@rchilli.com

  Registration

  • No Admission Charge


  Speakers

Zichao Li

Topic:

Distilling LLMs: Training an In-House Model with a Mature One for Retrieval-Augmented Generation


Biography:

Zichao Li is an experienced practitioner in data science and knowledge discovery. For more than 15 years he has provided solutions to financial institutions including BMO, Scotiabank, Mitsubishi, Citi, and RBC. He received his PhD from the University of Waterloo. In his spare time, he has also participated in data labeling and quality assurance for the development of various large language models. He has more than 40 publications, mostly in optimization methodology and probabilistic models.

Agenda

7:00 PM - Introduction of IEEE Hamilton Section

7:15 PM - Presentation

8:00 PM - Q&A

8:15 PM - Refreshments