Why do small language models underperform?

#STEM, #systems, #technology, #networking, #society, #communications, #sensor, #internet

The IEEE ComSoc Northern Virginia Chapter and the GMU Department of Computer Science invite you to attend the following Distinguished Lecture:

Title: Why do small language models underperform?

Speaker: Benoît Sagot, Director of Research at INRIA

Date: May 2, 2024

Time: 11:00am – 12:00pm

In-person location: GMU Fairfax campus, Nguyen Engineering Bldg., Conference Room 4201

Virtual: Microsoft Teams. Meeting ID: 292 789 339 112, Passcode: jM8w7c

Dial-in by phone: +1 571-397-2084,,218888141# (United States, Arlington)
Phone conference ID: 218 888 141#

Abstract:

Language models, and in particular generative and conversational language models, are at the heart of recent advances in natural language processing (NLP). Understanding how these models represent textual content, and how they learn these representations, still raises multiple research questions. In this talk, I will start from the observation that small models are less efficient than expected. I will show that language models based on the Transformer architecture tend to produce vector representations that are not isotropically distributed in space. This anisotropy is linked to the way these models are trained, which causes token frequency to play a preponderant role in their representations. I will show that this effect has negative consequences on the ability of small models to train satisfactorily ("performance saturation"), but does not seem to affect larger models. I will then describe a new approach to training language models, designed to avoid the undesirable effects of this prevalence of frequency information. The resulting "headless" models display a number of advantages over standard models, including in downstream performance.
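The anisotropy the abstract describes can be quantified as the average pairwise cosine similarity of a model's vector representations: near zero for isotropic directions, well above zero when embeddings collapse into a narrow cone. The sketch below is a minimal illustration on synthetic vectors (not the speaker's method or data); the shared "common" component added to the second set is a stand-in for the frequency-driven drift mentioned in the abstract.

```python
import numpy as np

def mean_cosine_similarity(embeddings: np.ndarray) -> float:
    """Average pairwise cosine similarity over a set of vectors.

    Near 0 for isotropically distributed directions; approaches 1
    when the vectors concentrate in a narrow cone (anisotropy).
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the self-similarities on the diagonal (each equal to 1).
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(0)
d, n = 256, 500

# Isotropic baseline: Gaussian vectors, directions uniform on the sphere.
isotropic = rng.normal(size=(n, d))

# Anisotropic case: every vector shares a large common component.
common = rng.normal(size=d)
anisotropic = rng.normal(size=(n, d)) + 5.0 * common

print(mean_cosine_similarity(isotropic))    # close to 0
print(mean_cosine_similarity(anisotropic))  # well above 0
```

This is only a toy diagnostic; in practice the same statistic would be computed over contextual embeddings extracted from a trained Transformer.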

Bio: 

Benoît Sagot is a computer scientist specialized in natural language processing (NLP). He is a Senior Researcher (Directeur de Recherche) at INRIA, where he heads the research project-team ALMAnaCH in Paris, France. He also holds a chair at the PRAIRIE institute dedicated to artificial intelligence, and currently holds the annual chair in computer science at the Collège de France. His research focuses on language modelling, machine translation, language resource development, and computational linguistics, with a focus on French in all its forms and on less-resourced languages.

 

________________________________________________________________________________


Date and Time
  • Date: 02 May 2024
  • Time: 03:00 PM UTC to 04:00 PM UTC

Location
  • 4400 University Drive
  • Fairfax, Virginia
  • United States 22030
  • Building: Nguyen Engineering Bldg., Conference Room 4201