Why do small language models underperform?

#STEM, #systems, #technology, #networking, #society, #communications, #sensor, #internet

The IEEE ComSoc Northern Virginia Chapter and the GMU Department of Computer Science invite you to attend the following Distinguished Lecture:

Title: Why do small language models underperform?

Speaker: Benoît Sagot, Director of Research at INRIA

Date: May 2, 2024

Time: 11:00am – 12:00pm

In-person location: GMU Fairfax campus, Nguyen Engineering Bldg., Conference Room 4201

Virtual: Microsoft Teams. Meeting ID: 292 789 339 112, Passcode: jM8w7c

Dial-in by phone: +1 571-397-2084,,218888141# (United States, Arlington)
Phone conference ID: 218 888 141#

Abstract:

Language models, and in particular generative and conversational language models, are at the heart of recent advances in natural language processing (NLP). Understanding how these models represent textual content, and how they learn these representations, still raises multiple research questions. In this talk, I will start from the observation that small models are less efficient than expected. I will show that language models based on the Transformer architecture tend to produce vector representations that are not isotropically distributed in space. This anisotropy is linked to the way these models are trained, which causes token frequency to play a preponderant role in their representations. I will show that this effect has negative consequences on the ability of small models to train satisfactorily ("performance saturation"), but does not seem to affect larger models. I will then describe a new approach to training language models, designed to avoid the undesirable effects of this prevalence of frequency information. The resulting "headless" models display a number of advantages over standard models, including in downstream performance.
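The anisotropy the abstract describes can be quantified as the average pairwise cosine similarity of a model's vector representations: near zero for isotropic directions, well above zero when embeddings collapse into a narrow cone. The sketch below is a minimal illustration on synthetic vectors (not the speaker's method or data); the shared "common" component added to the second set is a stand-in for the frequency-driven drift mentioned in the abstract.

```python
import numpy as np

def mean_cosine_similarity(embeddings: np.ndarray) -> float:
    """Average pairwise cosine similarity over a set of vectors.

    Near 0 for isotropically distributed directions; approaches 1
    when the vectors concentrate in a narrow cone (anisotropy).
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the self-similarities on the diagonal (each equal to 1).
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(0)
d, n = 256, 500

# Isotropic baseline: Gaussian vectors, directions uniform on the sphere.
isotropic = rng.normal(size=(n, d))

# Anisotropic case: every vector shares a large common component.
common = rng.normal(size=d)
anisotropic = rng.normal(size=(n, d)) + 5.0 * common

print(mean_cosine_similarity(isotropic))    # close to 0
print(mean_cosine_similarity(anisotropic))  # well above 0
```

This is only a toy diagnostic; in practice the same statistic would be computed over contextual embeddings extracted from a trained Transformer.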

Bio: 

Benoît Sagot is a computer scientist specialized in natural language processing (NLP). He is a Senior Researcher (Directeur de Recherche) at INRIA, where he heads the research project-team ALMAnaCH in Paris, France. He also holds a chair at the PRAIRIE institute dedicated to artificial intelligence, and currently holds the annual chair in computer science at the Collège de France. His research focuses on language modelling, machine translation, language resource development, and computational linguistics, with a focus on French in all its forms and on less-resourced languages.

 

________________________________________________________________________________


Date and Time
  • Date: 02 May 2024
  • Time: 03:00 PM UTC to 04:00 PM UTC

Location
  • 4400 University Drive
  • Fairfax, Virginia
  • United States 22030
  • Building: Nguyen Engineering Bldg., Conference Room 4201