BEGIN:VCALENDAR
VERSION:2.0
PRODID:IEEE vTools.Events//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20241103T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20241003T144111Z
UID:93598DDF-4AF2-4822-872C-DDC4EF94A85A
DTSTART;TZID=America/New_York:20240502T110000
DTEND;TZID=America/New_York:20240502T120000
DESCRIPTION:IEEE ComSoc Northern Virginia chapter and GMU Department of Com
 puter Science invite you to attend the following Distinguished Lecture:\n\
 nTitle: Why do small language models underperform?\n\nSpeaker: Benoît Sag
 ot\, Director of Research at INRIA\n\nDate: May 2\, 2024\n\nTime: 11:00am 
 – 12:00pm\n\nIn person Location: GMU Fairfax campus\, Nguyen Engineering
  Bldg.\, Conference Room 4201\nVirtual: Microsoft Teams: [Join the meeting
  now](https://nam11.safelinks.protection.outlook.com/ap/t-59584e83/?url=ht
 tps%3A%2F%2Fteams.microsoft.com%2Fl%2Fmeetup-join%2F19%253ameeting_OGUwZGI
 0OTktNTdhZS00NmNlLWEzZGEtZTJhNGI2Yjg5YmJi%2540thread.v2%2F0%3Fcontext%3D%2
 57b%2522Tid%2522%253a%25229e857255-df57-4c47-a0c0-0546460380cb%2522%252c%2
 522Oid%2522%253a%2522f9586db0-74ee-4635-a01b-2383b74f8a0c%2522%257d&data=0
 5%7C02%7Ckhassan1%40gmu.edu%7Cc082474ca0814691061508dc69ddedfd%7C9e857255d
 f574c47a0c00546460380cb%7C0%7C0%7C638501649141618849%7CUnknown%7CTWFpbGZsb
 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%
 7C%7C%7C&sdata=MUjIg2xib2wJ22N21llS60nE%2FSTEPpqg%2FSVOdLXJFHA%3D&reserved
 =0) Meeting ID: 292 789 339 112 Passcode: jM8w7c\n------------------------
 ---------------------------------------\n\nDial-in by phone\n[+1 571-397-2
 084\,\,218888141#](tel:+15713972084\,\,218888141#) United States\, Arlingt
 on\n[Find a local number](https://nam11.safelinks.protection.outlook.com/?
 url=https%3A%2F%2Fdialin.teams.microsoft.com%2F9424c9fe-3b57-41d6-9131-1d3
 b9b7cf4a9%3Fid%3D218888141&data=05%7C02%7Ckhassan1%40gmu.edu%7Cc082474ca08
 14691061508dc69ddedfd%7C9e857255df574c47a0c00546460380cb%7C0%7C0%7C6385016
 49141626316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLC
 JBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=r6YY83JURqPLJf9J2QcdQ5H6w3
 QBZTlOs4HAUUBvvSQ%3D&reserved=0)\nPhone conference ID: 218 888 141#\nFor o
 rganizers: [Meeting options](https://nam11.safelinks.protection.outlook.co
 m/?url=https%3A%2F%2Fteams.microsoft.com%2FmeetingOptions%2F%3ForganizerId
 %3Df9586db0-74ee-4635-a01b-2383b74f8a0c%26tenantId%3D9e857255-df57-4c47-a0
 c0-0546460380cb%26threadId%3D19_meeting_OGUwZGI0OTktNTdhZS00NmNlLWEzZGEtZT
 JhNGI2Yjg5YmJi%40thread.v2%26messageId%3D0%26language%3Den-US&data=05%7C02
 %7Ckhassan1%40gmu.edu%7Cc082474ca0814691061508dc69ddedfd%7C9e857255df574c4
 7a0c00546460380cb%7C0%7C0%7C638501649141633701%7CUnknown%7CTWFpbGZsb3d8eyJ
 WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%
 7C&sdata=KSJGsLYD3jKk7GBpn2XuSPetsCGuvQn9I6sJYXPZfs0%3D&reserved=0) | [Res
 et dial-in PIN](https://nam11.safelinks.protection.outlook.com/?url=https%
 3A%2F%2Fdialin.teams.microsoft.com%2Fusp%2Fpstnconferencing&data=05%7C02%7
 Ckhassan1%40gmu.edu%7Cc082474ca0814691061508dc69ddedfd%7C9e857255df574c47a
 0c00546460380cb%7C0%7C0%7C638501649141641487%7CUnknown%7CTWFpbGZsb3d8eyJWI
 joiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C
 &sdata=2ONDG7yjY3RckniAVPkRkliPHOBWdNqIhM2RgAA8O10%3D&reserved=0)\n\nAbstr
 act:\n\nLanguage models\, and in particular generative and conversational 
 language models\, are at the heart of recent advances in natural language 
 processing (NLP). Understanding how these models represent textual content
  and how they learn these representations still raises multiple research q
 uestions. In this talk\, I will start from an observation that small model
 s are less efficient than expected. I will show that language models relyi
 ng on the Transformer architecture tend to produce vector representations 
 that are not isotropically distributed in space. This anisotropy is linked
  to the way in which these models are learned\, which leads to the frequen
 cy of the tokens taking a preponderant place in their representation. I wi
 ll show that this effect has negative consequences on the ability of small
  models to train satisfactorily (“performance saturation”) but does no
 t seem to affect larger models. I will then describe a new approach for tr
 aining language models intended to avoid the undesirable effects of this p
 revalence of frequency information. The resulting “headless” models di
 splay a number of advantages over standard models\, including on downstrea
 m performance.\n\nBio:\n\nBenoît Sagot is a computer scientist specialize
 d in natural language processing (NLP). He is a Senior Researcher (Directe
 ur de Recherches) at INRIA\, where he heads the INRIA research project ALM
 AnaCH in Paris\, France. He also holds a chair in the PRAIRIE institute de
 dicated to artificial intelligence\, and currently holds the annual chair 
 for computer science in the Collège de France. His research focuses on la
 nguage modelling\, machine translation\, language resource development and
  computational linguistics\, with a focus on French in all its forms and
  on
  less-resourced languages.\n\n____________________________________________
 ____________________________________\n\nCo-sponsored by: GMU Department of
  Computer Science\n\nBldg: Nguyen Engineering Bldg.\, Conference Room 4201
 \, 4400 University Drive\, Fairfax\, Virginia\, United States\, 22030\, V
 irtual: https://events.vtools.ieee.org/m/419402
LOCATION:Bldg: Nguyen Engineering Bldg.\, Conference Room 4201\, 4400 Unive
 rsity Drive\, Fairfax\, Virginia\, United States\, 22030\, Virtual: https
 ://events.vtools.ieee.org/m/419402
ORGANIZER:
SEQUENCE:3
SUMMARY:Why do small language models underperform?
URL;VALUE=URI:https://events.vtools.ieee.org/m/419402
END:VEVENT
END:VCALENDAR

