Invited Online Talk: Machine Translation for Extremely Low Resource Languages

#iitg #lowresource #machinetranslation #NLP #guwahatisubsection #language #multilingual

Large language models (LLMs) demonstrate remarkable proficiency across a wide range of natural language processing (NLP) tasks and languages. However, their performance on extremely low-resource languages (ELRLs) falls short of expectations. Considering this, we took a step back from LLM-heavy research and explored traditional fine-tuning of a lightweight language model such as T5 to enable zero-shot machine translation for ELRLs. In this talk, I will present two noise augmentation-based approaches to facilitate machine translation from ELRLs to English. We leverage the lexical similarity between high-resource languages and closely related ELRLs and propose both random and linguistically inspired noise augmentation techniques. Experiments conducted across three diverse language groups indicate significant performance improvements. We hope these advancements will help bridge communication gaps in underserved communities and foster greater linguistic inclusivity.
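To make the idea of random noise augmentation concrete, here is a minimal sketch of character-level noising applied to a high-resource-language sentence. It is an illustration only, not the speaker's actual method: the function name `add_char_noise`, the `noise_rate` and `alphabet` parameters, and the choice of insert/delete/substitute operations are all assumptions made for this example.

```python
import random

def add_char_noise(sentence, noise_rate=0.1,
                   alphabet="abcdefghijklmnopqrstuvwxyz", seed=None):
    """Randomly insert, delete, or substitute non-space characters.

    Hypothetical helper: each non-space character is edited with
    probability `noise_rate`; whitespace is always kept so that word
    boundaries survive.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    out = []
    for ch in sentence:
        # Keep whitespace, and keep the character when no noise is drawn.
        if ch.isspace() or rng.random() >= noise_rate:
            out.append(ch)
            continue
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))  # extra character after ch
        elif op == "substitute":
            out.append(rng.choice(alphabet))  # replace ch
        # "delete": append nothing, dropping ch
    return "".join(out)

if __name__ == "__main__":
    print(add_char_noise("the cat sat on the mat", noise_rate=0.2, seed=0))
```

In a setup like the one the abstract describes, noised copies of high-resource source sentences would presumably be paired with their unchanged English targets during fine-tuning, so that the model becomes more tolerant of the spelling variation found in closely related ELRLs. A linguistically inspired variant would replace the uniform random choices with edits chosen from observed differences between the related languages.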



  Date and Time

  • Date: 08 Apr 2025
  • Time: 11:30 AM UTC to 12:30 PM UTC

  Location

  • CSE Seminar Hall
  • IIT Guwahati
  • Guwahati, Assam
  • India
  • Building: Ground Floor


  Speakers

Topic:

Machine Translation for Extremely Low Resource Languages

Biography:

Kaushal Kumar Maurya is a Post-Doctoral Research Associate in Artificial Intelligence at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE, specializing in Educational NLP. He earned his PhD from the Indian Institute of Technology (IIT) Hyderabad, where his research focused on low-resource NLP. He has collaborated with Microsoft's Translate and Auto-Suggest teams and has several publications in top *CL conferences. He is also a recipient of the prestigious Suzuki Foundation Fellowship, the Microsoft Academic Grant, and the Google Academic Grant.