Invited online Talk: Machine Translation for Extremely Low Resource Languages
Large language models (LLMs) demonstrate remarkable proficiency across a wide range of natural language processing (NLP) tasks and languages; however, their performance on extremely low-resource languages (ELRLs) falls short of expectations. In light of this, we step back from LLM-heavy research and explore traditional fine-tuning of a lightweight language model such as T5 to enable zero-shot machine translation for ELRLs. In this talk, I will present two noise-augmentation-based approaches that facilitate machine translation from ELRLs to English. We leverage the lexical similarity between high-resource languages and closely related ELRLs and propose both random and linguistically inspired noise augmentation techniques. Experiments across three diverse language groups show significant performance improvements. We hope these advances will help bridge communication gaps in underserved communities and foster greater linguistic inclusivity.
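To give a flavor of the idea, here is a minimal sketch of random character-level noise augmentation. It assumes (hypothetically; the talk's actual method may differ, especially the linguistically inspired variant) that high-resource-language source text is perturbed with random deletions, substitutions, and duplications during fine-tuning, so the model becomes robust to the surface variation of lexically similar ELRLs:

```python
import random

def inject_char_noise(sentence: str, noise_prob: float = 0.1, rng=None) -> str:
    """Randomly delete, substitute, or duplicate characters in a sentence.

    Hypothetical illustration of random noise augmentation; parameter
    names and edit operations are assumptions, not the talk's exact recipe.
    """
    rng = rng or random.Random()
    chars = []
    for ch in sentence:
        if rng.random() < noise_prob:
            op = rng.choice(["delete", "substitute", "duplicate"])
            if op == "delete":
                continue  # drop this character
            elif op == "substitute":
                chars.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            else:
                chars.append(ch)
                chars.append(ch)  # duplicate the character
        else:
            chars.append(ch)
    return "".join(chars)

# Example: perturb a high-resource-language sentence before fine-tuning.
rng = random.Random(0)
print(inject_char_noise("the model translates closely related languages", 0.15, rng))
```

Training a seq2seq model on such noised source sentences (paired with clean English targets) is one way to simulate the spelling and morphological drift between a high-resource language and its closely related ELRLs.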
Date and Time
- Date: 08 Apr 2025
- Time: 11:30 AM UTC to 12:30 PM UTC
Location
- CSE Seminar Hall, Ground Floor
- IIT Guwahati
- Guwahati, Assam, India
Speakers
Biography:
Kaushal Kumar Maurya is a Postdoctoral Research Associate in Artificial Intelligence at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE, specializing in Educational NLP. He earned his PhD from the Indian Institute of Technology (IIT) Hyderabad, where his research focused on low-resource NLP. He has collaborated with Microsoft's Translate and Auto-Suggest teams and has published in top *CL conferences. He is a recipient of the prestigious Suzuki Foundation Fellowship, the Microsoft Academic Grant, and the Google Academic Grant.