Evolution of Language Technology: From Bag of Words to Generative AI
Abhijit Choudhary, Seasoned Data Scientist, ADP
Since 2022, the Generative AI revolution has taken the world by storm with its immense potential and promise. At the heart of this transformation are large language models built on the transformer architecture—particularly auto-regressive, pre-trained generative models like GPT. But this revolution didn’t happen overnight. It is the result of decades of iterative progress in the broader field of language technology, which has consistently sought ways to make human-readable text understandable to machines through a series of increasingly sophisticated techniques.
This presentation offers a retrospective techno-functional overview of that evolution, spotlighting the key milestones that shaped the journey. Back in the 1970s, techniques like TF-IDF provided basic statistical relevance scores based on word frequency. While not true embeddings, they offered foundational document-specific representations. A significant leap occurred around 2011 with the rise of neural language models, eventually leading to the development of global, pre-trained word embeddings like Word2Vec (2013) and GloVe (2014), which captured semantic relationships and marked a new era.
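As a toy illustration of the TF-IDF scoring mentioned above (the corpus and names here are invented for this sketch, not taken from the talk), each term in a document is weighted by its frequency in that document, discounted by how many documents contain it:

```python
# Minimal TF-IDF sketch over a toy corpus. Real systems use library
# implementations such as scikit-learn's TfidfVectorizer; this only
# illustrates the weighting idea.
import math
from collections import Counter

corpus = [
    "payroll data science",
    "language models generate text",
    "language technology and text data",
]
docs = [doc.split() for doc in corpus]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)       # term frequency within this document
    df = sum(1 for d in docs if term in d)   # number of documents containing the term
    idf = math.log(len(docs) / df)           # rarer terms get a higher weight
    return tf * idf

# "language" appears in two of the three documents, so it scores lower
# than a document-specific word like "payroll".
print(tf_idf("payroll", docs[0], docs))
print(tf_idf("language", docs[1], docs))
```

Note that these scores are tied to a specific document and corpus, which is why the abstract calls them document-specific representations rather than true embeddings.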
The year 2018 stands out as an inflection point, with the arrival of BERT from Google and GPT from OpenAI—two transformative models leveraging the attention mechanism of transformers. BERT set new benchmarks in natural language understanding, excelling at sentence-level tasks with bidirectional context. GPT, on the other hand, opened doors to generative capabilities, surprising the world with its ability to produce human-like text, although early versions faced issues like repetitive output.
Since then, GPT's development has focused on overcoming these limitations through advances in decoding strategies, such as top-k and nucleus sampling, which significantly improved fluency and diversity in generated output. This talk draws on landmark research papers and modeling strategies that have defined this multi-decade evolution, offering attendees an engaging tour of how language technology has progressed from basic keyword counting to the generative intelligence we see today.
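The two decoding strategies named above can be sketched over a made-up next-token distribution (the logit values below are hypothetical, chosen only to illustrate the filtering step):

```python
# Sketch of top-k and nucleus (top-p) filtering of a next-token
# distribution. Production decoders operate on model logits; here we
# use a small hand-written vector.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalise.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def nucleus_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p, then renormalise.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.5, 0.1, -1.0]   # hypothetical next-token logits
probs = softmax(logits)
print(top_k_filter(probs, 2))
print(nucleus_filter(probs, 0.9))
```

Both methods truncate the long tail of unlikely tokens before sampling, which is what curbs the repetitive, low-quality output seen in early GPT generations; nucleus sampling adapts the cut-off to the shape of each distribution rather than fixing a token count.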
Zoom Meeting#: https://fdu.zoom.us/j/4522947007
Address: Becton Hall, Room 401E, Fairleigh Dickinson University, 960 River Road, Teaneck, NJ 07666.
Co-sponsored by Avimanyou Vatsa
Speakers
Abhijit Choudhary of ADP
Evolution of Language Technology: From Bag of Words to Generative AI
Biography:
Abhijit Choudhary is a seasoned Data Scientist with 20 years of experience in the software industry across multiple generations of data science—from classical statistical methods to transformer-based architectures, with a strong focus on NLP, machine learning, and AI-driven automation. Since the emergence of BERT, which marked the beginning of the transformer era in 2018, he has worked extensively on real-world NLP, NLU, and Generative AI projects, gaining deep hands-on experience with their evolution, promise, and limitations.
He is currently a Senior Data Scientist at ADP, where he has played a pivotal role across several AI initiatives in various functional areas of payroll automation—delivering high-impact solutions at enterprise scale. Prior to ADP, Abhijit played critical roles in several greenfield, data science–driven projects for leading clients, including metadata smart search at J.P. Morgan, an early warning system at Salesforce, credit and business strategy analytics at Capital One, and loss mitigation strategy and risk/compliance for Truist.
In addition to his product-focused data science experience, Abhijit brings a strong consulting background, having worked with Accenture Technology Consulting and Infosys Management Consulting Services—building trusted partnerships and solving complex problems across domains for leading clients across the USA, Singapore, and India.
Address: ADP, Parsippany, New Jersey, United States