ULMFit, not GPT-1, was the first true LLM according to new analysis

The development of large language models (LLMs) has fundamentally transformed AI capabilities, and understanding their origins helps contextualize today’s rapid advances. While GPT-4 and Claude dominate current discussions, pinpointing the first true LLM clarifies how these increasingly sophisticated systems evolved and how quickly the technology has matured in just a few years.

The big picture: According to Australian tech blogger Jonathon Belotti, ULMFit, published by Jeremy Howard and Sebastian Ruder in January 2018, represents the first true LLM, predating OpenAI’s GPT-1 by several months.

  • GPT-1, developed at OpenAI by Alec Radford and colleagues, was published on June 11, 2018.
  • Both models demonstrated the core capabilities that define modern LLMs, though at a far smaller scale than today’s systems.

What makes an LLM: Belotti defines an LLM as a language model effectively trained as a “next word predictor” that can be easily adapted to multiple text-based tasks without architectural changes.

  • The definition emphasizes self-supervised training on unlabeled text data, with next-word prediction as the core objective (see the sketch after this list).
  • True LLMs must achieve state-of-the-art performance across multiple text challenges with minimal adaptation.
  • This definition helps distinguish early language models from true LLMs based on their capabilities and adaptability.
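To make the “next word predictor” framing concrete, here is a minimal PyTorch sketch of the self-supervised objective the definition describes: the model reads raw tokens and is trained to predict each following token, so the unlabeled text supplies its own labels. The toy corpus, the LSTM backbone (TinyNextWordModel), and all hyperparameters are illustrative assumptions for this sketch, not ULMFit’s or GPT-1’s actual architectures or training setups.

```python
import torch
import torch.nn as nn

# Toy "unlabeled" corpus: the text itself supplies the training signal.
corpus = "the model learns to predict the next word from the previous words".split()
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
ids = torch.tensor([vocab[w] for w in corpus])

class TinyNextWordModel(nn.Module):
    """Embedding -> LSTM -> vocabulary logits (one possible backbone, not ULMFit's)."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        hidden, _ = self.rnn(self.embed(x))
        return self.head(hidden)            # logits at every position

model = TinyNextWordModel(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised targets: the same sequence shifted left by one token,
# so at each position the model must predict the *next* word.
inputs = ids[:-1].unsqueeze(0)    # shape (1, seq_len - 1)
targets = ids[1:].unsqueeze(0)

for step in range(200):
    logits = model(inputs)                                  # (1, seq_len - 1, vocab)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# "Adaptation with minimal changes" then mostly means reusing the pretrained
# backbone and training a small task head on top of it
# (e.g. a hypothetical 2-label sentiment classifier).
classifier_head = nn.Linear(32, 2)
features, _ = model.rnn(model.embed(inputs))
task_logits = classifier_head(features[:, -1, :])   # classify from the final hidden state
```

The closing lines gesture at the “minimal adaptation” half of the definition: the pretrained backbone is reused as-is, and only a lightweight task head is trained for a new text task.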

Historical context: Belotti’s analysis examines several pre-2018 models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.

  • After working through each candidate, Belotti concludes that, while the call is debatable, ULMFit fits the definition of the first genuine LLM most convincingly.
  • Earlier models lacked either the adaptability or performance characteristics that define modern LLMs.

Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in technical vernacular.

  • Just as “GPU” (Graphics Processing Unit) remains the standard name for chips that now handle many non-graphics workloads, “LLM” may stick as the standard term even as models evolve beyond pure language processing.
