ULMFiT, not GPT-1, was the first true LLM according to new analysis

Large language models (LLMs) have transformed what AI systems can do, and knowing where they came from puts today’s rapid progress in context. While GPT-4 and Claude dominate current discussion, identifying the first true LLM traces the lineage of these systems and shows how far the technology has come since 2018.

The big picture: According to Australian tech blogger Jonathon Belotti, ULMFiT, published by Jeremy Howard and Sebastian Ruder in January 2018, is the first true LLM, predating OpenAI’s GPT-1 by several months.

  • GPT-1, created by Alec Radford, was published on June 11, 2018, about five months after ULMFiT’s introduction.
  • Both models demonstrated the core capabilities that define modern LLMs, though at a far smaller scale than today’s systems.

What makes an LLM: Belotti defines an LLM as a language model effectively trained as a “next word predictor” that can be easily adapted to multiple text-based tasks without architectural changes.

  • The definition emphasizes self-supervised training on unlabeled text data, focusing on next-word prediction capabilities.
  • True LLMs must achieve state-of-the-art performance across multiple text challenges with minimal adaptation.
  • This definition helps distinguish early language models from true LLMs based on their capabilities and adaptability.
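
To make the “next word predictor” idea concrete, here is a minimal sketch of self-supervised training on unlabeled text. It is a toy bigram counter, not the architecture of ULMFiT or GPT-1, and the function names and sample corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def make_training_pairs(text):
    """Turn raw, unlabeled text into (previous word, next word) pairs.

    The "labels" come from the text itself, which is what makes the
    training self-supervised.
    """
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train_bigram_model(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev_word, next_word in make_training_pairs(text):
        counts[prev_word][next_word] += 1
    return counts


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; real models train on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))
```

Real LLMs replace the counting table with a neural network, but the training signal is the same: predict the next word from the words before it, with no human-written labels.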

Historical context: Belotti’s analysis examines other early models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.

  • Belotti concludes that, although the point is arguable, ULMFiT most convincingly fits the definition of the first genuine LLM.
  • Earlier models lacked either the adaptability or performance characteristics that define modern LLMs.

Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in technical vernacular.

  • Like “GPU” (Graphics Processing Unit, a name that has stuck even though the chips now handle many non-graphics tasks), “LLM” may remain the standard term even as models evolve beyond pure language processing.
