The development of Large Language Models (LLMs) has fundamentally transformed AI capabilities, but understanding their origins helps contextualize today’s rapid advances. While GPT-4 and Claude dominate current discussions, identifying the first true LLM clarifies the evolutionary path of these increasingly sophisticated systems and underscores how quickly the technology has matured in just a few years.
The big picture: According to Australian tech blogger Jonathon Belotti, ULMFiT, published by Jeremy Howard and Sebastian Ruder in January 2018, represents the first true LLM, predating OpenAI’s GPT-1 by several months.
What makes an LLM: Belotti defines an LLM as a language model trained, in effect, as a “next word predictor” that can be easily adapted to many text-based tasks without architectural changes.
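For a concrete sense of what that criterion means, here is a minimal sketch (not from Belotti’s post) of how a single next-word predictor can be pointed at different text tasks with no change to its architecture. It assumes the Hugging Face transformers library and uses GPT-2 purely as an illustrative stand-in:

```python
# Minimal illustrative sketch: one next-word predictor handling different
# text tasks purely through its input text, with no architectural changes.
# Model choice (GPT-2) and library (transformers) are assumptions for
# illustration, not part of the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal (next-word) language model would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompts = [
    "Translate to French: 'good morning' ->",    # translation framed as next-word prediction
    "Review: 'The movie was dull.' Sentiment:",  # classification framed as next-word prediction
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Same model, same weights, same architecture for every task.
    outputs = model.generate(
        **inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The outputs from a small model like GPT-2 will be rough, but the structural point holds: one predictor, many tasks, and no task-specific heads or layers added along the way.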
Historical context: The article examines several pre-2018 models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.
Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in the technical vernacular.