
A computer science student at Muhlenberg College accidentally discovered that his AI model, trained on Victorian-era texts, could accurately reference real historical events from 1834 London, including protests related to Lord Palmerston’s actions. Hayk Grigorian’s TimeCapsuleLLM reconstructed these historical connections from scattered references across thousands of documents without being explicitly taught about the events, demonstrating how AI models can synthesize factual information from ambient patterns in training data.

What you should know: Grigorian has been developing TimeCapsuleLLM over the past month, training it exclusively on texts from 1800-1875 London to capture an authentic Victorian voice.

  • When prompted with “It was the year of our Lord 1834,” the AI generated text mentioning “streets of London were filled with protest and petition” and referenced Lord Palmerston.
  • Fact-checking revealed that 1834 did see significant civil unrest in England following the Poor Law Amendment Act, and Palmerston served as Foreign Secretary during this period.
  • The model assembled these historical connections from 6.25GB of Victorian-era writing without intentional training on 1834 protest documentation.
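The kind of synthesis described above, linking a year to a name through scattered references rather than any single explicit statement, can be illustrated with a toy co-occurrence count. This is purely illustrative: the sample snippets below are invented stand-ins, not text from Grigorian's corpus, and a real language model learns far richer statistics than pairwise counts.

```python
from collections import Counter
from itertools import combinations

# Invented stand-in snippets; the real corpus spans thousands of documents.
documents = [
    "In 1834 the streets of London were filled with protest and petition.",
    "Lord Palmerston, then Foreign Secretary, faced criticism in 1834.",
    "The Poor Law Amendment Act of 1834 provoked unrest across England.",
    "Palmerston's foreign policy dominated debate in Parliament.",
]

def cooccurrences(docs):
    """Count how often each pair of words appears in the same document."""
    pairs = Counter()
    for doc in docs:
        words = {w.strip(".,'s").lower() for w in doc.split()}
        pairs.update(frozenset(p) for p in combinations(sorted(words), 2))
    return pairs

counts = cooccurrences(documents)
# No document says "Palmerston was in office during the 1834 protests",
# yet "1834" and "palmerston" co-occur, so a model trained on this text
# can pick up the association from ambient statistics alone.
print(counts[frozenset({"1834", "palmerston"})])
```

At corpus scale, the same principle lets a model bind dates, names, and events that never appear together in any single sentence.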

How it works: Grigorian uses what he calls “Selective Temporal Training” (STT), training models from scratch using exclusively Victorian-era sources rather than fine-tuning modern AI models.

  • His dataset includes over 7,000 books, legal documents, and newspapers published in London between 1800 and 1875.
  • A custom tokenizer excludes modern vocabulary entirely to prevent contamination from contemporary knowledge.
  • “If I fine-tune something like GPT-2, it’s already pre-trained and that information won’t go away,” Grigorian explained. “If I train from scratch the language model won’t pretend to be old, it just will be.”
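The core idea behind a period-restricted tokenizer can be sketched in a few lines. This is a simplified word-level stand-in, not Grigorian's actual implementation (which is likely a subword tokenizer): because the vocabulary is built exclusively from the historical corpus, modern terms simply have no token and fall back to an unknown marker.

```python
# Simplified stand-in for a period-restricted tokenizer. The key idea:
# the vocabulary comes ONLY from period texts, so modern vocabulary
# cannot leak in -- it has no id and maps to <unk>.

period_corpus = "the omnibus rattled along the cobbled thoroughfare"  # invented sample

class PeriodTokenizer:
    def __init__(self, corpus: str):
        # Id 0 is reserved for <unk>; real ids start at 1.
        self.vocab = {w: i for i, w in enumerate(sorted(set(corpus.split())), start=1)}
        self.unk = 0

    def encode(self, text: str) -> list[int]:
        return [self.vocab.get(w, self.unk) for w in text.lower().split()]

tok = PeriodTokenizer(period_corpus)
print(tok.encode("the omnibus"))     # period words get real ids
print(tok.encode("the smartphone"))  # modern word collapses to <unk> (0)
```

A model trained from scratch over such a vocabulary cannot emit "smartphone" at all, which is the contamination guarantee that fine-tuning a pre-trained model cannot offer.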

The evolution: Grigorian has trained three versions showing improved historical coherence as data size increased.

  • Version 0, trained on just 187MB, produced “Victorian-flavored gibberish.”
  • Version 0.5 achieved grammatically correct period prose but hallucinated facts.
  • The current 700-million-parameter version, trained on a rented A100 GPU, has begun generating accurate historical references.

The bigger picture: TimeCapsuleLLM joins a growing field of Historical Large Language Models (HLLMs) that researchers are developing to interact with past eras.

  • Similar projects include MonadGPT, trained on 11,000 texts from 1400-1700 CE, and XunziALLM, which generates classical Chinese poetry following ancient formal rules.
  • These models offer researchers opportunities to converse with simulated speakers of extinct vernaculars or historical languages.
  • For digital humanities researchers, such experiments could yield insights into period syntax and vocabulary usage.

What they’re saying: Grigorian expressed amazement at his model’s capabilities despite its relatively small size.

  • “This is all from just 5-6GB of data,” he wrote on Reddit. “Imagine the results with 30GB or more. I’m not sure if just scaling the data up will ever result in reasoning but even now it kinda feels like digital time travel.”
  • He plans to expand the project: “I want to eventually try different cities also, maybe a Chinese, Russian, or Indian city model.”

Why this matters: The experiment demonstrates how AI models can accidentally reconstruct factual historical information from pattern recognition, offering a counterpoint to typical AI hallucination concerns.

  • The model’s ability to connect disparate historical elements suggests potential applications for historical research and education.
  • Grigorian makes his code, AI model weights, and documentation publicly available on GitHub, enabling collaboration and further research.
  • As one observer noted, it represents “almost the opposite of hallucination—an AI model accidentally getting something correct. Call it a ‘factcident.’”
