Former Cloudflare executive John Graham-Cumming has launched lowbackgroundsteel.ai, a catalog of pre-2022, human-generated content from before widespread AI contamination began. The site takes its name from the “low-background steel” that scientists once salvaged from pre-nuclear-era shipwrecks to avoid radiation contamination, drawing a parallel between nuclear fallout and AI-generated content polluting the internet.

The big picture: The project treats pre-AI content as a precious commodity, recognizing that distinguishing between human and machine-generated material has become increasingly difficult since ChatGPT’s November 2022 launch.

Why this matters: AI contamination has already led at least one notable research project to stop work. wordfreq, a Python library that tracked word frequency across 40+ languages, announced in September 2024 that it would no longer be updated because “the Web at large is full of slop generated by large language models, written by no one to communicate nothing.”

What’s included: The archive points to several major repositories of verified pre-AI content that researchers and developers can trust.

  • A Wikipedia dump from August 2022, captured before ChatGPT’s release.
  • Project Gutenberg’s collection of public domain books.
  • The Library of Congress photo archive.
  • GitHub’s Arctic Code Vault, a snapshot of open source code taken in February 2020 and stored in a decommissioned coal mine in Svalbard, in the Arctic.
  • The now-frozen wordfreq project, preserved from before AI contamination made its methodology untenable.

Model collapse concerns: Some researchers worry about AI models training on their own outputs, potentially degrading quality over time, though recent evidence suggests this fear may be overblown under certain conditions.

  • Research by Gerstgrasser et al. (2024) indicates model collapse can be avoided when synthetic data accumulates alongside real data rather than replacing it entirely.
  • Properly curated synthetic data can actually assist with training newer, more capable models when combined with real data.
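
The difference between replacing and accumulating data can be sketched with a toy simulation. This is only an illustration under simplifying assumptions, not the experimental setup used by Gerstgrasser et al.: the "model" here is a Gaussian fitted to the training data, and the function name `simulate` and its parameters are hypothetical.

```python
import random
import statistics


def simulate(mode, generations=5, n=200, seed=0):
    """Toy illustration: repeatedly fit a Gaussian and sample from the fit.

    mode="replace":    each generation trains only on the previous
                       model's synthetic samples.
    mode="accumulate": each generation trains on the real data plus
                       all synthetic samples produced so far.

    Returns a list of (training_set_size, fitted_std), one per generation.
    """
    if mode not in ("replace", "accumulate"):
        raise ValueError("mode must be 'replace' or 'accumulate'")

    rng = random.Random(seed)
    # Stand-in for human-generated data (an assumption of this sketch).
    real = [rng.gauss(0.0, 1.0) for _ in range(n)]
    train = list(real)
    history = []

    for _ in range(generations):
        # "Train" the model: estimate mean and standard deviation.
        mu = statistics.fmean(train)
        sigma = statistics.stdev(train)
        history.append((len(train), sigma))

        # "Generate" synthetic data from the fitted model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]

        if mode == "replace":
            train = synthetic          # real data is discarded
        else:
            train = train + synthetic  # real data stays in the mix

    return history
```

Calling `simulate("replace", generations=200, n=20)` and `simulate("accumulate", generations=200, n=20)` and comparing the fitted standard deviations gives a rough feel for the effect: in the replace regime the estimate tends to drift away from the original spread over many generations, while in the accumulate regime the real data keeps anchoring it.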

The backstory: Graham-Cumming created the website in March 2023 but only recently announced it publicly, having kept it as a quiet clearinghouse for uncontaminated online resources.

  • He’s known for creating the POPFile spam filtering software and for successfully petitioning the UK government in 2009 to apologize for its persecution of codebreaker Alan Turing.
  • The site accepts new submissions through its Tumblr page.

Looking ahead: Graham-Cumming emphasizes that the project documents human creativity rather than opposing AI itself. The steel analogy carries a similar note: low-background steel eventually became unnecessary as atmospheric nuclear testing ended and radiation levels normalized.
