
A new analysis of 40 large language models (LLMs) billed as “open source” found significant discrepancies in the level of openness provided by different developers, with smaller players generally proving more transparent than the tech giants:

Key Takeaways: Researchers Mark Dingemanse and Andreas Liesenfeld created an openness league table assessing models on 14 parameters, revealing that many models described as “open source” fail to disclose important information about the underlying technology:

  • Around half of the analyzed models do not provide any details about training datasets beyond generic descriptions.
  • Truly open-source models should allow outside researchers to inspect, customize, and reproduce them, but many provide only limited access.
  • Scientific papers detailing the models are extremely rare, often replaced by less rigorous blog posts or preprints that are light on technical detail.

Openness Ranking: The study highlighted a clear divide between the openness practices of smaller players and those of big tech companies:

  • Models from AI research groups and smaller firms, such as BLOOM and OLMo, topped the openness ranking by providing access to code, training data, model weights, and detailed documentation.
  • Tech giants like Google, Meta, and Microsoft scored lower, often restricting access to key components while still claiming openness and reaping the associated benefits.
  • Google labels its LLM as “open” rather than “open source,” arguing that “existing open-source concepts can’t always be directly applied to AI systems.”

Importance for Science: The authors emphasize that openness is crucial for scientific progress and accountability in AI development:

  • Reproducibility, a cornerstone of science, requires enough information for researchers to independently build and study the models.
  • Lack of data transparency raises concerns about the potential use of inappropriate or copyrighted material in training datasets.
  • Models must be open to scrutiny to assess their true capabilities and limitations, as impressive performance could result from extensive task-specific training.

Regulatory Implications: The findings are particularly relevant as the European Union’s upcoming AI Act is set to grant open-source general-purpose models certain exemptions from transparency requirements:

  • The lack of an agreed definition for “open source AI” leaves room for interpretation and potential “open-washing” by companies seeking to benefit from the label.
  • The new legal significance of the term “open source” in the EU could make it a focal point for corporate lobbying efforts aiming to minimize disclosure obligations.

Reading Between the Lines: While the study provides valuable insights into the current state of openness in the AI industry, it also highlights the need for clearer standards and accountability mechanisms:

  • As AI systems become more complex and impactful, ensuring meaningful transparency from developers is crucial for fostering trust, enabling scientific progress, and informing effective regulation.
  • The tendency of major tech players to claim openness while restricting access to key components underscores the importance of independent auditing and validation efforts.
  • Policymakers and the AI research community must work together to establish clear, enforceable guidelines for open-source AI that prioritize the public interest and prevent “open-washing.”

Broader Implications: The study’s findings underscore the growing tension between the competitive dynamics of the AI industry and the principles of openness and transparency in scientific research and technological development. As AI systems become more powerful and more deeply integrated into society, striking the right balance between protecting commercial interests and ensuring public accountability will be a defining challenge for regulators, researchers, and companies alike. Establishing clear standards for open-source AI and closing the “openness gaps” revealed by this analysis will be crucial for realizing the technology’s potential while mitigating its risks and keeping its development aligned with transparency, reproducibility, and the public good.
