A new analysis of 40 large language models (LLMs) claiming to be “open source” found significant discrepancies in the level of openness provided by different developers, with smaller players generally being more transparent than tech giants:

Key Takeaways: Researchers Mark Dingemanse and Andreas Liesenfeld created an openness league table assessing models on 14 parameters, revealing that many models described as “open source” fail to disclose important information about the underlying technology:

  • Around half of the analyzed models do not provide any details about training datasets beyond generic descriptions.
  • Truly open-source models should allow outside researchers to inspect, customize, and reproduce them, but many only provide limited access.
  • Peer-reviewed papers detailing the models are extremely rare, often replaced by less rigorous blog posts or preprints that lack technical detail.

Openness Ranking: The study highlighted a clear divide between the openness practices of smaller players and those of big tech companies:

  • Models from AI research groups and smaller firms, such as BLOOM and OLMo, topped the openness ranking by providing access to code, training data, model weights, and detailed documentation.
  • Tech giants like Google, Meta, and Microsoft scored lower, often restricting access to key components while still claiming openness and reaping its reputational benefits.
  • Google labels its LLM as “open” rather than “open source,” arguing that “existing open-source concepts can’t always be directly applied to AI systems.”

Importance for Science: The authors emphasize that openness is crucial for scientific progress and accountability in AI development:

  • Reproducibility, a cornerstone of science, requires enough information for researchers to independently build and study the models.
  • Lack of data transparency raises concerns about the potential use of inappropriate or copyrighted material in training datasets.
  • Models must be open to scrutiny to assess their true capabilities and limitations, as impressive performance could result from extensive task-specific training.

Regulatory Implications: The findings are particularly relevant as the European Union’s upcoming AI Act is set to grant open-source general-purpose models certain exemptions from transparency requirements:

  • The lack of an agreed definition for “open source AI” leaves room for interpretation and potential “open-washing” by companies seeking to benefit from the label.
  • The new legal significance of the term “open source” in the EU could make it a focal point for corporate lobbying efforts aiming to minimize disclosure obligations.

Reading Between the Lines: While the study provides valuable insights into the current state of openness in the AI industry, it also highlights the need for clearer standards and accountability mechanisms:

  • As AI systems become more complex and impactful, ensuring meaningful transparency from developers is crucial for fostering trust, enabling scientific progress, and informing effective regulation.
  • The tendency of major tech players to claim openness while restricting access to key components underscores the importance of independent auditing and validation efforts.
  • Policymakers and the AI research community must work together to establish clear, enforceable guidelines for open-source AI that prioritize the public interest and prevent “open-washing.”

Broader Implications: The study’s findings underscore the growing tension between the competitive dynamics of the AI industry and the principles of openness and transparency in scientific research and technological development. As AI systems become increasingly powerful and integrated into society, striking the right balance between protecting commercial interests and ensuring public accountability will be a defining challenge for regulators, researchers, and companies alike. Efforts to establish clear standards for open-source AI and close the “openness gaps” revealed by this analysis will be crucial for realizing the technology’s full potential while mitigating its risks and ensuring that its development aligns with the values of transparency, reproducibility, and the greater good.
