‘Open-source’ has an updated definition — here’s what it is for now

Open Source AI Definition reaches release candidate stage: The Open Source Initiative (OSI) has published Release Candidate 1 (RC1) of the Open Source AI Definition, a milestone in its effort to define open-source standards for artificial intelligence systems.

  • The RC1 version incorporates extensive community feedback gathered through town hall meetings, forum discussions, and in-person conversations across multiple countries.
  • This release focuses on refining the definition of the “preferred form to make modifications to a machine learning system,” addressing key aspects of data sharing, code completeness, and legal considerations.

Key updates in the Release Candidate:

  1. Data Information requirements:

    • The definition now clarifies that all training data must be shared and disclosed to the extent permitted by law.
    • Four types of data are described: open, public, obtainable, and unshareable, each with different legal requirements for sharing.
  2. Code completeness:

    • RC1 emphasizes that the provided code must be comprehensive enough for downstream recipients to understand the training process.
    • This requirement aims to enhance transparency, security, and the ability to meaningfully fork AI systems.
  3. Copyleft-like terms:

    • The new text explicitly allows for copyleft-like terms to be applied to Code, Data Information, and Parameters, either individually or as bundled combinations.
    • This provision anticipates potential scenarios where consortiums might distribute code and data bundles with specific legal terms.

Clarifying the role of Open Source in AI:

Scientific reproducibility and Open Source: The OSI emphasizes that the primary goal of Open Source AI is not scientific reproducibility, but rather to enable meaningful forking of AI systems.

  • Open Source aims to provide the ability to study and modify systems without additional permissions, fostering innovation and improvement.
  • While Open Source does not impede reproducibility, it does not explicitly require it; additional requirements can be added on top of Open Source principles to achieve reproducibility.

Forking in machine learning context:

  • In the AI realm, forking refers to the ability to build systems that behave differently from their original state.
  • This capability allows for various improvements, such as fixing security issues, enhancing behavior, and removing bias.

Next steps in the definition process:

Focus on refinement and documentation: With the release candidate phase underway, the OSI will shift its attention to:

  • Addressing any major flaws or issues raised during the RC period.
  • Refining accompanying documentation, including the Checklist and FAQ.
  • Clarifying the basic requirement that data must be shared if legally possible.

Preparation for official release:

  • Gathering more endorsers for the Definition.
  • Continuing to collect feedback through various channels.
  • Preparing launch materials for the official release at the All Things Open event on October 28.

Broader implications: The Open Source AI Definition is a step toward shared standards for transparency and collaboration in AI development. As AI systems become more influential across sectors, the definition could help shape the future of open and accessible AI technologies.

The Open Source AI Definition RC1 is available for comments