Open Source AI Definition reaches release candidate stage: The Open Source Initiative (OSI) has published the first Release Candidate (RC1) of the Open Source AI Definition, marking a significant milestone in defining open-source standards for artificial intelligence systems.
- The RC1 version incorporates extensive community feedback gathered through town hall meetings, forum discussions, and in-person conversations across multiple countries.
- This release focuses on refining the definition of the “preferred form to make modifications to a machine learning system,” addressing key aspects of data sharing, code completeness, and legal considerations.
Key updates in the Release Candidate:
Data Information requirements:
- The definition now clarifies that all training data must be shared and disclosed to the extent permitted by law.
- Four kinds of data are distinguished (open, public, obtainable, and unshareable), each with different legal requirements for sharing.
Code completeness:
- RC1 emphasizes that the provided code must be comprehensive enough for downstream recipients to understand the training process.
- This requirement aims to enhance transparency, security, and the ability to meaningfully fork AI systems.
Copyleft-like terms:
- The new text explicitly allows for copyleft-like terms to be applied to Code, Data Information, and Parameters, either individually or as bundled combinations.
- This provision anticipates scenarios in which consortia might distribute code and data bundles under specific legal terms.
Clarifying the role of Open Source in AI:
Scientific reproducibility and Open Source: The OSI emphasizes that the primary goal of Open Source AI is not scientific reproducibility, but rather to enable meaningful forking of AI systems.
- Open Source aims to provide the ability to study and modify systems without additional permissions, fostering innovation and improvement.
- While Open Source does not impede reproducibility, it does not explicitly require it; additional requirements can be added on top of Open Source principles to achieve reproducibility.
Forking in machine learning context:
- In the AI context, forking means building on an existing system to create one that behaves differently from the original.
- This capability allows for various improvements, such as fixing security issues, enhancing behavior, and removing bias.
Next steps in the definition process:
Focus on refinement and documentation: With the release candidate phase underway, the OSI will shift its attention to:
- Addressing any major flaws or issues raised during the RC period.
- Refining accompanying documentation, including the Checklist and FAQ.
- Clarifying the basic requirement that data must be shared if legally possible.
Preparation for official release:
- The OSI aims to gather more endorsers for the Definition.
- Continued collection of feedback through various channels.
- Preparation of launch materials for the official release at the All Things Open event on October 28.
Broader implications: The Open Source AI Definition represents a crucial step in establishing standards for transparency and collaboration in AI development. As AI systems become increasingly influential in various sectors, this definition could play a pivotal role in shaping the future of open and accessible AI technologies.
The Open Source AI Definition RC1 is available for comments