Microsoft AI Boss: Publishing Online Makes Content Free for AI to Copy

Microsoft’s AI boss Mustafa Suleyman claims that publishing content on the open web makes it “fair use” for anyone to freely copy and use. The claim has sparked controversy amid ongoing lawsuits against Microsoft and OpenAI over alleged copyright infringement in training AI models.

Key misconceptions about copyright law: Suleyman’s statements reveal a flawed understanding of how copyright and fair use operate on the internet:

  • He incorrectly asserts that publishing content online automatically makes it “freeware” that anyone can copy and use, despite copyright protection applying automatically to original works upon creation.
  • Suleyman mistakenly claims a “social contract” grants fair use for web content, when in reality, fair use is a legal defense determined case-by-case in court based on specific factors like the purpose and amount of copying.

AI companies’ controversial stance on copyrighted data: Microsoft’s position reflects a broader trend of AI companies arguing that training models on copyrighted material is fair use, even as they face growing legal challenges:

  • Several lawsuits allege Microsoft and OpenAI are infringing copyrights by scraping online content to train AI without permission or compensation to creators.
  • While many AI firms claim fair use protects this practice, the unprecedented nature of generative AI means the legal precedents are unclear and will likely be determined through ongoing court battles.

Disregarding established web conventions: Beyond the legal questions, Suleyman’s comments highlight how some AI companies are ignoring or misrepresenting long-standing norms around web scraping:

  • He suggests that the robots.txt standard, which lets sites specify rules for web crawlers, might leave a “grey area” for copying content. In fact, robots.txt is an informal convention rather than a legally binding document.
  • Reports indicate OpenAI and others have scraped sites while disregarding their robots.txt files entirely, breaching this “social contract” the tech industry has generally respected since the early web.
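
For readers unfamiliar with the convention, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` and `OtherBot` names, the `example.com` URLs, and the `is_allowed` helper are all hypothetical and chosen for illustration; they do not describe any real crawler or site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/notes"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in this check is enforced by the site: the crawler itself has to run it and honor the answer, which is why the convention depends on good faith.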

Broader implications for online content and AI: As generative AI rapidly advances, Suleyman’s statements exemplify the urgent need to clarify the legal and ethical boundaries around using copyrighted data to train these systems:

  • With AI firms incentivized to hoover up as much training data as possible, a permissive approach to copyright could lead to wide-scale appropriation of creative works to fuel AI development.
  • Allowing AI models to freely copy online content may undermine creators’ livelihoods and erode incentives to publish original material on the open web in the first place.
  • Establishing clearer rules and norms will be crucial to strike a balance between enabling AI innovation and respecting intellectual property rights in this new technological landscape.
Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web
