
Microsoft’s AI boss Mustafa Suleyman claims that content published on the open web is “fair use” that anyone can freely copy and reuse, a statement that has sparked controversy amid ongoing lawsuits against Microsoft and OpenAI over alleged copyright infringement in training AI models.

Key misconceptions about copyright law: Suleyman’s statements reveal a flawed understanding of how copyright and fair use operate on the internet:

  • He incorrectly asserts that publishing content online makes it “freeware” that anyone can copy and use, when in fact copyright protection applies automatically to original works as soon as they are created.
  • Suleyman mistakenly claims that a “social contract” makes web content fair use, when in reality fair use is a legal defense decided case by case in court, weighing factors such as the purpose of the use, the amount copied, and the effect on the market for the original work.

AI companies’ controversial stance on copyrighted data: Microsoft’s position reflects a broader trend of AI companies arguing that training models on copyrighted material is fair use, even as they face growing legal challenges:

  • Several lawsuits allege Microsoft and OpenAI are infringing copyrights by scraping online content to train AI without permission or compensation to creators.
  • While many AI firms claim fair use protects this practice, generative AI is new enough that there is little directly applicable precedent, so the question will likely be settled through ongoing court battles.

Disregarding established web conventions: Beyond the legal questions, Suleyman’s comments highlight how some AI companies are ignoring or misrepresenting long-standing norms around web scraping:

  • He suggests that a site’s robots.txt file, the standard that lets sites specify rules for web crawlers, may create a “grey area” for copying content, even though robots.txt is an informal convention rather than a legally binding document.
  • Reports indicate OpenAI and others have scraped sites while disregarding their robots.txt files entirely, breaching this “social contract” the tech industry has generally respected since the early web.
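To illustrate why robots.txt works only as a convention, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `may_crawl` helper, and the example URLs are hypothetical, and `GPTBot` appears only as a sample user-agent name. The file expresses the site's wishes, but it has an effect only if a crawler chooses to run a check like this before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: block one named
# crawler from the whole site, and block everyone from /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration. Nothing enforces this check:
    a crawler that never calls it can still request the page.
    """
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl("GPTBot", "https://example.com/article"))        # False
print(may_crawl("SomeBot", "https://example.com/article"))       # True
print(may_crawl("SomeBot", "https://example.com/private/page"))  # False
```

Nothing in the protocol stops a client from skipping the check and requesting the page anyway, which is the behavior the reports above describe.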

Broader implications for online content and AI: As generative AI rapidly advances, Suleyman’s statements exemplify the urgent need to clarify the legal and ethical boundaries around using copyrighted data to train these systems:

  • With AI firms incentivized to hoover up as much training data as possible, a permissive approach to copyright could lead to widespread appropriation of creative works to fuel AI development.
  • Allowing AI models to freely copy online content may undermine creators’ livelihoods and erode incentives to publish original material on the open web in the first place.
  • Establishing clearer rules and norms will be crucial to strike a balance between enabling AI innovation and respecting intellectual property rights in this new technological landscape.
Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web
