
Microsoft’s AI boss Mustafa Suleyman claims that content published on the open web is “fair use” for anyone to copy and reuse freely, a position that has sparked controversy amid ongoing lawsuits against Microsoft and OpenAI over alleged copyright infringement in training AI models.

Key misconceptions about copyright law: Suleyman’s statements reveal a flawed understanding of how copyright and fair use operate on the internet:

  • He incorrectly asserts that publishing content online automatically makes it “freeware” that anyone can copy and use, despite copyright protection applying automatically to original works upon creation.
  • Suleyman mistakenly claims a “social contract” grants fair use for web content. In reality, fair use is a legal defense that courts assess case by case, weighing factors such as the purpose of the use and the amount copied.

AI companies’ controversial stance on copyrighted data: Microsoft’s position reflects a broader trend of AI companies arguing that training models on copyrighted material is fair use, even as they face growing legal challenges:

  • Several lawsuits allege Microsoft and OpenAI are infringing copyrights by scraping online content to train AI without permission or compensation to creators.
  • While many AI firms claim fair use protects this practice, the unprecedented nature of generative AI means the legal precedents are unclear and will likely be determined through ongoing court battles.

Disregarding established web conventions: Beyond the legal questions, Suleyman’s comments highlight how some AI companies are ignoring or misrepresenting long-standing norms around web scraping:

  • He suggests the robots.txt standard, which lets sites specify rules for web crawlers, might leave a “grey area” for copying content. Yet robots.txt is an informal convention, not a legally binding document.
  • Reports indicate OpenAI and others have scraped sites while disregarding their robots.txt files entirely, breaching this “social contract” the tech industry has generally respected since the early web.
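
To make the robots.txt convention concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleAIBot` and `SomeOtherBot` user agents, and the `is_allowed` helper are hypothetical, invented for illustration. Nothing forces a crawler to run this check, which is why the convention depends on voluntary compliance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not from a real site).
# It blocks one named crawler entirely and keeps /private/ off-limits to all others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for path in ("/articles/1", "/private/notes"):
            url = "https://example.com" + path
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "disallowed"
            print(f"{agent} -> {path}: {verdict}")
```

A crawler that respects the convention would skip any URL for which the check returns `False`. A crawler that ignores robots.txt never performs the check at all, and the file itself has no way to stop it.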

Broader implications for online content and AI: As generative AI rapidly advances, Suleyman’s statements exemplify the urgent need to clarify the legal and ethical boundaries around using copyrighted data to train these systems:

  • With AI firms incentivized to hoover up as much training data as possible, a permissive approach to copyright could lead to wide-scale appropriation of creative works to fuel AI development.
  • Allowing AI models to freely copy online content may undermine creators’ livelihoods and erode incentives to publish original material on the open web in the first place.
  • Establishing clearer rules and norms will be crucial to strike a balance between enabling AI innovation and respecting intellectual property rights in this new technological landscape.
Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web
