AWS Probes AI Startup Perplexity Over Alleged Data Scraping Violations

Amazon investigates Perplexity AI over potential data-scraping violations: Amazon Web Services is looking into whether AI startup Perplexity is violating its terms of service by scraping web content without permission, following reports from multiple news outlets.

Accusations of improper data scraping: Several publications, including Forbes and Wired, have accused Perplexity of swiping their web archives to train its AI models without consent or compensation:

Forbes alleged that Perplexity is creating “knockoff stories” using similar wording and lifted fragments from its articles without adequate citation.
Wired identified an IP address it believes Perplexity is using to crawl its sites and those of its parent company, Condé Nast, in violation of the robots.txt standard.
The Guardian, Forbes, and The New York Times also told Wired they have seen the same IP address on their servers.

AWS investigating Perplexity’s practices: An AWS representative confirmed that Amazon is investigating whether Perplexity is breaking its rules, which prohibit using AWS services for any illegal activity:

All AWS clients must follow the instructions in websites’ robots.txt files, which site owners use to tell automated crawlers which pages they may and may not access.
However, Perplexity claims it is following the rules and that AWS is not looking into the startup beyond the initial Wired report.
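
For readers unfamiliar with how robots.txt works, the check a well-behaved crawler performs can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration: the robots.txt content, the bot names (`ExampleBot`, `OtherBot`), and the `example.com` URLs are hypothetical and are not taken from any publisher or company named in this story.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It blocks one named crawler from the whole site and blocks every
# other crawler from a single directory.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would first download
    # the file from the site's /robots.txt path.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_fetch("ExampleBot", "https://example.com/news/story"))
    print(can_fetch("OtherBot", "https://example.com/news/story"))
    print(can_fetch("OtherBot", "https://example.com/private/archive"))
```

The file is a request rather than a technical barrier: nothing in it stops a crawler that chooses to skip this check, which is why disputes like this one tend to turn on terms of service and server logs.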

Broader tensions over AI firms scraping web content: The Perplexity investigation highlights growing backlash against tech companies training AI models on web data without explicit permission:

Microsoft’s AI chief recently claimed any “open web” content is “fair use” for AI firms to scrape and monetize, sparking debate over the ethics of this practice.
The New York Times is suing OpenAI and Microsoft, alleging that they infringed its copyright by using its articles to train their AI models without consent.
Some outlets like Semafor and TIME have proactively licensed their content to AI companies, while others are fighting back against nonconsensual scraping.

Perplexity’s ambitions and industry connections: Despite the controversy, Perplexity has positioned itself as a potential Google competitor with backing from major tech players:

The startup aims to offer an AI-powered “answer engine” and is supported by Jeff Bezos’ investment fund and Nvidia.
However, the similarities between some of Perplexity’s and Google’s search results have raised questions about the true extent of its innovation.

Analyzing deeper: The Perplexity investigation underscores the complex dynamics around AI and web scraping, with tech giants, startups, and content creators jockeying for control over valuable training data. As lawmakers and the public scrutinize these practices more closely, clearer regulations may be needed to balance AI innovation with intellectual property rights and content ownership. In the meantime, the outcome of Amazon’s investigation could set an important precedent for Perplexity and other AI firms relying on web data to fuel their models.
