Amazon investigates Perplexity AI over potential data-scraping violations: Amazon Web Services is looking into whether AI startup Perplexity is violating its terms of service by scraping web content without permission, following reports from multiple news outlets.
Accusations of improper data scraping: Several publications, including Forbes and Wired, have accused Perplexity of swiping their web archives to train its AI models without consent or compensation:
- Forbes alleged that Perplexity is creating “knockoff stories” using similar wording and lifted fragments from its articles without adequate citation.
- Wired identified an IP address it believes Perplexity is using to crawl its sites and those of its parent company, Condé Nast, in violation of the robots.txt standard.
- The Guardian, Forbes, and The New York Times also told Wired they have seen the same IP address on their servers.
AWS investigating Perplexity’s practices: An AWS representative confirmed that Amazon is investigating whether Perplexity is breaking its rules, which prohibit using AWS services for any illegal activity:
- AWS says its customers must honor the instructions in websites’ robots.txt files, which tell crawlers which parts of a site they should not access.
- However, Perplexity claims it is following the rules and that AWS is not looking into the startup beyond the initial Wired report.
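How a robots.txt check works: The rule at the center of the dispute is simple to illustrate. The sketch below uses Python's standard `urllib.robotparser` module to show the check a compliant crawler would run before fetching a page. The robots.txt content, the `ExampleBot` and `OtherBot` names, and the `is_allowed` helper are hypothetical and are not taken from Perplexity, AWS, or any publisher named here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"
    # ExampleBot is barred from the whole site; other bots only from /private/.
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleBot", story))
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "OtherBot", story))
```

Compliance with robots.txt is voluntary on the crawler's side: nothing in the file itself blocks a request, which is why publishers turn to server logs and IP addresses to spot bots that ignore it.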
Broader tensions over AI firms scraping web content: The Perplexity investigation highlights growing backlash against tech companies training AI models on web data without explicit permission:
- Microsoft’s AI chief recently claimed any “open web” content is “fair use” for AI firms to scrape and monetize, sparking debate over the ethics of this practice.
- The New York Times is suing OpenAI and Microsoft, alleging that they infringed its copyright by using its articles to train their AI models without consent.
- Some outlets like Semafor and TIME have proactively licensed their content to AI companies, while others are fighting back against nonconsensual scraping.
Perplexity’s ambitions and industry connections: Despite the controversy, Perplexity has positioned itself as a potential Google competitor with backing from major tech players:
- The startup aims to offer an AI-powered “answer engine” and is supported by Jeff Bezos’ investment fund and Nvidia.
- However, the similarities between some of Perplexity’s and Google’s search results have raised questions about the true extent of its innovation.
Analyzing deeper: The Perplexity investigation underscores the complex dynamics around AI and web scraping, with tech giants, startups, and content creators jockeying for control over valuable training data. As lawmakers and the public scrutinize these practices more closely, clearer regulations may be needed to balance AI innovation with intellectual property rights and content ownership. In the meantime, the outcome of Amazon’s investigation could set an important precedent for Perplexity and other AI firms relying on web data to fuel their models.