Meta’s extensive data mining for AI training revealed: Meta has admitted to scraping all public posts from Facebook and Instagram since 2007 to train its generative AI model, raising significant privacy concerns and contrasting sharply with Apple’s approach to AI development.
Scale of data collection: Meta’s admission covers a vast trove of user-generated content spanning nearly two decades, encompassing billions of posts, photos, and comments from its social media platforms.
- The data collection includes all public posts made on Facebook and Instagram since 2007, excluding only content from private accounts and from users under 18.
- Photos of children posted by parents on public accounts were also included in the data scraping.
- This revelation contradicts the company’s earlier statements, which presented plans to use user data for AI training as a new initiative.
Privacy implications and user consent: The extensive data mining raises serious questions about user privacy and the extent of consent given for such use of personal information.
- Meta did not offer opt-out options for users in most countries, including the United States and Australia.
- In the European Union and United Kingdom, where the law required Meta to offer an opt-out, the legality of offering an opt-out rather than requiring opt-in consent is under investigation.
- The company’s approach contrasts sharply with more privacy-focused AI development strategies, such as those employed by Apple.
Regulatory scrutiny and company response: Meta’s practices have come under increased scrutiny from regulators and lawmakers worldwide.
- The admission was made during a public inquiry in Australia, led by Labor senator Tony Sheldon and Greens senator David Shoebridge.
- Initially, Meta’s global privacy director Melinda Claybaugh denied the practice but later confirmed it when pressed specifically about public posts.
- This revelation may lead to further investigations and potentially stricter regulations on data use for AI training by tech giants.
AI development and competitive landscape: Meta’s use of its vast user data for AI training highlights the competitive advantage it holds in the AI race.
- CEO Mark Zuckerberg previously boasted that Meta had more user data than was used to train ChatGPT, positioning the company as a strong contender in AI development.
- The extensive dataset could potentially give Meta an edge in creating more advanced AI models, particularly in natural language processing and image recognition.
- However, this approach also raises concerns about the potential for biased or toxic AI outputs, given the unfiltered nature of social media content.
Broader implications for social media users: This revelation underscores the long-term consequences of sharing information on social media platforms.
- Users who have been active on Facebook or Instagram since 2007 may not have anticipated their content being used for AI training nearly two decades later.
- The incident highlights the importance of understanding platform privacy settings and the potential future uses of publicly shared content.
- It also raises questions about the ownership and control of user-generated content on social media platforms.
Looking ahead: Balancing innovation and privacy: Meta’s actions bring to the forefront the ongoing debate about balancing technological innovation with user privacy and data protection.
- As AI development accelerates, tech companies may face increasing pressure to be more transparent about their data collection and usage practices.
- Regulators worldwide may need to reassess existing data protection laws to address the specific challenges posed by AI training on user-generated content.
- Users may become more cautious about what they share publicly on social media platforms, potentially impacting the nature of content and engagement on these platforms in the future.