Meta Mines 16 Years of Social Data for AI Training

Meta’s extensive data mining for AI training revealed: Meta has admitted to scraping public posts made by adult users on Facebook and Instagram since 2007 to train its generative AI models, raising significant privacy concerns and contrasting sharply with Apple’s approach to AI development.

Scale of data collection: Meta’s admission covers a vast trove of user-generated content spanning nearly two decades, encompassing billions of posts, photos, and comments from its social media platforms.

  • The data collection includes all public posts made on Facebook and Instagram since 2007, excluding only content from users under 18 and private accounts.
  • Photos of children posted by parents on public accounts were also included in the data scraping.
  • This revelation contradicts the company’s earlier statements, which presented plans to use user data for AI training as a new initiative.

Privacy implications and user consent: The extensive data mining raises serious questions about user privacy and the extent of consent given for such use of personal information.

  • Meta did not offer opt-out options for users in most countries, including the United States and Australia.
  • In the European Union and United Kingdom, where the law required Meta to offer an opt-out, the legality of using an opt-out rather than an opt-in model is itself under investigation.
  • The company’s approach contrasts sharply with more privacy-focused AI development strategies, such as those employed by Apple.

Regulatory scrutiny and company response: Meta’s practices have come under increased scrutiny from regulators and lawmakers worldwide.

  • The admission was made during a public inquiry in Australia, led by Labor senator Tony Sheldon and Greens senator David Shoebridge.
  • Initially, Meta’s global privacy director Melinda Claybaugh denied the practice but later confirmed it when pressed specifically about public posts.
  • This revelation may lead to further investigations and potentially stricter regulations on data use for AI training by tech giants.

AI development and competitive landscape: Meta’s use of its vast user data for AI training highlights the competitive advantage it holds in the AI race.

  • CEO Mark Zuckerberg previously boasted that Meta had more user data than was used to train ChatGPT, positioning the company as a strong contender in AI development.
  • The extensive dataset could give Meta an edge in creating more advanced AI models, particularly in natural language processing and image recognition.
  • However, this approach also raises concerns about the potential for biased or toxic AI outputs, given the unfiltered nature of social media content.

Broader implications for social media users: This revelation underscores the long-term consequences of sharing information on social media platforms.

  • Users who have been active on Facebook or Instagram since 2007 may not have anticipated their content being used for AI training nearly two decades later.
  • The incident highlights the importance of understanding platform privacy settings and the potential future uses of publicly shared content.
  • It also raises questions about the ownership and control of user-generated content on social media platforms.

Looking ahead: Balancing innovation and privacy: Meta’s actions bring to the forefront the ongoing debate about balancing technological innovation with user privacy and data protection.

  • As AI development accelerates, tech companies may face increasing pressure to be more transparent about their data collection and usage practices.
  • Regulators worldwide may need to reassess existing data protection laws to address the specific challenges posed by AI training on user-generated content.
  • Users may become more cautious about what they share publicly, which could change the nature of content and engagement on social media platforms.
