Amnesty Flags Unlawful Data Scraping by Big Tech for AI Training

Sources: JURIST

Amnesty International says major AI firms use unlawful web scraping to train models.

Why it matters: Legal professionals must grasp privacy and human rights risks from AI dataset practices, which could trigger regulatory action and lawsuits.

  • Amnesty International's May 2026 briefing exposes unlawful data scraping by OpenAI, Google, Meta, and others.
  • These companies collect billions of online posts and images to train AI without users' explicit consent.
  • Amnesty says the scraping violates privacy rights by design, harms marginalized communities, and adds to environmental damage.
  • Amnesty calls for a ban on standalone generative AI systems built on unlawfully scraped data and demands that companies halt non-consensual data collection.
  • Only Microsoft, Amazon, Intel, OpenAI, and Meta responded to Amnesty inquiries by publication.
  • Google reported a 48% rise in greenhouse gas emissions since 2019, and Microsoft a 29% rise from 2020 to 2024, increases linked to data centers supporting AI.

Amnesty International released a briefing on May 28, 2026, titled "Unlawful by Design: Exposing the Human Rights Costs of Generative AI". The briefing says that major generative AI companies, including OpenAI, Google, Meta, DeepSeek, Midjourney, and Stability AI, rely on extensive unlawful web scraping to gather vast amounts of data.

This data comprises billions of public online posts and images scraped without users' explicit consent, which Amnesty says violates privacy rights by design. According to Amnesty's research, data collection on this scale also exacerbates environmental harm and disproportionately affects marginalized communities. Amnesty underscores the environmental concern with rising emissions from the data centers that support AI: Google reported a 48% rise in greenhouse gas emissions since 2019, and Microsoft noted a 29% increase from 2020 to 2024.

Likhita Banerji, Head of Amnesty's Algorithmic Accountability Lab, said, "Companies across the world are supplying generative AI products under the veneer of efficiency and sophistication, but in reality, these systems perpetuate mass invasions of privacy through unlawful web scraping." She stressed the need to challenge these "design choices adopted by companies who build generative AI systems relying on training data extracted non-consensually and on a grand scale."

Amnesty International urges a prohibition on standalone generative AI systems built using unlawfully scraped data and calls for an immediate halt to non-consensual data collection practices. The group reached out to the implicated companies for comment; only Microsoft, Amazon, Intel, OpenAI, and Meta responded by the time of publication.

This briefing signals emerging legal and regulatory risks for firms working with generative AI. Legal professionals should closely monitor these developments to anticipate potential privacy litigation and regulatory interventions concerning AI training data practices.

By the numbers:

  • 48% increase in Google's greenhouse gas emissions since 2019 — linked to data center and supply chain operations supporting AI
  • 29% increase in Microsoft's emissions from 2020 to 2024 — due to AI-supporting data center processes