Privacy Rules Tighten on AI Training Data, Raising Legal Stakes

3 min read | Sources: National Law Review

On June 25, 2025, France's CNIL clarified that the GDPR allows some AI training on publicly available personal data under the 'legitimate interest' legal basis.

Why it matters: Legal teams must track evolving privacy standards that affect AI training data to avoid enforcement actions and reputational harm. New regulations and research signal stricter scrutiny worldwide.

  • France's data protection regulator, CNIL, confirmed on June 25, 2025, that the GDPR's 'legitimate interest' basis can support AI training on some public personal data.
  • A 2025 arXiv study by researchers at Harvard and MIT found re-identification risks in anonymized data, challenging common compliance assumptions.
  • The EU AI Act, most of whose obligations apply from August 2, 2026, imposes strict risk management, documentation, and reporting requirements on high-risk AI systems.
  • The US FTC's 2026 enforcement priorities emphasize privacy protection and include stronger scrutiny of whether AI training data was lawfully obtained, according to FTC press releases.

Global privacy regulators are sharpening their focus on the legal risks of AI training data. On June 25, 2025, France's CNIL clarified that the GDPR's 'legitimate interest' legal basis can justify using personal data from public sources to train AI, but only with appropriate privacy safeguards in place. The guidance gives organizations developing AI a clearer path to GDPR compliance.

Further complicating compliance, a June 2025 study by Harvard and MIT researchers found that anonymized datasets often remain vulnerable to re-identification attacks. The finding suggests that relying on anonymization alone may not satisfy privacy standards for AI training data.

The EU AI Act, most of whose obligations apply from August 2, 2026, imposes comprehensive duties on AI providers. It requires rigorous risk management, incident reporting, and thorough documentation, particularly for AI systems classified as high-risk under the law.

The U.S. is moving in parallel: the Federal Trade Commission has identified privacy protection as central to its 2026 AI governance efforts. The FTC's official statements outline plans for robust enforcement targeting the unlawful use of training data and algorithmic bias.

These shifts mean corporate legal teams must rigorously assess the legality of their AI training datasets. Inaccurate or unauthorized data use increases the risk of penalties and reputational damage. Staying compliant requires clear data-sourcing policies and ongoing monitoring of regulatory changes.

By the numbers:

  • June 25, 2025: CNIL guidance on GDPR and AI training data
  • August 2, 2026: Most EU AI Act obligations begin to apply
  • 2026: US FTC prioritizes AI privacy enforcement

Yes, but: Some experts argue the GDPR’s 'legitimate interest' basis is context-dependent and may require case-by-case legal analysis, limiting blanket application.

What's next: Watch for further EU guidance clarifying operational rules for AI compliance, and for new FTC enforcement actions in 2026.