Court to Hear Nvidia AI Copyright Case Over Alleged Pirated Book Training
A U.S. court will hear Nvidia’s bid to dismiss claims it trained AI using pirated books.
Why it matters: The outcome could clarify the legal risks around AI training data, shaping how tech companies source datasets and how rights-holders enforce copyrights as AI models proliferate.
- Authors sued Nvidia in March 2024 over alleged use of copyrighted books from shadow libraries for AI training.
- Court documents show Nvidia sought high-speed access to 500 terabytes of book data from Anna’s Archive.
- Nvidia filed a motion to dismiss the lawsuit, set for hearing on April 2, 2026, in federal court.
- Recent related cases: Anthropic settled copyright claims for $1.5B after a judge initially found AI training on books could be fair use.
A high-stakes legal clash over AI training data sourcing moves to the U.S. District Court for the Northern District of California, with a hearing scheduled for April 2, 2026, on Nvidia's motion to dismiss a copyright lawsuit.
- Authors allege Nvidia trained its large language models on copyrighted books sourced from pirated platforms such as Anna’s Archive and Books3.
- Internal communications from Nvidia’s data strategy team revealed that the company discussed including Anna’s Archive in training data and weighed the risks, aiming for “high-speed access” to 500 terabytes of material.
- Nvidia argues the lawsuit should be dismissed, claiming the plaintiffs have not identified specific works or explained how those works were used in AI training.
This litigation follows a notable precedent: In June 2025, a federal judge ruled that Anthropic's training of AI on published books was fair use. In September 2025, Anthropic nonetheless agreed to a $1.5 billion settlement with authors over its alleged use of pirated works.
The legal landscape remains unsettled: In May 2026, major publishers and author Scott Turow filed another copyright lawsuit against Meta over the training sets for its AI models.
- "We are figuring out internally whether we are willing to accept the risk of using this data," an Nvidia team member wrote in internal communications.
Industry stakeholders and legal teams are closely watching Nvidia’s challenge, which could clarify, or upend, how copyright applies to the datasets powering generative AI.
By the numbers:
- 500 terabytes — Data Nvidia sought from Anna’s Archive for AI training.
- $1.5 billion — Anthropic’s September 2025 settlement with authors over copyright claims.
- April 2, 2026 — Date of Nvidia’s motion to dismiss hearing.
Yes, but: Even with a recent fair use finding for AI training, Anthropic’s $1.5 billion settlement shows that the risk of sourcing data from pirated libraries remains unresolved.
What's next: Nvidia’s motion to dismiss will be heard on April 2, 2026, potentially setting a new precedent for AI data sourcing.