Authors and publishers in Europe and elsewhere complacently thinking it won’t matter to them need to think again. Unless they want to stop their sales to the USA.
In a landmark ruling, U.S. District Judge William Alsup of San Francisco declared that Anthropic did not breach copyright law by training its chatbot Claude on millions of copyrighted books – even though the copies were originally obtained from illegal online sources.
I could hear the champagne corks popping from here in West Africa as AI companies savoured the news, all but drowning out the grinding of teeth from the Luddite Fringe that has so wilfully misunderstood what AI “training” involves.
The judge had no such problem with the concept. Alsup’s decision rested on the finding that the process was “quintessentially transformative”: by distilling millions of written works into a system capable of composing new text, Anthropic wasn’t simply reproducing content but generating something new.
But, all-importantly, this ruling rightly stopped short of endorsing the method of acquisition: the use of pirated copies remains a separate issue, now headed for trial. Meta won’t have been celebrating as much as some others.
Fair Use and the Nature of AI Training
At its heart, the ruling addresses two distinct questions. First is whether the training activity qualifies as fair use – a determination grounded in the idea that transforming original works to produce something new is legally permissible. In Judge Alsup’s words, “Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them – but to turn a hard corner and create something different.”
This notion of transformation is a critical pivot for AI developers. It suggests that if the training process does not generate a mere replication of the source material but instead produces novel outputs, the activity may fall squarely within fair use.
And while this case specifically referenced books, it’s hard to see how the ruling, subject to appeal, will not set a precedent for artists, narrators, translators et al.
Just as importantly, such a defence is sensitive to how the source material is obtained. The court emphasised that even if the transformation argument is valid, using pirated content is not acceptable. Ill-gotten gains and all that…
Thus, the legal debate distils into two interconnected parts: the lawfulness of the training itself versus the legality of the data’s procurement. The bottom line: if it is predicated on legally procured content, AI training is simply a scaled-up version of what we humans do all the time.
Training, Transformation, and the Human Creative Process
The discussion on fair use in AI training invites deeper reflection on what constitutes creative learning – and whether AI’s “training” differs fundamentally from human processes. In an article I wrote almost two years ago to the day – before I began using AI for TNPS subtitles and research – I examined these questions.
My case was that humans learn and refine their creative abilities largely through emulation. In schools, teachers copy texts on boards and students transcribe them; in art classes, pupils routinely copy the works of their masters. As Stephen King notes in On Writing, a writer “must do two things above all others: read a lot and write a lot.” Similarly, young artists learn by copying – and later adapting – works by figures like Michelangelo, Giotto, or Verrocchio. Just as human creativity is an accumulation of readings, observations, and reinterpretations, so too is the training of AI models, albeit built on vast quantities of existing content.
Even when every sentence in a new text can be traced back to myriad sources, the act of reordering and recontextualising content is what grants the creative work its originality. The same principle underpinning our acceptance of academic citation and emulation is now extended to AI, provided the process is transformative.
Competitive and Creative Displacement
Returning to the June 2025 judgment: “Authors contend generically that training LLMs will result in an explosion of works competing with their works. … This order assumes that is so. But Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act.”
The notion that authors have some God-given entitlement to a competition-free industry was never going to stand up in court, and has no place in the real world.
If this ruling takes away that mental crutch from the creative industries, then all well and good.
Back in 2024, the UK’s Society of Authors asserted that “almost four in ten translators (36%) said they had already lost work due to generative AI and nearly half said that AI had decreased their income.”
No evidence was provided, and none was asked of the survey respondents, leaving the door wide open for less-competent translators to blame AI for their woes, and for competent translators to blame AI regardless of the many other factors that might be responsible. AI is the perfect scapegoat for every problem, if we want it to be.
But as Judge Alsup makes clear, “This is not the kind of competitive or creative displacement that concerns the Copyright Act.”
That still leaves the problem of “Who owns the copyright on a book written by artificial intelligence?”
That’s a legal case for another day. There are no clear answers here, because of course there are no clear definitions. We are in uncharted legal waters.
Implications for the Future
But unless and until the latest ruling is overturned on appeal, we need to move on. Under USA law at least, AI training is fair use.
And authors and publishers in Europe and elsewhere complacently thinking it won’t matter to them need to think again. Unless they want to stop their sales to the USA. Just this month it was announced the UK’s exports to the USA were up 23%, while 63% of the UK’s book industry revenue came from overseas.
The USA is the biggest book market out there. It’s also the home of most of the big AI companies that are impacting the publishing industry.
Lawful Acquisition Versus Piracy
And that raises big concerns. Namely, why should American AI companies continue to offer book publishers anywhere a shit-load of cash for training rights when this ruling effectively says purchasing a single copy of every text would provide them with a safe harbour?
In traditional copyright doctrine, buying a book grants you the right to read or teach from it – but not necessarily to reproduce it wholesale. The latest ruling implies that were an AI company to acquire works by paying the cover price, it could use those books in training and defend the action as fair use because the content was legally obtained.
In other words, the “permission” argument many authors and publishers have put forward is now dead in the water. Authors and publishers do not have the downstream control over their works that they imagined they had.
From Permission to Compensation
This shifts the whole AI debate from permission to compensation, leaving publishers and authors likely to be forced to accept royalties determined by market pressures rather than by lucrative negotiated licensing agreements.
It wasn’t so long ago that Microsoft was offering HarperCollins $5,000 per title for a three-year training concession on copyrighted content. HarperCollins in turn offered 50% to the authors.
Now, it simply doesn’t need to. Fair use has been established, and if Microsoft, or whoever, obtains a book through legal channels, the compensation to the author can be as little as the standard royalty. Or nothing, if the AI company sources pre-owned books.
My guess is that even now AI companies are busy asking their AIs the best way to buy up every title out there for the cover price, saving themselves a fortune in the training fees many had previously been paying.
But let me wind up this post with another key finding from the Court.
“Anthropic Purchased Millions of Print Copies”
We hear constantly from the Luddite Fringe, led by the UK’s Society of Authors and the Publishers Association, that AI companies are stealing our books.
“They must pay!” rants the SoA. “The great copyright heist!” rants Dan Conway of the PA.
Leaving aside that most of what the AI is trained on was published in America, we now have a clear statement from a Court, based on real evidence rather than soundbites, that Anthropic paid for a substantial portion of its training assets.
“Anthropic purchased millions of print copies to build a research library. It destroyed each print copy while replacing it with a digital copy for use in its library (not for sharing nor sale outside the company) … Anthropic purchased its print copies fair and square.”
There are suggestions AI companies sought deals with second-hand book traders to get cheaper books – and that of course is totally legal and leaves the author and publisher without a penny of direct benefit.
But the point is clear: “Anthropic purchased millions of print copies.”
But don’t worry. The SoA and the PA will not even have read the ruling in full, so there’s no fear of seeing any apologies to the member authors and publishers who were deliberately misled. Soundbites trump all!
And that, sadly, has been the history of publishing industry relations with AI companies to date.
If this ruling enables the industry to move on and deal with AI companies on honest terms, so much the better.
Piracy Losses
Meanwhile, there’s still the pirated-content trial to get the popcorn out for. And there, the AI companies involved are on shaky ground. The admission is on record.
But compensation is going to be a legal battle in its own right. How much did Publisher A or Author B actually lose from a book appearing on a piracy site? Without knowing how many copies of a given title were illegally downloaded, that is unquantifiable.
Most folks who choose to use pirate sites for content do so because the content is otherwise unavailable, or because they have no intention of paying in the first place. No sale lost, therefore.
With a December trial set, AI companies will now be using their AI skills to evaluate just how big or small the losses to the industry from piracy really are.
Publishers love to rant about how much piracy costs them, but most of those numbers are speculative soundbite BS at best.
If this case ends up in a definitive ruling on the impact of piracy sites, then this will all have been worthwhile.
This post first appeared in the TNPS LinkedIn newsletter.