As artificial intelligence (AI) continues to evolve at breakneck speed, so does the relationship between copyright laws and AI technology. The interplay of these forces has recently thrown Nvidia, the AI chipmaking behemoth, into a legal debate about shadow libraries and whether these collections should be permitted as part of AI training. Nvidia’s dilemma not only sets precedents for itself but also has important implications for the future of training data and copyright respect.
Shadow libraries cater both to people who want to learn how to do something and to the curious who enjoy browsing copyrighted material. Their catalogues include law books on basic copyright and patents, medical encyclopaedias, stereophonic, 3D and audio books, a complete collection of BBC archive recordings, PDFs and scanned papers from scholarly conference proceedings, Harvard journals, near-complete journal collections, dozens of university and business school curricula, all Encyclopædia Britannica volumes, introductory physics textbooks and hundreds of thousands of books on every subject imaginable. Operators of these shadow libraries are often named in civil lawsuits alleging mass copyright infringement, and they usually lose. Such sites have earned notoriety: take, for example, the platforms run by the operator behind Z-Library and the site Library Genesis (Libgen). The latter in particular has been accused of ‘mass infringement’ of copyright by the Alliance for Creativity and Entertainment, a coalition of major studios and networks, and by the International Publishers Association. The industry regularly calls for shutdowns, but such ‘pirate libraries’ also have ardent defenders, among them the proponents of free information.
Ironically, one of the most vocal defenders of these shadow libraries is Nvidia, a company currently facing a lawsuit alleging that its AI training data was taken from these collections. In this highly publicised case, Nvidia claims that it did not use content from these shadow libraries to train its NeMo AI platform or for any other purpose. It also denies that any of these platforms qualifies as a shadow library in a legal sense. Why? Because, as it insists to the court, using these platforms is ‘not against the law’. In taking this position, Nvidia’s legal team is challenging the mainstream narrative and highlighting the need to define copyright law for the digital age.
In its defence, Nvidia also raised some central issues about AI training and fair use. In its view, AI training is itself transformative. The argument rests on convincing the courts that turning copyrighted material into data for training a machine-learning system is a fair use of that material. This question is at the centre of current controversies over whether and how AI developers may use copyrighted materials, and what that means for authors and publishers.
In its response to the lawsuit, Nvidia says it ‘generally believes’ its use of the training data was lawful. Disputes like this one reflect a larger tension between developing technologies and archaic copyright laws. As AI development continues, the question of how to compensate creators without stifling innovation becomes more complicated.
This legal showdown highlights one of the hurdles AI firms face in sourcing data to train their systems. While making data available for training may seem like a ‘no-brainer’, there are legal questions about what material is protected by copyright and whether it can be used without permission. OpenAI is now one of several companies licensing content to train AI, but it is still unclear which side of the fence AI firms will land on when it comes to training data: whether a ‘sharing-is-caring’ attitude will prevail, or whether firms will err on the side of caution and take a more restrictive approach to data, regardless of the consequences. Nvidia’s case is one to watch; it might well be one of the first to shape future norms and regulations related to AI.
Legal questions aside, the language Nvidia has used to defend itself reflects something much deeper than the immediate dispute. Shadow libraries, murky as their legal status may be, are very much a part of a struggle for democratised access to culture. Nvidia’s defence of them, then, plays into a wider debate about the freedom of information, and the role of technology companies in facilitating or denying access.
Even AI powerhouse Nvidia, whose work on AI chip technology sparked the entire boom, has become embroiled in copyright questions after September’s Historic House Album that both showcased and expanded the capabilities of contemporary AI. And while Nvidia’s most recent earnings report showed how much the AI boom is fuelling the firm’s bottom line, it also highlighted the firm’s need to ensure that it can continue to train AI on ever more diverse datasets.
Nvidia, for its part, is returning to its roots as a pioneer in AI, harnessing its technological power to try to innovate its way out of trouble. Its role in the copyright battles reflects how difficult it is to pioneer in an age of digital abundance and legal contest.
Courts will likely continue to grapple with this balance between innovation and copyright respect as Nvidia and other companies in the AI field push forward in the coming years. The responses that courts give to Nvidia’s litigation may establish important precedents for the sourcing and use of training data, which can play a role in shaping the future of AI as well as the free flow of information. Regardless of the eventual verdict, Nvidia’s presence across these debates reflects its place as an industry leader in the digital era, navigating the overlapping worlds of technological progress and copyright integrity.
© 2025 UC Technology Inc. All Rights Reserved.