AlgoSoc / Publication
January 31, 2025

Generative AI, copyright and the AI Act

Written by Joao Pedro Quintais and published in the Journal of Intellectual Property, Information Technology and E-Commerce Law, this article explores the intersection of transparency requirements and copyright law within the framework of the Copyright DSM Directive and the AI Act.

This paper provides a critical analysis of the Artificial Intelligence (AI) Act’s implications for the European Union (EU) copyright acquis. It aims to clarify the complex relationship between AI regulation and copyright law and to identify legal ambiguities and gaps that may influence future policymaking. The discussion begins with an overview of fundamental copyright concerns related to generative AI. It focuses on the issues that arise at the input, model, and output stages and on how these concerns intersect with the text and data mining (TDM) exceptions under the Copyright in the Digital Single Market Directive (CDSMD).

The paper then explores the AI Act’s structure and the key definitions relevant to copyright law. The core analysis addresses the AI Act’s impact on copyright: the role of TDM in AI model training, the copyright obligations the Act imposes, the requirement to respect copyright law (in particular TDM opt-outs), and the extraterritorial implications of these provisions. It also examines transparency obligations, compliance mechanisms, and the enforcement framework. The paper further critiques the inadequacies of the current regime, particularly regarding fair remuneration for creators, and evaluates potential improvements such as collective licensing and collective bargaining. Finally, it assesses legislative reform proposals, such as statutory licensing and AI output levies, and concludes with reflections on future directions for integrating AI governance with copyright protection.

The article concludes that generative AI models, and their reliance on text and data mining techniques, present unique opportunities for innovation as well as significant challenges, particularly with respect to copyright compliance. It emphasizes that itemizing all copyrighted material used in training data sets is impossible because of the low threshold of originality, the territorial fragmentation of copyright, and poor rights metadata. Instead, it proposes a comprehensive summary of training data that covers both copyrighted and non-copyrighted content while protecting trade secrets and confidential information. This summary should provide insight into the data collections and sources used and the period of use, so as to facilitate compliance with the TDM exception, lawful access requirements, and respect for opt-outs.

The full paper, "Generative AI, copyright and the AI Act", is available in the Journal of Intellectual Property, Information Technology and E-Commerce Law.