Rights Reservation

Recommendations on current mechanisms to reserve rights

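In the European Union, Article 4 of Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market allows text and data mining (TDM) of lawfully accessible works, including for commercial purposes, unless rightsholders have expressly reserved their rights in an appropriate manner, such as machine-readable means for content made publicly available online. Publishers who wish to opt out of this exception must therefore take active steps to reserve their rights. As rightsholders, publishers have the discretion to decide how best to do so in accordance with the law.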

To support its members, STM has developed recommendations for consistent and effective implementation of Article 4. In particular, we supported the development of the TDMRep protocol, a simple solution for signalling content rights through metadata at both platform and work level, across content types.

The methods we point to—robots.txt, TDMRep, and the International Standard Content Code (ISCC)—provide complementary ways for rightsholders to communicate their TDM policies. While none can fully prevent misuse by non-compliant actors, they offer vital transparency signals that strengthen accountability and legal enforceability across the digital ecosystem.

We also continuously monitor new solutions as they emerge and engage with stakeholders to assess them.

However, the lack of transparency and cooperation from some AI companies makes it difficult to verify whether these mechanisms are being respected. Successful implementation of rights reservation depends on granular transparency, authorised crawling, and penalties for non-compliance.

Rights Reservation Methods | A Comparison

The approaches below are not mutually exclusive; they are complementary. As this field is evolving quickly, the list does not claim to be exhaustive. It includes only solutions developed by independent parties, not proprietary tools.

Apart from firewall rules, CAPTCHAs, or authentication requirements, site owners have no technical means to “block” rogue scrapers or AI-training bots that are programmed to extract content without following the instructions provided.

STM does not require its members to implement the mechanisms below; rather, it aims to provide information so that members can make informed decisions about their approach.

FEATURE	ROBOTS.TXT	TDMREP	ISCC
Expressed at website level	✅	✅	–
Expressed at item level	✅	✅	✅
Is an ISO standard	–	–	✅
Includes licensing information	–	✅	✅
Binary signal only (allow/disallow, or 0 = not reserved, 1 = reserved)	✅	✅	✅
Compatible with different media formats	–	✅	✅
Medium/high expertise needed to implement	–	–	✅
Requires additional infrastructure (e.g., a registry)	–	–	✅
Risk of metadata stripping	✅	✅	–

Robots.txt

A robots.txt file is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells crawlers/bots which sections of the site they may access and index;
It helps website owners control the behaviour of crawlers and manage crawling traffic;

It only covers situations in which the content owner is also the website owner. If content is copied to a website not controlled by the content owner, that indication is lost.

It is not an ISO standard but a widely used protocol, documented by the IETF as RFC 9309;

It is a binary mechanism, so it can only instruct a crawler to collect content (Allow) or not to collect it (Disallow);

It provides only indications rather than hard blocking: a crawler can be programmed not to consult the instructions, or to ignore them;

It can only indicate that content may not be crawled; it cannot distinguish between purposes when the same crawler is used for several, such as fetching content for search indexing (which might be allowed) and for training AI models (which might be disallowed).

The website owner is responsible for listing every crawler they wish to allow or disallow. The list therefore risks being incomplete and ineffective, and maintaining it places a considerable burden on the owner.
AIPREF: We are monitoring progress at the IETF on updating robots.txt so that it can express AI usage preferences.

User-agent: discoverybot
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
Disallow: /_dynamic-products/
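A compliant crawler reads robots.txt and checks each URL against the group that matches its user agent. The minimal sketch below does this with Python's standard-library urllib.robotparser. It reuses the rules of the example above with the version suffixes omitted, because this parser compares robots.txt names against the part of the crawler's user-agent string before any "/". The crawler names and URLs in the usage lines, and the helper name may_crawl, are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Rules adapted from the robots.txt example (version suffixes omitted).
ROBOTS_TXT = """\
User-agent: discoverybot
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
Disallow: /_dynamic-products/
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This only reports what the file asks for: nothing forces a crawler
    to run this check, which is why robots.txt is not a hard block.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("discoverybot/2.0", "https://example.com/journals/article1"))  # False
    print(may_crawl("ExampleBot/1.0", "https://example.com/connect/archive/2020"))  # False
    print(may_crawl("ExampleBot/1.0", "https://example.com/journals/article1"))  # True
```

Note that the third call returns True even if ExampleBot is an AI-training crawler: robots.txt can only say which paths a named crawler may visit, not what it may do with the content.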

TDMRep

The TDM Reservation Protocol (TDMRep) allows rightsholders to declare their choice regarding text & data mining of web resources under their control, easing the discovery of TDM licensing policies associated with such content;

The TDMRep protocol can be implemented at both website and content level (e.g., in PDF and EPUB files), in different ways depending on the implementer's level of expertise;

Whatever level they are expressed at, these declarations can point to a central policy file hosted on the website (e.g., https://publisher.com/policies/policy.json) that crawlers can easily access and read;

It was designed to avoid findability problems: TDMRep does not interfere with traditional web crawling and search engine indexing, where crawlers' access to site content is traditionally regulated by robots.txt.

It allows recipients of the declaration to adjust their scraping behaviour, or to find information about the licensing opportunities offered by the rightsholder.
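To make the site-level mechanism concrete, here is a minimal Python sketch of how a crawler might pick the applicable declaration from a site-wide TDM file in the JSON format used in the TDMRep examples of this section. The "/images/" entry and the helper name find_tdm_rule are hypothetical. TDMRep defines its own rules for matching "location" values; this sketch assumes plain longest-prefix matching, so check the TDMRep specification before relying on it.

```python
import json
from typing import Optional

# Hypothetical site-wide TDM file: the "/" entry mirrors this section's
# example, the "/images/" entry is invented for illustration.
TDMREP_JSON = """
[
  {"location": "/images/", "tdm-reservation": 0},
  {"location": "/", "tdm-reservation": 1,
   "tdm-policy": "https://publisher.com/policies/policy.json"}
]
"""


def find_tdm_rule(path: str, tdmrep_json: str = TDMREP_JSON) -> Optional[dict]:
    """Return the entry whose location is the longest prefix of path.

    Assumption: locations are treated as plain path prefixes, which
    simplifies TDMRep's own pattern syntax.
    """
    rules = json.loads(tdmrep_json)
    matches = [rule for rule in rules if path.startswith(rule["location"])]
    if not matches:
        return None  # no declaration applies to this path
    return max(matches, key=lambda rule: len(rule["location"]))


if __name__ == "__main__":
    rule = find_tdm_rule("/journals/article1.pdf")
    # tdm-reservation 1 means rights are reserved; tdm-policy says where to
    # find licensing information.
    print(rule["tdm-reservation"], rule.get("tdm-policy"))
```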

TDM file on the origin server (/.well-known/tdmrep.json, similar to robots.txt):
[{ "location": "/", "tdm-reservation": 1, "tdm-policy": "https://publisher.com/policies/policy.json" }]

TDM header fields in HTTP responses:
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json

TDM metadata in HTML content:
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>

TDM metadata in PDF / EPUB publications, using XMP / XML tags
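The header and HTML variants above can be read with nothing more than Python's standard library. The sketch below is illustrative only: it collects tdm-reservation and tdm-policy from HTTP response headers (whose names are case-insensitive) and from meta tags. Giving header values precedence over meta tags is an assumption made for this sketch, not a rule taken from TDMRep, and the names tdm_signals and TdmMetaParser are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Mapping

TDM_KEYS = ("tdm-reservation", "tdm-policy")


class TdmMetaParser(HTMLParser):
    """Collects TDMRep values from <meta name="..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        name = (attr.get("name") or "").lower()
        if name in TDM_KEYS and attr.get("content") is not None:
            self.values[name] = attr["content"]


def tdm_signals(headers: Mapping[str, str], html: str = "") -> Dict[str, str]:
    """Return TDM signals from HTTP headers, falling back to HTML meta tags.

    Assumption: header values win over meta tags when both are present.
    """
    found: Dict[str, str] = {}
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    for key in TDM_KEYS:
        if key in lowered:
            found[key] = lowered[key]
    if html:
        parser = TdmMetaParser()
        parser.feed(html)
        parser.close()  # flush anything still buffered
        for key, value in parser.values.items():
            found.setdefault(key, value)  # keep header values if already set
    return found


if __name__ == "__main__":
    page = ('<head><meta charset="utf-8">'
            '<meta name="tdm-reservation" content="1">'
            '<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">'
            '</head>')
    print(tdm_signals({"Content-Type": "text/html"}, page))
```

A crawler finding tdm-reservation set to "1" would then stop mining or follow tdm-policy to look for a licence.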

ISCC

The International Standard Content Code (ISCC) makes it possible to generate a content identification code from the digital content itself: any user, entity or system with access to the content can derive the ISCC code from the digital media asset. Two users or machines can therefore generate the same or a similar identifier directly from the media file, without exchanging any information or metadata about the content.

It is a content-dependent identifier;

It is an ISO standard (ISO 24138:2024) and applies to different media formats (images, video, audio, text);

ISCC works in combination with a registry (Liccium, for example, is one possible registry provider), in which rightsholders register ISCC codes and link them to the corresponding rights declarations;

The ISCC serves multiple use cases beyond rights reservation: by binding ISCC codes to the rights declarations associated with digital assets, the system can help ensure trust in the ownership, attribution, and authenticity of digital media content.
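The idea of a content-derived identifier can be illustrated with a toy example. The sketch below is NOT the ISCC algorithm (ISO 24138 specifies its own codes and processing); it uses SimHash over word trigrams, a different and much simpler technique, only to show how anyone holding the same text derives the same code, and how near-identical texts tend to yield codes that differ in few bits. The function names are hypothetical.

```python
import hashlib
import re


def toy_content_code(text: str) -> int:
    """Derive a 64-bit, similarity-preserving code from the text itself.

    Illustration only: SimHash over word trigrams, not the ISCC algorithm.
    """
    words = re.findall(r"\w+", text.lower())
    # Overlapping word trigrams ("shingles"); very short texts yield one shingle.
    shingles = [" ".join(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]
    weights = [0] * 64
    for shingle in shingles:
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # Each output bit records whether most shingles set that bit.
    code = 0
    for bit in range(64):
        if weights[bit] > 0:
            code |= 1 << bit
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a smaller distance suggests more similar content."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "Text and data mining of lawfully accessible works is permitted unless rights are reserved."
    edited = "Text and data mining of lawfully accessible works is allowed unless rights are reserved."
    print(f"{toy_content_code(original):016x}")
    print(hamming_distance(toy_content_code(original), toy_content_code(edited)))
```

No metadata travels with the text here: two parties running the function on the same content independently obtain the same code, which is the property the ISCC builds on.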

ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY

Title: example.pdf

Creator name: publisher

License URL: https://publisher.com/license

TDM reservation: 1
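Once a code has been derived, a crawler would look it up in a registry to find the associated rights declaration. The sketch below models that step with a hypothetical in-memory dictionary holding the example record above; it does not reproduce the interface of Liccium or any other real registry, and it uses exact-key lookup, whereas real ISCC matching can also find similar codes. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RightsDeclaration:
    title: str
    creator: str
    license_url: str
    tdm_reservation: int  # 1 = rights reserved, 0 = not reserved


# Hypothetical registry keyed by ISCC code, holding the example record.
REGISTRY: Dict[str, RightsDeclaration] = {
    "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY": RightsDeclaration(
        title="example.pdf",
        creator="publisher",
        license_url="https://publisher.com/license",
        tdm_reservation=1,
    ),
}


def may_mine(iscc_code: str, registry: Dict[str, RightsDeclaration] = REGISTRY) -> Optional[bool]:
    """True if TDM is not reserved, False if reserved, None if no declaration is found.

    None is deliberately distinct from True: an unregistered code says
    nothing about whether rights have been reserved by other means.
    """
    declaration = registry.get(iscc_code)
    if declaration is None:
        return None
    return declaration.tdm_reservation == 0


if __name__ == "__main__":
    code = "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY"
    print(may_mine(code))  # False: TDM reservation is 1
    print(REGISTRY[code].license_url)  # where to seek a licence
```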

The Latest AI News from STM

Book, news, and journal publishers join with authors in amicus brief in support of music publishers in Concord v. Anthropic

On March 30, 2026, the Association of American Publishers (AAP), News/Media Alliance (N/MA), International Association of Scientific, Technical & Medical Publishers (STM), and Authors Guild (AG) filed an amicus brief supporting the plaintiffs in Concord Music Group v. Anthropic. This case was brought in October 2023 by several music publishers alleging that Anthropic unlawfully used copyrighted musical works, particularly a large corpus of song lyrics, for training the AI product Claude. The case is before the United States District Court for the Northern District of California. The joint amicus brief explains that copyright law does not permit Anthropic, a multibillion-dollar company, to systematically copy human-authored works without permission, let alone to enrich itself by generating content that displaces the works it has taken. On fair use factor one, the amici highlight the latest academic research showing that large language models like Claude memorize works used in training in ...

STM publishes new discussion document on responsible use of research content in generative AI

STM has published “Toward Responsible Use of Research Content in Generative AI,” a discussion document putting forward considerations for the responsible use of research content in generative AI tools, and inviting the broader research and GenAI development community to engage. The document focuses on what makes research content and research communication distinct from other types...

Global reporting standard for AI disclosure in research: first consultation is open

Transparency about the use of generative Artificial Intelligence (AI) in research articles and other scholarly outputs is an important aspect of research integrity. At present, practices for how to disclose AI use vary widely across disciplines, regions, and publication cultures. To address this issue, STM has released a report “Recommendations for a Classification of AI...

Guidance on practical tools and protocols to help publishers reserve rights and protect content from unauthorized AI training
