Rights Reservation

Guidance on practical tools and protocols to help publishers reserve rights and protect content from unauthorized AI training

This page covers three options: robots.txt, TDMRep and ISCC.

Robots.txt

  • A robots.txt file is a plain-text document served from the root of a website (example.com/robots.txt) that tells crawlers/bots which sections of the site they may access and index; 
  • It helps website owners control the behaviour of crawlers and manage crawling traffic; 
  • It only covers situations in which the content owner is also the website owner. If content is copied to a website not controlled by the content owner, the indication is lost; 
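  • It began as an informal, widely used convention rather than a formal standard; the IETF specified it as RFC 9309 only in 2022, and compliance remains voluntary; 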
  • It is a binary mechanism: for a given path, it can only instruct a crawler to collect (Allow) or not to collect (Disallow) content; 
  • It provides mere indications rather than hard blocking: a crawler can be programmed not to consult the file, or to ignore its instructions; 
  • It only indicates whether content may be crawled and does not distinguish between the purposes for which a crawler is used, such as fetching content for search indexing (which might be allowed) and for training AI models (which might be disallowed); 
  • It is the responsibility of the website owner to list every crawler they wish to allow or disallow, so the list risks being incomplete and ineffective, and it places a considerable burden on the website owner; 
  • AIPREF: We are monitoring progress at IETF level to update robots.txt to express AI Preferences.

User-agent: discoverybot/2.0
Disallow: /

User-agent: YoudaoBot/1.0
Disallow: /

User-agent: Sogou web spider/3.0
Disallow: /

User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
Disallow: /_dynamic-products/
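
As a rough illustration of how a compliant crawler might interpret such rules, the sketch below uses Python's standard urllib.robotparser module. The rules and the crawler names (ExampleAIBot, ExampleSearchBot, UnlistedAIBot) are invented for this example and do not refer to real bots. It also shows the limitation noted above: a crawler that is not listed by name falls through to the general rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one AI crawler is blocked everywhere, and every other
# crawler is only kept out of one archive path. All bot names are invented.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /connect/archive
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/articles/1"
archive = "https://example.com/connect/archive/2020"

print(may_fetch(parser, "ExampleAIBot", page))        # listed and blocked
print(may_fetch(parser, "ExampleSearchBot", page))    # falls under "*"
print(may_fetch(parser, "ExampleSearchBot", archive)) # blocked path under "*"
print(may_fetch(parser, "UnlistedAIBot", page))       # not listed, so allowed
```

The check is purely advisory: nothing stops a crawler from skipping it altogether.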

 

TDMRep

  • The TDM Reservation Protocol (TDMRep) allows rightsholders to declare their choice regarding text and data mining (TDM) of web resources under their control, and eases the discovery of the TDM licensing policies associated with such content; 
  • TDMRep can be implemented at both website and content level (e.g., in PDF and EPUB files), in different ways depending on the level of expertise of the implementer; 
  • The indications, whatever the level at which they are expressed, point to a central policy file hosted on the website (e.g., https://publisher.com/policies/policy.json) that crawlers can easily access and read; 
  • It was designed to avoid problems with findability: TDMRep does not interfere with traditional web crawling and search engine indexing, which crawlers perform under access rules traditionally set in robots.txt; 
  • It allows recipients of the declaration to adjust their scraping behaviour, or to find information about the licensing opportunities offered by the rightsholder. 

TDM File on the Origin Server (same as robots.txt)
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
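
A minimal sketch of how a crawler might read such a file, which the TDMRep specification places at /.well-known/tdmrep.json on the origin server. The helper names (find_rule, tdm_status) are hypothetical, and the path matching is a deliberate simplification (plain prefix match, with a trailing * treated as a wildcard); the specification's own matching rules should be consulted for a real implementation.

```python
import json

# The example file from above, as a crawler would receive it.
TDMREP_JSON = """
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
"""


def find_rule(rules, path):
    """Return the first rule whose location matches the path, else None.

    Assumption: simple prefix matching, not the full TDMRep pattern syntax.
    """
    for rule in rules:
        prefix = rule.get("location", "").rstrip("*")
        if prefix and path.startswith(prefix):
            return rule
    return None


def tdm_status(rules, path):
    """Return (rights_reserved, policy_url) for a resource path."""
    rule = find_rule(rules, path)
    if rule is None:
        return (False, None)
    return (rule.get("tdm-reservation") == 1, rule.get("tdm-policy"))


rules = json.loads(TDMREP_JSON)
print(tdm_status(rules, "/articles/1"))
```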

TDM Header Field in HTTP Responses
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-Type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json
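
On the receiving side, a crawler could check these two header fields before using a response. The following is a sketch only: the function name tdm_from_headers is hypothetical, and the headers are assumed to arrive as a plain dictionary from whatever HTTP client the crawler uses.

```python
def tdm_from_headers(headers):
    """Return (rights_reserved, policy_url) from HTTP response headers.

    Header names are compared case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    reserved = lowered.get("tdm-reservation") == "1"
    return (reserved, lowered.get("tdm-policy"))


response_headers = {
    "Date": "Wed, 14 Jul 2021 12:07:48 GMT",
    "Content-Type": "text/html",
    "TDM-Reservation": "1",
    "TDM-Policy": "https://publisher.com/policies/policy.json",
}
print(tdm_from_headers(response_headers))
```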

TDM Metadata in HTML Content
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
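
The same declaration can be read from the page itself. This sketch uses Python's standard html.parser module; the class and function names (TDMMetaParser, tdm_from_html) are hypothetical, and the HTML string is assumed to have been fetched already.

```python
from html.parser import HTMLParser


class TDMMetaParser(HTMLParser):
    """Collect tdm-reservation and tdm-policy <meta> values from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("tdm-reservation", "tdm-policy"):
            self.meta[name] = attributes.get("content")


def tdm_from_html(html):
    """Return (rights_reserved, policy_url) declared in the HTML metadata."""
    parser = TDMMetaParser()
    parser.feed(html)
    reserved = parser.meta.get("tdm-reservation") == "1"
    return (reserved, parser.meta.get("tdm-policy"))


PAGE = """
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
"""
print(tdm_from_html(PAGE))
```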
TDM Metadata in PDF / EPUB publications with XMP / XML tags

ISCC

  • The International Standard Content Code (ISCC) makes it possible to generate a content identification code from the digital content itself: any user, entity or system with access to the content can derive the ISCC code from the digital media asset. This means that two users or machines can generate the same or a similar identifier directly from the media file without exchanging any information or metadata about the content; 
  • It is a content-dependent identifier; 
  • It is an ISO standard (ISO 24138:2024) and applies to different media formats (image, video, audio and text files); 
  • ISCC works in combination with a registry (e.g. Liccium is one of the possible registry providers), where rightsholders would need to register ISCC codes and tie them to the corresponding rights declarations; 
  • The ISCC serves multiple use cases beyond rights reservation; by binding ISCC codes to rights declarations associated with digital assets, this system can help ensure trust in the ownership, attribution and authenticity of digital media content. 

ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY

Title: example.pdf

Creator name: publisher

License URL: https://publisher.com/license

Copyright notice: © 2025 publisher

TDM reservation = 1
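
To illustrate the registry step, the sketch below models a registry as an in-memory dictionary keyed by ISCC code and populated with the example record above. Everything here is an assumption made for illustration: the field names, the is_tdm_reserved helper and the dictionary itself do not reflect any real registry's API, and generating the ISCC code from the file is left to an ISCC implementation and is not shown. Only exact matches are handled; matching similar codes is out of scope.

```python
# Hypothetical in-memory registry binding an ISCC code to a rights declaration.
REGISTRY = {
    "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY": {
        "title": "example.pdf",
        "creator": "publisher",
        "license_url": "https://publisher.com/license",
        "copyright_notice": "© 2025 publisher",
        "tdm_reservation": 1,
    }
}


def is_tdm_reserved(iscc_code, registry):
    """Return True/False if a declaration exists for the code, else None."""
    declaration = registry.get(iscc_code.strip())
    if declaration is None:
        return None  # nothing registered for this exact code
    return declaration.get("tdm_reservation") == 1


code = "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY"
print(is_tdm_reserved(code, REGISTRY))
```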
