Rights Reservation

Guidance on practical tools and protocols to help publishers reserve rights and protect content from unauthorized AI training

Learn more about these 3 options

Robots.txt | TDMRep | ISCC

Robots.txt

  • A robots.txt file is a plain-text document placed at the root of a website (example.com/robots.txt) that tells crawlers/bots which sections of that website they may access and index; 
  • It helps website owners to control the behaviour of crawlers and to manage crawling traffic; 
  • It only covers situations in which the content owner is also the website owner. If content is copied to a website not controlled by the content owner, that indication is lost. 
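  • It originated as a widely used convention rather than a formal standard; the IETF only specified the Robots Exclusion Protocol in RFC 9309 (2022), and compliance remains voluntary; 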
  • It is a binary mechanism: for a given crawler and path, it can only signal that content may be collected (Allow) or may not be collected (Disallow); 
  • It provides indications only, not hard blocking: a crawler can be programmed to skip the file or to ignore its instructions; 
  • It only indicates whether content may be crawled; it cannot distinguish between the purposes a crawler serves, such as fetching content for search indexing (which might be allowed) and for training AI models (which might be disallowed). 
  • It is the website owner's responsibility to list every crawler they wish to allow or disallow, so the list risks being incomplete and ineffective, and maintaining it places a considerable burden on the website owner.
  • AIPREF: We are monitoring progress at IETF level to update robots.txt to express AI Preferences.

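# Block three specific crawlers from the whole site.
# Crawlers are matched by product name, so version suffixes are omitted.
User-agent: discoverybot
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: Sogou web spider
Disallow: /

# All other crawlers: keep out of the archive and dynamic sections only.
User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
Disallow: /_dynamic-products/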
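How a compliant crawler might interpret rules of this kind can be sketched with Python's standard-library urllib.robotparser. This is an illustration only: the shortened rule set, the crawler name "SomeOtherBot" and the article URLs are assumptions made for the example, and a non-compliant crawler would simply never run such a check.

```python
from urllib.robotparser import RobotFileParser

# Shortened rule set in the style of the example above (an assumption for
# illustration, not the publisher's actual file).
RULES = """\
User-agent: discoverybot
Disallow: /

User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)

# The named crawler is excluded from the whole site.
blocked_bot = parser.can_fetch("discoverybot", "https://example.com/articles/1")

# Any other crawler falls back to the "*" group.
other_bot_article = parser.can_fetch("SomeOtherBot", "https://example.com/articles/1")
other_bot_archive = parser.can_fetch("SomeOtherBot", "https://example.com/connect/archive/2020")

print(blocked_bot, other_bot_article, other_bot_archive)
```

Note that the answer is only ever "may fetch" or "may not fetch": nothing in the rules can say what the fetched content may be used for.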

 

TDMRep

  • The TDM Reservation Protocol (TDMRep) allows rightsholders to declare their choice regarding text and data mining (TDM) of web resources under their control, easing the discovery of TDM licensing policies associated with such content; 
  • The TDMRep protocol can be implemented both at website and content level (e.g., in PDF and EPUB), in different ways depending on the level of expertise of the implementer; 
  • The indications, expressed at different levels and depending on the technical expertise available, lead to a central policy file hosted on the website (e.g., https://publisher.com/policies/policy.json) that can be easily accessed and read by crawlers; 
  • It was developed to avoid any problems with findability: TDMRep does not interfere with traditional web crawling and search engine indexing performed by web crawlers whose access to site content is regulated by robots.txt. 
  • It allows recipients of the declaration to adjust their scraping behaviour, or to find information about the licensing opportunities offered by the rightsholder. 
TDM File on the Origin Server (hosted at a well-known location, as with robots.txt: /.well-known/tdmrep.json)
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
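A crawler that has retrieved such a file could look up the rule for a given path roughly as follows. This is a minimal sketch under stated assumptions: the helper name lookup_tdm_rule, the article path, and the simple "first matching prefix wins" logic are illustrative simplifications, and the TDMRep specification remains the reference for the exact pattern-matching rules.

```python
import json

# The file content shown above, as a crawler would receive it.
TDMREP_JSON = """
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
"""


def lookup_tdm_rule(rules, path):
    """Return the first rule whose location is a prefix of the path.

    Simplified prefix matching (an assumption for this sketch).
    """
    for rule in rules:
        if path.startswith(rule["location"]):
            return rule
    return None


rules = json.loads(TDMREP_JSON)
rule = lookup_tdm_rule(rules, "/articles/example.html")

# tdm-reservation = 1 means TDM rights are reserved for the matched resources.
reserved = rule is not None and rule.get("tdm-reservation") == 1
policy_url = rule.get("tdm-policy") if rule else None

print(reserved, policy_url)
```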
TDM Header Field in HTTP Responses
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json
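Reading the same declaration from response headers could look like the sketch below. It parses the raw response text shown above by hand so that it stays self-contained; the function names are hypothetical, and a real crawler would normally read these fields from its HTTP client's header object instead.

```python
RAW_RESPONSE = """\
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json
"""


def parse_headers(raw: str) -> dict:
    """Turn the header lines of a raw response into a lowercase-keyed dict."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        if not line:
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def tdm_from_headers(headers: dict):
    """Return (reserved, policy_url) from the TDMRep header fields."""
    reserved = headers.get("tdm-reservation") == "1"
    return reserved, headers.get("tdm-policy")


header_reserved, header_policy = tdm_from_headers(parse_headers(RAW_RESPONSE))
print(header_reserved, header_policy)
```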
TDM Metadata in HTML Content
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
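Extracting these meta tags from a page can be sketched with Python's standard-library html.parser. The class name TDMMetaParser is an assumption made for this example, and the sketch only collects the two TDMRep fields shown above.

```python
from html.parser import HTMLParser

PAGE = """\
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
"""


class TDMMetaParser(HTMLParser):
    """Collect the TDMRep meta tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tdm = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("tdm-reservation", "tdm-policy"):
            self.tdm[name] = attributes.get("content")


meta_parser = TDMMetaParser()
meta_parser.feed(PAGE)

meta_reserved = meta_parser.tdm.get("tdm-reservation") == "1"
meta_policy = meta_parser.tdm.get("tdm-policy")
print(meta_reserved, meta_policy)
```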
TDM Metadata in PDF / EPUB publications with XMP / XML tags

ISCC

  • The International Standard Content Code (ISCC) makes it possible to generate a content identification code from the digital content itself: any user, entity or system with access to the content can derive the ISCC code from the digital media asset. This means that two users or machines can generate the same or a similar identifier directly from the media file without exchanging any information or metadata about the content. 
  • It is a content-dependent identifier; 
  • It is an ISO standard (ISO 24138:2024) and applies to different media formats (images, video, audio, text files); 
  • ISCC works in combination with a registry (e.g. Liccium is one of the possible registry providers), where rightsholders would need to register ISCC codes and tie them to the relevant rights declarations; 
  • The ISCC serves multiple use cases beyond rights reservation; by binding ISCC codes to rights declarations associated with digital assets, this system can help in ensuring trust in ownership, attribution, and authenticity of digital media content. 

ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY

Title: example.pdf

Creator name: publisher

License URL: https://publisher.com/license

Copyright notice: © 2025 publisher

TDM reservation = 1
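The binding between an ISCC code and a rights declaration can be pictured as a simple lookup. The sketch below is purely illustrative: the in-memory REGISTRY, the RightsDeclaration fields and the lookup_declaration helper are hypothetical stand-ins for a real registry service, and generating the ISCC code from the file itself (or matching similar codes) is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsDeclaration:
    """Rights metadata tied to one ISCC code (fields mirror the example above)."""
    title: str
    creator: str
    license_url: str
    copyright_notice: str
    tdm_reservation: int


# Hypothetical in-memory registry; a real one would be an external service.
REGISTRY = {
    "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY": RightsDeclaration(
        title="example.pdf",
        creator="publisher",
        license_url="https://publisher.com/license",
        copyright_notice="© 2025 publisher",
        tdm_reservation=1,
    ),
}


def lookup_declaration(iscc_code: str) -> Optional[RightsDeclaration]:
    """Return the declaration registered for an exact ISCC code, if any."""
    return REGISTRY.get(iscc_code.strip())


declaration = lookup_declaration(
    "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY"
)
iscc_reserved = declaration is not None and declaration.tdm_reservation == 1
print(iscc_reserved, declaration.license_url if declaration else None)
```

Because the code is derived from the content, the lookup works even when the file has been copied to a website the rightsholder does not control, which is the case robots.txt cannot cover.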
