Rights Reservation

Guidance on practical tools and protocols to help publishers reserve rights and protect content from unauthorized AI training

Learn more about these 3 options

Robots.txt | TDMRep | ISCC

Robots.txt

  • A robots.txt file is a plain text document found within a website (example.com/robots.txt), instructing crawlers/bots as to which sections they can access and index from that website; 
  • It helps website owners to control the behaviour of crawlers and to manage crawling traffic; 
  • It only covers situations in which the content owner is also the website owner. If content is copied to a website not controlled by the content owner, that indication is lost. 
  • It originated as a widely used convention rather than a formal standard (the IETF only published it as RFC 9309 in 2022);
  • It is a binary mechanism: for a given path, it can only instruct a crawler to collect (Allow) or not to collect (Disallow) content;
  • It provides indications only and does not block access technically; a crawler can be programmed to skip the file or to ignore its instructions;
  • It only indicates whether content may be crawled and cannot distinguish between purposes when the same crawler serves several, such as fetching content for search indexing (which might be allowed) and for training AI models (which might be disallowed);
  • It is the responsibility of the website owner to list every crawler they wish to allow or disallow, which places a considerable burden on the owner and risks the list being incomplete and therefore ineffective;
  • AIPREF: We are monitoring work at the IETF to extend robots.txt so that it can express AI preferences.

User-agent: discoverybot/2.0
Disallow: /

User-agent: YoudaoBot/1.0
Disallow: /

User-agent: Sogou web spider/3.0
Disallow: /

User-agent: *
Disallow: /connect/archive
Disallow: /about/press-releases/archive
Disallow: /_dynamic-products/
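
As a rough illustration of the allow/disallow logic described above, the following Python sketch evaluates a small rule set with the standard library's urllib.robotparser. The crawler names ExampleAIBot and ExampleSearchBot, the example.com URLs and the may_fetch helper are assumptions made for illustration. A well-behaved crawler would run a check of this kind before fetching, but nothing forces it to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: "ExampleAIBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /connect/archive
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("ExampleSearchBot", "https://example.com/articles/1"),
        ("ExampleSearchBot", "https://example.com/connect/archive/2020"),
    ]
    for agent, url in checks:
        print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

The answer is always a plain yes or no per crawler and path. The file has no way of saying what the fetched content may be used for.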

 

TDMRep

  • The TDM Reservation Protocol (TDMRep) allows rightsholders to declare their choice regarding text & data mining of web resources under their control, easing the discovery of TDM licensing policies associated with such content; 
  • The TDMRep protocol can be implemented both at website and content level (e.g., in PDF and EPUB), in different ways depending on the level of expertise of the implementer;
  • The indications, expressed at different levels and depending on the technical expertise available, lead to a central policy file hosted on the website (e.g., https://publisher.com/policies/policy.json) that can be easily accessed and read by crawlers; 
  • It was developed to avoid problems with findability: TDMRep does not interfere with traditional web crawling and search engine indexing by crawlers whose access to site content is regulated by robots.txt;
  • It allows recipients of the declaration to adjust their scraping behaviour, or to find information about the licensing opportunities offered by the rightsholder. 
TDM File on the Origin Server (same as robots.txt)
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
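
To show how a crawler might consume such a file, here is a minimal Python sketch that loads the rules and looks up the reservation for a given path. The helper names find_rule and is_reserved are hypothetical, and the matching is deliberately simplified to "first rule whose location is a prefix of the path"; the TDMRep specification defines the actual matching behaviour and should be consulted for a real implementation.

```python
import json
from typing import Optional

# Same content as the example file above.
TDMREP_JSON = """
[{
  "location": "/",
  "tdm-reservation": 1,
  "tdm-policy": "https://publisher.com/policies/policy.json"
}]
"""


def find_rule(rules: list, path: str) -> Optional[dict]:
    """Return the first rule whose location is a prefix of path.

    Assumption: plain prefix matching, first match wins.
    """
    for rule in rules:
        if path.startswith(rule["location"]):
            return rule
    return None


def is_reserved(rules: list, path: str) -> bool:
    """Return True if a matching rule reserves TDM rights for path."""
    rule = find_rule(rules, path)
    return rule is not None and rule.get("tdm-reservation") == 1


if __name__ == "__main__":
    rules = json.loads(TDMREP_JSON)
    rule = find_rule(rules, "/articles/1.html")
    print(is_reserved(rules, "/articles/1.html"), rule["tdm-policy"])
```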
TDM Header Field in HTTP Responses
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json
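
A recipient of this response could read the two fields as in the following Python sketch. It parses the raw response head by hand so that it stays self-contained; the function name read_tdm_headers and the shape of the returned dictionary are assumptions for illustration, and an absent tdm-reservation field is treated here as "no reservation expressed".

```python
RAW_RESPONSE_HEAD = """\
HTTP/1.1 200 OK
Date: Wed, 14 Jul 2021 12:07:48 GMT
Content-type: text/html
tdm-reservation: 1
tdm-policy: https://publisher.com/policies/policy.json
"""


def read_tdm_headers(raw_head: str) -> dict:
    """Extract the TDM fields from a raw HTTP response head."""
    headers = {}
    # Skip the status line; header names are case-insensitive.
    for line in raw_head.splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return {
        "reserved": headers.get("tdm-reservation") == "1",
        "policy": headers.get("tdm-policy"),
    }


if __name__ == "__main__":
    print(read_tdm_headers(RAW_RESPONSE_HEAD))
```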
TDM metadata in HTML content
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
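
The same declaration can be read from a page with Python's standard html.parser module, as in this sketch. The class name TDMMetaParser and the helper read_tdm_meta are hypothetical; the sketch only collects the two meta elements shown above and does nothing with the policy URL.

```python
from html.parser import HTMLParser

HTML_HEAD = """
<head>
<meta charset="utf-8">
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://publisher.com/policies/policy.json">
</head>
"""


class TDMMetaParser(HTMLParser):
    """Collect tdm-reservation and tdm-policy meta elements."""

    def __init__(self) -> None:
        super().__init__()
        self.tdm = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("tdm-reservation", "tdm-policy"):
            self.tdm[name] = attributes.get("content")


def read_tdm_meta(html: str) -> dict:
    """Return the TDM meta values found in an HTML string."""
    parser = TDMMetaParser()
    parser.feed(html)
    return parser.tdm


if __name__ == "__main__":
    print(read_tdm_meta(HTML_HEAD))
```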
TDM Metadata in PDF / EPUB publications with XMP / XML tags

ISCC

  • The International Standard Content Code (ISCC) makes it possible to generate a content identification code from the digital content itself: any user, entity or system with access to the content can derive the ISCC code from the digital media asset. This means that two users or machines can generate the same or a similar identifier directly from the media file without exchanging any information or metadata about the content;
  • It is a content-dependent identifier; 
  • It is an ISO standard (ISO 24138:2024) and applies to different media formats (images, video, audio, text files);
  • ISCC works in combination with a registry (e.g. Liccium is one of the possible registry providers), where rightsholders would need to register ISCC codes and tie them to the corresponding rights declarations;
  • The ISCC serves multiple use cases beyond rights reservation; by binding ISCC codes to rights declarations associated with digital assets, this system can help in ensuring trust in ownership, attribution, and authenticity of digital media content. 

ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY
Title: example.pdf
Creator name: publisher
License URL: https://publisher.com/license
Copyright notice: © 2025 publisher
TDM reservation: 1
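
The binding between an ISCC code and a rights declaration can be pictured with the short Python sketch below. It models a registry as an in-memory dictionary keyed by the code; the RightsDeclaration class, the REGISTRY mapping and the lookup function are all hypothetical and do not reflect the interface of any real registry. Generating the code from a media file is out of scope here, and the sketch only matches identical codes, whereas a real registry may also be able to match similar ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsDeclaration:
    """Hypothetical record mirroring the example declaration above."""
    title: str
    creator_name: str
    license_url: str
    copyright_notice: str
    tdm_reservation: int


# Hypothetical in-memory registry keyed by ISCC code.
REGISTRY = {
    "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY": RightsDeclaration(
        title="example.pdf",
        creator_name="publisher",
        license_url="https://publisher.com/license",
        copyright_notice="© 2025 publisher",
        tdm_reservation=1,
    ),
}


def lookup(iscc_code: str, registry: dict = REGISTRY) -> Optional[RightsDeclaration]:
    """Return the declaration registered for an exactly matching code."""
    return registry.get(iscc_code.strip())


if __name__ == "__main__":
    code = "ISCC:KAC3D4DPLBES6KBFGJ4DPZ4H36UFKC6SMPGZ63APMVIX4EAEWCK74DY"
    declaration = lookup(code)
    print(declaration.license_url, declaration.tdm_reservation)
```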
