Toward Responsible Use of Research Content in Generative AI

Generative Artificial Intelligence (GenAI) tools present transformative opportunities for the scholarly community and the public to discover, consume, and engage with scholarly information. They have the potential to accelerate research by, for example, summarizing earlier findings, connecting different research fields, or suggesting new research directions.

To realize this potential responsibly, STM emphasizes the importance of addressing the needs of both content providers and the research community, the latter as buyers and end users of GenAI tools.

While pursuing the significant new opportunities that GenAI offers for tool creators and tool users, it is essential that the development and deployment of these tools rely on trustworthy content in a way that respects the intellectual property rights of rightsholders. The responsible creation and use of GenAI tools must begin with respect for applicable copyright laws.

Beyond those baseline requirements, improving scholarly discovery through GenAI tools requires a deep understanding of the characteristics that set scholarly communication apart from other forms of (online) communication. For instance, the responsible creation and use of GenAI tools should recognize the process of peer review, the primacy of the Version of Record (the final published version), mechanisms for corrections and retractions, and the need for attribution and citation.

In this brief, we put forward considerations for the responsible use of research content in and by GenAI tools, building on the experience and expertise of scholarly publishers. We present these considerations to developers and providers of such GenAI tools, whether they also publish research content or focus exclusively on technology, as well as to the research community that uses and procures such tools, including researchers, institutes, and funders.

Recognizing that GenAI tools consist of multiple processing layers, most notably a training layer, an inference layer, and a presentation layer, we acknowledge that the considerations we present will be applicable in different ways to those layers, while many of them might be more directly relevant for the inference and presentation layers.

We invite all stakeholders (including developers, researchers, institutes, and funders) that are building or using GenAI tools with research content to engage with scholarly publishers to further refine these considerations, and we welcome responses and shared exploration. Together, we can take important steps toward GenAI aligning with the values and standards that underpin trusted research and scholarly communication.

The International Association of Scientific, Technical & Medical Publishers (STM)

The International Association of Scientific, Technical and Medical Publishers (STM) is a nonprofit foundation dedicated to advancing trusted research worldwide. STM consists of over 160 members in more than 21 countries who collectively publish 66% of all journal articles and tens of thousands of monographs and reference works. As academic and professional publishers, learned societies, university presses, start-ups and established players, STM members work together to serve society by developing standards and technology to ensure research is of high quality, trustworthy, and easy to access. For more information, visit stm-assoc.org.

Generative Artificial Intelligence (GenAI) tools provide significant value to the research community, and they have the potential to provide further, potentially transformative, benefits to the scholarly community as well as to patients, professionals, and the public.

Scholarly communication relies on the integrity, verifiability, attribution, reliability, and accuracy of the information and tools used to generate trustworthy insights. If GenAI tools are to become part of this process, it is imperative that they respect and uphold the core values of research content and scholarly communication.

Understanding the evolving nature of scientific discovery and its expression through scholarly communication is critical. A central process in establishing the quality and authority of a scholarly work is peer review, in which independent experts evaluate submitted manuscripts and help improve them before they are published as the Version of Record (VoR).

After publication, scholarly communication relies on vital, formal mechanisms such as retractions, errata, and corrigenda to maintain the integrity and currency of the VoR as the building block of the scholarly record. Before formal publication, it is customary in some fields for researchers to release a so-called preprint to solicit early feedback; while a preprint offers a faster path to communicating research, it lacks the quality control of peer review and has a more preliminary status.

Attribution and citation are further practices that the scholarly community has developed to bolster the quality and integrity of the scholarly record. They formally connect scholarly works with each other and can be found throughout the scholarly record. Through them, intellectual contributions are recognized, provenance is established, and independent verification of claims is enabled.

Scholarly publishers have deep expertise in how research communities engage with scholarly material and in the differences between research communities working in different domains and subject areas. With decades of experience in advancing trusted research through thousands of journals and digital products used by millions of researchers worldwide, they have long upheld the values and principles inherent to scholarly communication.

GenAI tool developers working with research content and scholarly publishers publishing such content share a common interest with the scholarly community itself in maximizing the value of the scholarly information delivered, which can only be achieved by recognizing and respecting the unique characteristics of scholarly communication as sketched above. We believe that this shared interest—together with deep, mutually complementary knowledge and expertise—provides a strong motivation to collaborate.

When this occurs in a credible and transparent way, the trustworthiness and, therefore, value of GenAI tools for scholarly use will increase significantly.

The International Association of Scientific, Technical & Medical Publishers (STM) invites all stakeholders involved in applying and using GenAI with research content, whether GenAI tool builders or GenAI tool users, to discuss and collaborate on a shared vision to responsibly license, process, and present research content and serve the global research community.

This document serves as a starting point for such discussion. It suggests considerations for the responsible use of research content in and by GenAI tools and outlines key areas underpinning the trustworthiness of scholarly communication. It also indicates topics where collaboration would be beneficial for maintaining and furthering that trustworthiness. We welcome responses and shared exploration.

Considerations for Responsible Use of Research Content in and by GenAI Tools

To further fulfill the potential of GenAI technologies as powerful scholarly tools and demonstrate the responsible use of research content, we suggest considering the following desiderata when developing and using GenAI tools with research content:

• Comprehensive coverage of relevant, trustworthy, and reliable scholarly literature, or transparency where limitations exist;

• Differentiation between peer-reviewed and non-peer-reviewed content in indexing, processing, and presentation;

• Inclusion of rebuttals, corrections, and retractions in content processing and display;

• Prioritization of the original publication (Version of Record) in references, citations, inference, and retrieval;

• Verifiability through proper attribution and citation;

• User privacy protection through transparent prompt management or other means;

• Bias mitigation and promotion of fairness and impartiality in content selection and processing;

• Transparency for end-users and content authors, publishers, and owners regarding all the above.

These considerations build on, and do not replace, lawful access, licensing, and compliance with copyright law. All use of research content in and by GenAI tools must comply with applicable copyright and licensing requirements. Responsible use is additionally subject to AI-specific regulations beyond copyright law.

We propose that implementing the above considerations increases the trustworthiness and value of GenAI tools for scholarly users, thereby also creating significant opportunities for the providers of these tools.

In pursuing those opportunities, we recognize that GenAI tools consist of multiple processing layers, most notably a training layer (content selection, permissions, metadata), an inference layer (use of quality indicators and safeguards), and a presentation layer (clear disclosure, attribution, and links to sources). We acknowledge that the considerations we present will apply in different ways to those layers, and that many of them might be more directly relevant to the inference and presentation layers.

Not addressing these considerations introduces risks, including the spread of misinformation, opportunities for deliberate disinformation, erosion of trust in scholarly tools and processes, and potential harm in fields such as medicine, with implications for public health and patient care. It may also undermine the motivation of researchers, hinder the reproducibility and validation of scholarly work, and thus impair the advancement of science.

Underpinning the trustworthiness of scholarly communication, the following key areas can be identified in the use of GenAI tools with research content: content coverage and display, attribution and citation, transparency and accountability, and reliability and quality.

The discussion of each of these key areas follows a four-part structure:

  1. Why it matters for trusted research and scholarly users, including institutes and funders.

  2. How scholarly publishing handles it.

  3. Risks of not addressing it.

  4. Discussion topics.

Content Coverage and Display

Why it matters: Accurate, up-to-date, and validated (peer-reviewed) content is essential to ensure responses are current, complete, safe, and trustworthy.

Scholarly norm: The Version of Record (VoR) is the gold standard, with other content visibly differentiated. Retractions, corrections, and metadata are rigorously processed. Content is up to date, and discovery services strive to be complete with respect to their scope.

Risks: Outdated or non-validated content may result in incorrect responses and mislead researchers, harm patients, cause misinformation, and spread disinformation.

Discussion topics:

• How can (responses based on) VoRs be prioritized and visibly differentiated?
• How can retractions and corrections be handled appropriately?
• How can content coverage (and coverage limitations) be best communicated?
• How can scholarly publishers and GenAI providers work together to further the goal of trustworthiness of scholarly communication?
• How can scholarly publishers help identify fake or fabricated research content?
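As one illustration of the first two discussion topics, the following is a minimal sketch of how a retrieval step might prefer Versions of Record, withhold retracted items from use as evidence, and label each source for display. It is not a description of any existing tool. The `SourceRecord` fields, the version values ("vor", "accepted", "preprint"), the example DOIs, and the policy of withholding retracted items are all assumptions made for illustration; a real system would draw these signals from publisher metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata record for one retrieved document."""
    doi: str
    title: str
    version: str            # assumed values: "vor", "accepted", "preprint"
    peer_reviewed: bool
    retracted: bool = False
    corrected: bool = False


# Lower rank sorts first; the Version of Record is preferred (assumed ordering).
_VERSION_RANK = {"vor": 0, "accepted": 1, "preprint": 2}


def select_sources(records):
    """Return (usable, withheld).

    usable: non-retracted records, Version of Record first.
    withheld: retracted records, kept so a retraction notice can be shown.
    """
    usable = [r for r in records if not r.retracted]
    withheld = [r for r in records if r.retracted]
    # sorted() is stable, so records of equal rank keep their retrieval order.
    usable = sorted(
        usable,
        key=lambda r: _VERSION_RANK.get(r.version, len(_VERSION_RANK)),
    )
    return usable, withheld


def display_label(record):
    """Build a visible label that differentiates source types."""
    parts = [
        "Version of Record" if record.version == "vor"
        else record.version.capitalize()
    ]
    parts.append("peer-reviewed" if record.peer_reviewed else "not peer-reviewed")
    if record.retracted:
        parts.append("RETRACTED")
    if record.corrected:
        parts.append("correction issued")
    return f"{record.title} [{', '.join(parts)}]"


if __name__ == "__main__":
    records = [
        SourceRecord("10.1234/a", "Study A", "preprint", peer_reviewed=False),
        SourceRecord("10.1234/b", "Study B", "vor", peer_reviewed=True),
        SourceRecord("10.1234/c", "Study C", "vor", peer_reviewed=True, retracted=True),
    ]
    usable, withheld = select_sources(records)
    for r in usable + withheld:
        print(display_label(r))
```

Whether retracted items should be withheld entirely, or shown with a prominent notice alongside the retraction statement, is exactly the kind of policy question the topics above leave open.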

Attribution and Citation

Why it matters: Attribution and citation acknowledge intellectual contributions, enable claim verification and provenance, foster constructive dialogue, prevent plagiarism, and provide context.

Scholarly norm: Attribution and citation are always present. They point to the first (or original) and verified source, using persistent identifiers (e.g., DOIs), with a consistent citation style. There is an indication of source type (such as “peer-reviewed” or other).

Risks: Absent or incorrect attribution or citation prevents verifiability, erodes trust, can carry legal consequences, undermines the motivation of researchers, and leads to the spread of mis- and disinformation.

Discussion topics:

• How to determine and display sufficient attributions and citations such that a response is verifiable and explainable?
• How to identify the first and original source?
• How to distinguish peer-reviewed sources from other source types?
• How to identify attributions, citations, and their links or identifiers?
• How can scholarly publishers assist in finding the right attribution or citation?
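As a small illustration of the identifier question, the sketch below checks whether a cited string has the syntactic shape of a DOI and normalizes it to a resolvable `https://doi.org/` link. It is a minimal sketch, not a complete validator: the regular expression is a commonly used pattern for modern DOIs and is an assumption here, the function names are hypothetical, and a syntactic match does not show that the DOI is registered or that it points to the work being cited. Confirming that would require a lookup against a registration agency or publisher metadata.

```python
import re

# Assumed pattern covering the common shape of modern DOIs: "10.", a 4-9 digit
# registrant code, "/", then a suffix. Older or unusual DOIs may not match.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Prefixes that often precede a DOI in citations generated or copied as text.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return the bare, lower-cased DOI, or None if the input is not DOI-shaped."""
    candidate = raw.strip()
    lowered = candidate.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    candidate = candidate.strip()
    # DOIs are case-insensitive, so lower-casing gives a stable comparison key.
    return candidate.lower() if _DOI_PATTERN.match(candidate) else None


def doi_link(raw):
    """Return a resolver link for a DOI-shaped string, or None otherwise."""
    doi = normalize_doi(raw)
    return f"https://doi.org/{doi}" if doi else None


if __name__ == "__main__":
    for text in ["doi:10.1234/ABC.5678", "https://doi.org/10.1234/abc.5678", "not-a-doi"]:
        print(text, "->", doi_link(text))
```

A check like this can catch malformed identifiers in generated citations early, but it cannot detect a well-formed DOI that was fabricated or attached to the wrong work.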

Transparency and Accountability

Why it matters: Scholarly research thrives on and requires openness of processes, reasoning, limitations, and debates, to ensure accountability and trust and enable proper re-use.

Scholarly norm: Version control, methodology documentation, tracking correction logs, availability of open research data, open communication, and acknowledgement of limitations contribute to transparency.

Risks: Lack of transparency can cause misinterpretation, prevent reproducibility and verifiability, and undermine trust. It can signal a lack of accountability.

Discussion topics:

• How can sufficient transparency be provided on training data, model limitations, testing procedures, software versions and their differences, and user data handling, in a manner consistent with competition law?
• How can accountability be shown in an auditable way, demonstrating, among other considerations, the role of human oversight?

Reliability and Quality

Why it matters: Scholars, educators, professionals, and citizens, including medical doctors and patients, increasingly rely on GenAI technologies to gather and produce knowledge, to further science and benefit society.

Scholarly norm: Rigorous editorial processes, such as peer review and ethical oversight, together with public corrections and retractions, support the reliability of research content.

Risks: Presenting inaccurate, fabricated, hallucinated, conflicting, or biased information spreads false information that undermines scientific and societal progress and could lead to direct harm, for example in medical contexts.

Discussion topics:

• How can bias be mitigated, and fairness and impartiality be supported?
• How can quality and reliability be validated?
• How can hypotheses be differentiated from facts, and how can confidence or certainty levels be communicated?
• How can scholarly publishers help to prove the trustworthiness of GenAI technologies for scholarly communication and research?
• How can scholarly publishers assist in co-developing an evaluation framework specific to scholarly use?
• Would co-development of scholarly output validators be a useful endeavor?

Some of the discussion topics raised are not complicated to implement. For instance, standardizing citation formats, validating citations, and providing overall transparency in documentation are straightforward. Others, such as mechanisms to identify a first and original source or to handle retractions and corrections, are more complex. The pursuit of trustworthy scientific discovery for the benefit of society requires that we continue to search for solutions to these challenges. With its experience and background, STM is uniquely positioned and eager to help answer these critical questions.

We welcome feedback on this brief and propose a dialogue between all relevant stakeholders, including developers and the research community, to further develop a vision on the responsible use of research content in or by GenAI tools.

With the understanding that any outcomes should be accompanied by mechanisms to enforce intellectual property rights and consequences for non-compliance, further conversations could cover:

• Content scope and presentation that meet the unique demands of scholarly research;
• Achieving attribution and citation integrity in GenAI outputs;
• Transparency around sources, processes, and limitations;
• Reliability, accuracy, and fairness in scholarly contexts.

STM encourages the adoption of new technologies and is excited about the potential of GenAI tools to transform the discovery and use of scholarly information by scholars, educators, professionals, and the public. In support of this potential, a robust content licensing market already exists, enabled by current copyright and intellectual property frameworks that incentivize the creation of high-quality research content.

By working together, we can minimize risks, build enduring trust within the scholarly community, shape effective and balanced regulatory landscapes, and, ultimately, contribute to the proposition that GenAI tools accelerate scientific discovery in a responsible, ethical, trustworthy, and universally beneficial manner.

The International Association of Scientific, Technical & Medical Publishers (STM)
https://stm-assoc.org/
