Research Paper: Data Quality in Research
A framework of the roles and processes in research data quality, and areas of focus for the future.
Regardless of the field of study, sharing data is one of the most fundamental aspects of maintaining the integrity of research. The availability of research data plays a vital role in ensuring reproducibility and the ongoing development of Open Science.
The entire scholarly ecosystem has made considerable progress in recent years. Nearly 100 million data citations are tracked and thousands of repositories have adopted data citation best practices. An equal number of journals have adopted data policies and data availability statements and have established persistent links between articles and datasets, and an increasing number of funders have adopted data policies. Data Management Plans are increasingly used, and institutes are supporting researchers with data software tools and data stewards and are launching institutional data repositories.
In the process, the challenges of how to collect, share and present data to a range of audiences must be considered alongside questions of whether the data is accurate and addresses the questions researchers wish to answer. We must better understand what constitutes quality data, how it can be ensured, and the processes and roles involved. An understanding of the importance of these issues emerged from group discussions at the STM Research Data Program (RDP), a program aimed at boosting the effective sharing, linking and citing of research data alongside publications. This Research Paper was developed through desk-based research and in consultation with leading experts, all of whom are involved with aspects of data quality across scholarly communications and research.
We hope this Research Paper offers a valuable perspective on the ongoing transition to data-dependent science. We also hope that our framework and the highlighted initiatives will inform current debate on how to ensure data quality.