Recommendations for Authors

Recommendation 1 – Cite the data themselves

The citation should refer to the data specifically. Acknowledgements and data availability statements, as well as references to methodological reports or data articles, do not qualify as data citations. All data used or mentioned in a publication must be cited both in the text, often in a short version (e.g., author-date), and in the reference list in a full version.

Recommendation 2 – Include all six core components for data citation in the reference list

When citing the data, make sure that the citation includes all the core components, which are the minimum required. These components are provided by the dedicated data repositories where the data are published, whether openly or with restricted access.

Arrange the citation core components and add additional components and information based on the requirements of the bibliographic style, journal/publisher guidelines, or data repository demands. While most reference styles accommodate authors, title, publication year, publisher, and the inclusion of a URL, certain components specific to data citations – such as the version (a core component) or resource type (an additional component) – may not be directly supported. In such cases, these details can be incorporated into the title field.
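As a rough illustration of folding unsupported components into the title field, the sketch below appends the version and resource type to a title string. The function name, the parenthesis/bracket layout, and the sample values are assumptions made for this example only; the actual layout should follow the bibliographic style or journal guidelines in use.

```python
def title_with_extras(title, version=None, resource_type=None):
    """Append version and resource type to a title when the style has no field for them.

    The "(Version X) [Type]" layout is an assumed convention, not a required one.
    """
    parts = [title]
    if version:
        parts.append(f"(Version {version})")
    if resource_type:
        parts.append(f"[{resource_type}]")
    return " ".join(parts)


# Placeholder values, not a real dataset.
print(title_with_extras("Example Survey Data", version="2.1", resource_type="Data set"))
```

If neither a version nor a resource type is passed, the title is returned unchanged, so the helper can be applied to every entry without special-casing.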

Data repositories usually provide a suggested data citation, or even an automatic download of the core components that can be imported into reference management software (e.g., Zotero, EndNote, Mendeley). Verify that the imported record includes all six core components of the data citation. You might also want to include information on additional components.
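The verification step can be sketched as a simple completeness check on a citation record. The six field names below are assumptions inferred from the components mentioned in this section (author, title, publication year, publisher, version, persistent identifier); they are not an official schema and should be adapted to the list used by the relevant guideline and to the field names of the reference manager's export.

```python
# Assumed field names for the six core components; adjust to the official list.
CORE_COMPONENTS = ("author", "title", "year", "publisher", "version", "identifier")


def missing_core_components(record):
    """Return the core components that are absent or blank in a citation record."""
    return [
        name
        for name in CORE_COMPONENTS
        if not str(record.get(name) or "").strip()
    ]


# Illustrative record with placeholder values (not a real dataset).
record = {
    "author": "Doe, J.",
    "title": "Example Survey Data",
    "year": "2023",
    "publisher": "Example Data Repository",
    "version": "",  # left blank on purpose to show a failed check
    "identifier": "https://doi.org/10.1234/example",
}

print(missing_core_components(record))
```

An empty list means every assumed core component is present; anything else names the fields to fill in by hand before the reference is used.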

For embargoed data deposited in a repository, the deposit year and the deposit date can be used instead of the publication year and the version, respectively. Alternatively, “Forthcoming” can be given instead of the publication year. The data repository may be able to provide the persistent identifier for the deposit.
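The substitution rule for embargoed data can be expressed as a small fallback, shown here as a minimal sketch. The function name and the record keys (`published_year`, `deposit_year`, `deposit_date`, `version`) are hypothetical and chosen only for this example.

```python
def embargo_fallbacks(record, use_forthcoming=False):
    """Pick the year and version to cite, falling back to deposit details.

    Hypothetical keys: published_year, deposit_year, deposit_date, version.
    """
    year = record.get("published_year")
    if not year:
        # No publication year yet: use "Forthcoming" or the deposit year.
        year = "Forthcoming" if use_forthcoming else record.get("deposit_year")
    # No version yet: the deposit date stands in for it.
    version = record.get("version") or record.get("deposit_date")
    return {"year": year, "version": version}


# Placeholder record for a deposit that is still under embargo.
embargoed = {"deposit_year": "2024", "deposit_date": "2024-03-15"}

print(embargo_fallbacks(embargoed))
print(embargo_fallbacks(embargoed, use_forthcoming=True))
```

Once the data are published, the same record with `published_year` and `version` filled in passes through unchanged, so the citation can be updated without altering the logic.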

Recommendation 3 – Specifically cite primary and secondary data, as well as replication materials

Authors should cite the data they analyse, which can be data collected by themselves (primary data) or data collected by other researchers for another research purpose (secondary data).

In addition, authors might be asked to share their replication materials, that is, materials for reproducibility purposes, which may contain data, code, software, and other project files. In this case, both the (primary or secondary) data and the replication materials should be cited accurately and, if possible, separately. In the interest of transparency and accountability, authors using secondary data must cite the secondary data themselves in their publication. Citing only the replication materials, which the authors themselves created, is not sufficient in this case.

For authors using their own data (primary data), the data and the replication materials have the same authors, unlike when secondary data are reused. However, these two research outputs contain different information and serve different purposes. While the replication materials often include only the subset of data used in the analyses (e.g., some observations, variables, or other components), the research project has generally collected more data that could be shared in a data repository. The data shared in the replication materials are therefore mainly meant for verifying and reproducing the results, while the ‘complete’ data could be reused in the future to answer new research questions. Consequently, it is recommended to publish and share data that are as complete as possible in order to enable further reuse.