A data citation is a reference to the data themselves. It appears in the text or as a footnote whenever the data are mentioned or directly referenced as a source, and the reference list should contain a corresponding entry (Bornatici & Fedrigo, 2023; FORCE11, 2014). The granularity of the reference depends on the usage: a data citation can identify the entire dataset or a subset.
To ensure proper identification and findability of the cited data, the data citation should comprise the following six core components (Bornatici & Fedrigo, 2023; Cousijn et al., 2018; Finnish Committee for Research Data, 2018; IASSIST, 2012; Jessop, 2021; Silvello, 2018; UK Data Service, 2023b):
- Data author(s): Name of each individual, research group, or organisational entity responsible for the creation of the data, sometimes referred to as data producer(s). This could also be the rights holder.
- Title of the data: Complete title of the data.
- Year of publication: Year the data were published or disseminated for the cited version of the data.
- Version: Version or edition number of the data. If not provided by the data publisher, the publication date or access date should be used instead.
- Data publisher: Organisational entity (e.g., data repository) responsible for making the data available by preserving and/or disseminating the data.
- Persistent identifier: Unique electronic identifier used to locate and access the data (such as a DOI).
While the above components should always be present in a data citation, additional components might add value or be required by the publisher's or journal's guidelines or by bibliographic styles (Bornatici & Fedrigo, 2023; Finnish Committee for Research Data, 2018):
- Data number or accession number: Unique numerical identifier for the data, provided by the data publisher. The data number is not a persistent identifier.
- Resource type: General resource type, often provided after the title in square brackets, e.g., [dataset], [data file and documentation]. This allows instant differentiation of data citations from other resource types.
- Place of publication: Physical location of the data publisher.
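The distinction between core and additional components can be sketched as a small completeness check. This is only an illustrative sketch: the field names (`authors`, `persistent_identifier`, and so on) and the function `missing_core_components` are hypothetical and do not come from any of the cited guidelines.

```python
# Hypothetical field names for the six core components listed above.
CORE_COMPONENTS = (
    "authors",
    "title",
    "year",
    "version",
    "publisher",
    "persistent_identifier",
)

# Hypothetical field names for the additional components.
OPTIONAL_COMPONENTS = ("data_number", "resource_type", "place_of_publication")


def missing_core_components(record: dict) -> list[str]:
    """Return the core components that are absent or empty in a citation record."""
    return [name for name in CORE_COMPONENTS if not record.get(name)]


# A record with all six core components and one additional component.
record = {
    "authors": ["ISSP Research Group"],
    "title": "International Social Survey Programme: Environment IV – ISSP 2020",
    "year": 2023,
    "version": "2.0.0",
    "publisher": "GESIS",
    "persistent_identifier": "https://doi.org/10.4232/1.14153",
    "data_number": "ZA7650",
}

print(missing_core_components(record))  # → []
```

A record lacking, say, a version would be reported as `["version"]`, which signals that the publication date or access date should be supplied instead.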
For instance, following APA 7th edition (APA Style, 2020), the formal data citation is formatted as follows:
Data author(s) (Publication year). Data title (Data number; Version) [Resource type]. Data publisher. Persistent identifier
Here are two examples of formal data citation following APA 7th edition guidelines:
Vanhanen, T. (2019). Measures of Democracy 1810-2018 (FSD1289; Version 8.0) [Dataset]. Finnish Social Science Data Archive. https://doi.org/10.60686/t-fsd1289
ISSP Research Group (2023). International Social Survey Programme: Environment IV – ISSP 2020 (ZA7650; Version 2.0.0) [Dataset]. GESIS. https://doi.org/10.4232/1.14153
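For readers who assemble citations programmatically, the template above can be sketched as a short helper. This is a minimal sketch, not an official APA tool: the function name `format_apa_data_citation` and its parameter names are assumptions made for illustration, and the output is plain text without typographic styling.

```python
from typing import Optional


def format_apa_data_citation(
    authors: str,
    year: int,
    title: str,
    version: str,
    publisher: str,
    persistent_identifier: str,
    data_number: Optional[str] = None,
    resource_type: str = "Dataset",
) -> str:
    """Assemble a data citation string following the template above."""
    # The optional data number and the version share one parenthetical,
    # separated by a semicolon as in the template.
    details = "; ".join(
        part for part in (data_number, f"Version {version}") if part
    )
    return (
        f"{authors} ({year}). {title} ({details}) [{resource_type}]. "
        f"{publisher}. {persistent_identifier}"
    )


citation = format_apa_data_citation(
    authors="ISSP Research Group",
    year=2023,
    title="International Social Survey Programme: Environment IV – ISSP 2020",
    version="2.0.0",
    publisher="GESIS",
    persistent_identifier="https://doi.org/10.4232/1.14153",
    data_number="ZA7650",
)
print(citation)
```

With these inputs the helper reproduces the ISSP example; omitting `data_number` leaves only the version inside the parentheses.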