Document Data

Good data documentation is essential for research reproducibility and data reusability. Data documentation provides information about the context, structure, provenance, and content of a dataset (or a file), with the aim of increasing its usefulness. Data documentation is therefore a crucial part of making data FAIR.

What Is Data Documentation?

Data documentation is sometimes also called metadata, that is, data about data. Metadata describes basic characteristics of the data, such as:

Who created the data?
What does the data file contain?
When was the data generated?
Where was the data generated?
Why was the data generated?
How was the data generated?
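
The six questions above can be treated as a simple checklist. The following is a minimal sketch, assuming a hypothetical `DatasetDescription` record with one field per question; the field names and the example values are illustrative only and do not come from any metadata standard.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatasetDescription:
    """Answers to the six basic questions (hypothetical field names)."""
    who: str = ""
    what: str = ""
    when: str = ""
    where: str = ""
    why: str = ""
    how: str = ""


def unanswered(description: DatasetDescription) -> list[str]:
    """Return the names of the questions that are still blank."""
    return [name for name, value in asdict(description).items() if not value.strip()]


# Illustrative values only: three of the six questions are answered.
example = DatasetDescription(
    who="Jane Doe, Example University",
    what="Survey responses, one row per participant (CSV)",
    when="2023-03-01 to 2023-04-15",
)
print(unanswered(example))  # prints ['where', 'why', 'how']
```

A check like this could be run before publishing to spot questions that have not been answered yet.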

Metadata or Data Documentation?

Metadata can be maintained through a data archive or repository, where you describe the characteristics of your data according to the information the repository requires. Alternatively, you can write data documentation in the form of a README file, which contains additional information for the reuse of your data. As a rule, both are recommended: the information in the data repository is machine-readable and can thus be used for meta-analyses, while the README file helps humans reuse the data.
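
To illustrate the difference between the two forms, the sketch below renders one set of metadata twice: as JSON for machines and as a plain-text README for human readers. The labels, the values, and the helper names `render_readme` and `render_json` are assumptions made for this example; a real repository defines its own required fields and formats.

```python
import json


def render_readme(metadata: dict[str, str]) -> str:
    """Format metadata as plain text, one 'Label: value' line per entry."""
    lines = ["README", ""]
    for label, value in metadata.items():
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


def render_json(metadata: dict[str, str]) -> str:
    """Serialize the same metadata in a machine-readable form."""
    return json.dumps(metadata, indent=2, ensure_ascii=False)


# Illustrative values only.
metadata = {
    "Title": "Example survey dataset",
    "Creator": "Jane Doe",
    "Date collected": "2023-03-01 to 2023-04-15",
    "Method": "Online questionnaire",
}

readme_text = render_readme(metadata)
json_text = render_json(metadata)
print(readme_text)
print(json_text)
```

Keeping a single source for both outputs is one way to avoid the README and the repository record drifting apart.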

Start with data documentation when collecting the data.

How to Create Data Documentation?

Start documenting your data as soon as you begin collecting it. This makes it easier to trace the complete data generation process later and helps you produce well-structured data documentation when you publish.

Structure the documentation early: Your data documentation does not need to be fully structured right from the start. However, a basic structure can help you gather, from the beginning, all the metadata needed to make your data reusable.

The Stanford Libraries provide a good introduction to this.

Use metadata standards: Well-structured metadata or data documentation supports the long-term discoverability, understandability, and preservation of your research data. Discipline-specific repositories typically require highly structured metadata to enable fine-grained searching of the repository.

Metadata Standards and Templates

Metadata standards are also referred to as "schemas". Schemas can be either generic or discipline-specific.

Well-known metadata standards include Dublin Core, a set of 15 terms (such as creator and title). The Data Documentation Initiative (DDI) provides an XML-based schema for the content, transport, representation, and archiving of metadata in the social sciences. To find discipline-specific metadata schemas, look at:

The Metadata Standard Catalogue for scientific data
The list of metadata standards on Wikipedia, curated by the research community
Dataverse's metadata schemas, which can be exported to many other schemas
The Digital Curation Centre's list of metadata schemas
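
As an illustration of what a generic schema looks like in practice, the following sketch serializes a small record using the 15 Dublin Core elements. The wrapper element `metadata`, the function name `to_dublin_core`, and the example values are assumptions made for this example; repositories usually prescribe their own container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 Dublin Core elements.
DC_TERMS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dublin_core(record: dict[str, str]) -> str:
    """Serialize a record as XML, rejecting keys that are not Dublin Core elements."""
    unknown = set(record) - DC_TERMS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for term, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{term}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
xml_text = to_dublin_core(
    {"title": "Example survey dataset", "creator": "Jane Doe", "date": "2023"}
)
print(xml_text)
```

Restricting the keys to a fixed vocabulary is what makes such records comparable and searchable across datasets.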

Use templates to create your metadata.

Templates for creating data documentation can be found here:

Cornell University's README file is a Word document that asks the most important questions for comprehensive data documentation. From this, you can then generate a PDF and share it together with your data.

The CESSDA Metadata Schema allows you to capture project-level information about your data. To do this, answer the questions under "Project-level documentation".

The DataCite Metadata Generator creates XML-based data documentation for you based on the questions you answer in the generator.

Further Information

Christian Futter, Dr.
Elisabeth-Christine Gamer, Dr.
Melanie Röthlisberger, Dr.
Stefanie Strebel, Dr.

data@ub.uzh.ch

Tel. +41 44 635 47 49

We offer training, workshops, and support for research data management and for writing data management plans (DMPs).

University Library Zurich
