Data documentation and description

Why document my data?

Documenting and describing your data makes it easier for you and others to reuse data at a later date. Imagine that you were taking over a project in the middle of a grant, but could not contact the principal researcher. What information would you need to continue the project? Here are some examples:

File handling (naming convention, folder structure)
Processing steps (how to get from point A to B)
Protocols (what decisions were made and why)
Field abbreviations and/or name glossary (what does ABC3130 stand for)

This information is called metadata: "data about data", or the who, what, when, where, why, and how of your research.

Who created the data
What the data file contains
When the data were generated
Where the data were generated
Why the data were generated
How the data were generated

What do I document and describe?

It is important to begin documenting your data at the start of your research and to continue doing so throughout the project. If you create the documentation only at the end of the project, important details may be lost or forgotten.

There are three types of documentation for a research project: study-level metadata, variable-level metadata, and catalogue metadata.

Study-level metadata

Study-level metadata provides context for understanding why the data were collected and how they were used. It could include:

Rationale and context for data collection
Data collection methods (protocols, sampling design, instruments or software used, etc.)
Structure and organization of data files
Secondary data sources used
Data validation and quality assurance (checking, proofing, cleaning, calibration, etc.)
Transformations of data from the raw data through analysis
Information on confidentiality, access and use conditions

Variable-level metadata

Variable-level metadata is more granular: it describes, in detail, the individual variables and values in a dataset. It could include:

Variable names, descriptions, units
Data type (integer, Boolean, character, etc.)
Explanation of codes and classification schemes used
Data processing methods, software used, scripts, codes
Data formats (.csv, .mat, .tiff, .txt, etc.) and software (including version) used

This information can be embedded in a data file. For example, variable, value and code labels can be added in an SPSS file. Interview transcripts can embed metadata in a header.

Further reading:

Data documentation: Qualitative data (UK Data Service)
Data documentation: Quantitative data (UK Data Service)
Data documentation: Secondary sources (UK Data Service)

Catalogue metadata

When sharing data in a repository, the information added during data upload typically describes the content, context and provenance of the dataset(s) in a standardized and structured manner. This helps users find data, judge whether it is suitable for their research, and provides a bibliographic record for citing data.

The metadata in these data records often use international standards or schemes, consisting of mandatory and optional elements. Example schemes include Dublin Core (see also: Dublin Core Metadata Schema guide) and the Data Documentation Initiative (DDI).

Example catalogue metadata could include:

Name of the project
Dataset title
Project description
Dataset abstract
Principal investigator and collaborators
Contact information
Dataset handle (DOI or URL)
Dataset citation
Data publication date
Geographic description
Time period of data collection
Subject/keywords
Project sponsor
Dataset usage rights
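
As a rough illustration of how a few of the elements above might be expressed in a structured scheme, the sketch below builds a small Dublin Core record in Python. The namespace is the standard Dublin Core elements namespace, but the wrapping "metadata" element, the function name, and all field values are assumptions made for this example; a repository will normally generate or require its own record layout.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements (title, creator, date, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dublin_core_record(fields):
    """Return an XML string with one Dublin Core element per (name, value) pair.

    The <metadata> wrapper element is an assumption for this sketch;
    repositories usually define their own container.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
example_fields = [
    ("title", "Example survey dataset"),
    ("creator", "Doe, Jane"),
    ("description", "Placeholder abstract describing the dataset."),
    ("date", "2024-01-15"),
    ("subject", "research data management"),
    ("rights", "CC BY 4.0"),
]

record = build_dublin_core_record(example_fields)
print(record)
```

Each catalogue item maps to one element here (dataset title to dc:title, usage rights to dc:rights, and so on). Which elements are mandatory depends on the repository.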

How do I document my data?

Documentation can take many forms. It can be written in free text, such as a README file, or the metadata can be captured in a structured, machine-readable file encoded in an XML format.

Structured, discipline-specific metadata is preferable, but if no standard exists, writing README-style files is the simplest way of recording metadata.

README files

A README file provides information about a data file. It allows you and others to understand and reuse the data at a later date.

Best practices:

Follow Cornell Data Services' guide to writing READMEs for research data.

Start writing the README files at the beginning of the research project.
Record the information in a text file (.txt).
Use a template to help guide you, but tailor it to the needs of the project and kind of data that is being documented. Template examples:
- Cornell University README template
- Oregon State University README template
Update the file as the research progresses.
When the research is complete and ready to be shared, deposit the README file alongside the data in a repository.
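
The practices above can be supported with a small script that creates a README skeleton at the start of a project. The following is a minimal sketch, not an official template: the section headings, field names, and the render_readme and write_readme helpers are all hypothetical and should be adapted to the template you choose.

```python
from datetime import date
from pathlib import Path

# Hypothetical section layout; replace with the template your project adopts.
README_TEMPLATE = """\
GENERAL INFORMATION
Dataset title: {title}
Author / contact: {contact}
Date of README creation: {created}

FILE OVERVIEW
File naming convention: {naming_convention}
Folder structure: {folder_structure}

METHODS
How the data were generated: {methods}
Processing steps: {processing}

ABBREVIATIONS AND GLOSSARY
{glossary}
"""

FIELDS = (
    "title", "contact", "naming_convention", "folder_structure",
    "methods", "processing", "glossary",
)


def render_readme(details, created=None):
    """Fill the template; fields not supplied are marked for later completion."""
    values = {name: details.get(name, "[to be completed]") for name in FIELDS}
    values["created"] = (created or date.today()).isoformat()
    return README_TEMPLATE.format(**values)


def write_readme(folder, details, created=None):
    """Write README.txt into the given folder and return its path."""
    path = Path(folder) / "README.txt"
    path.write_text(render_readme(details, created), encoding="utf-8")
    return path


# Hypothetical project details, for illustration only.
print(render_readme({"title": "Example survey dataset"}))
```

Unfilled fields stay visible as "[to be completed]", which makes it easy to see what still needs documenting as the research progresses.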

Data dictionaries & codebooks

Data dictionaries and codebooks provide variable-level metadata. These two types of documents may provide overlapping information.

Data dictionaries: describe the names, definitions, and attributes of the data elements in a file. Find out more:
- How to make a data dictionary (OSF)
- Describing your data with data dictionaries (Smithsonian Libraries)
- Data dictionaries (USGS)
Codebooks: used by survey researchers to provide information about the data from a survey instrument. Further reading: Codebooks (Iowa University Libraries).
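
A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The sketch below assumes a CSV file with a header row; the column names, sample values, and the draft_data_dictionary and infer_type helpers are made up for illustration, and the type guess is a rough heuristic that should be reviewed, not trusted.

```python
import csv
import io


def infer_type(values):
    """Guess a simple data type from a column's non-empty string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "character"


def draft_data_dictionary(csv_text):
    """Return one entry per column, with blank fields left for the researcher."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [row.get(name) or "" for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(column),
            "description": "",  # to be written by the researcher
            "units": "",        # to be written by the researcher
            "missing": sum(1 for v in column if v == ""),
        })
    return entries


# Hypothetical sample data, for illustration only.
sample_csv = "site_id,temp_c,count\nA1,12.5,3\nA2,,5\nA3,14,2\n"

for entry in draft_data_dictionary(sample_csv):
    print(entry)
```

The script only fills in what can be read from the file; descriptions, units, and explanations of codes still need to come from someone who knows the data.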

Lab notebooks

Lab notebooks (print or online) are also a great way to document your research. They include methodology, results, calculations, etc. They are helpful for publishing, sharing, or reproducing your research.

Information on choosing an electronic lab notebook:

Electronic lab notebooks (Harvard University)
Electronic research notebooks (Cambridge University)

Metadata standards

Find out if your discipline uses a metadata standard to describe data. Some disciplinary data repositories require a formal standard. These metadata files are often saved in a machine-readable format, such as XML. There are tools that can help with the creation of these metadata files. See the Tools section for more information.

To find an appropriate metadata standard for your discipline, consult the following resources:

Disciplinary metadata guide (Digital Curation Centre)
Open directory of metadata standards (Research Data Alliance)
Metadata standards catalog (Research Data Alliance)

Tools to document my data

Creating standardized metadata can be difficult and time-consuming. There are tools that can help. Some help you select controlled vocabularies to include in your documentation. Others help you complete the metadata schema.

Stanford University Libraries provides a list of metadata tools that may be helpful.


Help and resources

Research data management consultations are available for Concordia faculty, students, and staff. Find out more about how librarians on the Library's RDM team can provide guidance. This service is part of Concordia's Institutional Research Data Management Strategy.
