Manage data during research

Data documentation and description

Why document my data?

Documenting and describing your data makes it easier for you and others to reuse data at a later date. Imagine that you were taking over a project in the middle of a grant, but could not contact the principal researcher. What information would you need to continue the project? Here are some examples:

  • File handling (naming convention, folder structure)
  • Processing steps (how to get from point A to B)
  • Protocols (what decisions were made and why)
  • Field abbreviations and/or name glossary (what does ABC3130 stand for)

This information is called metadata: "data about data", or the who, what, when, where, why, and how of your research.

  • Who created the data
  • What the data file contains
  • When the data were generated
  • Where the data were generated
  • Why the data were generated
  • How the data were generated

What do I document and describe?

It is important to begin documenting your data at the start of your research and to continue doing so throughout the project. If you create the documentation only at the end of the project, important details may be lost or forgotten.

There are three types of documentation for a research project: study-level metadata, variable-level metadata, and catalogue metadata.

Study-level metadata

Study-level metadata provides context for understanding why the data were collected and how they were used. It could include:

  • Rationale and context for data collection
  • Data collection methods (protocols, sampling design, instruments or software used, etc.)
  • Structure and organization of data files
  • Secondary data sources used
  • Data validation and quality assurance (checking, proofing, cleaning, calibration, etc.)
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access and use conditions

Variable-level metadata

Variable-level metadata is more granular: it describes the individual variables and values in a dataset. It could include:

  • Variable names, descriptions, units
  • Data type (integer, Boolean, character, etc.)
  • Explanation of codes and classification schemes used
  • Data processing methods, software used, scripts, codes
  • Data formats (.csv, .mat, .tiff, .txt, etc.) and software (including version) used

This information can be embedded in a data file. For example, variable, value and code labels can be added in an SPSS file. Interview transcripts can embed metadata in a header.

Catalogue metadata

When sharing data in a repository, the information added during data upload typically describes the content, context and provenance of the dataset(s) in a standardized and structured manner. This helps users find data and judge whether it is suitable for their research, and it provides a bibliographic record for citing data.

The metadata in these data records often follow international standards or schemes, consisting of mandatory and optional elements. Example schemes include Dublin Core (see also: Dublin Core Metadata Schema guide) and the Data Documentation Initiative (DDI).

Example catalogue metadata could include:

  1. Name of the project
  2. Dataset title
  3. Project description
  4. Dataset abstract
  5. Principal investigator and collaborators
  6. Contact information
  7. Dataset handle (DOI or URL)
  8. Dataset citation
  9. Data publication date
  10. Geographic description
  11. Time period of data collection
  12. Subject/keywords
  13. Project sponsor
  14. Dataset usage rights
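
As a rough illustration, a few of the elements listed above could be expressed as a simple Dublin Core record. The sketch below uses only Python's standard library. The helper name `build_dc_record`, the example values, and the choice of which catalogue fields map to which Dublin Core elements are assumptions made for illustration; in practice a repository usually generates this record from its upload form.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a <metadata> element with one dc:* child per (name, value) pair.

    `build_dc_record` is a hypothetical helper, not part of any repository API.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical example values; replace them with your own project's details.
example_fields = [
    ("title", "Example survey dataset"),
    ("creator", "Doe, Jane"),
    ("description", "Placeholder abstract describing the dataset."),
    ("date", "2024-01-15"),
    ("coverage", "Montreal, Quebec"),
    ("subject", "research data management"),
    ("rights", "CC BY 4.0"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each tuple pairs a Dublin Core element name with a value, so adding or removing catalogue fields only means editing the list.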

How do I document my data?

Documentation can take many forms. It can be written as free text, such as a README file, or the metadata can be captured in a structured, machine-readable file, often encoded in XML.

Structured, discipline-specific metadata is preferable, but if no standard exists, writing a README-style file is the simplest way of recording metadata.

README files

A README file provides information about a data file. It allows you and others to understand and reuse the data at a later date.

Best practices:

Follow Cornell Data Services' guide to writing READMEs for research data.

  • Start writing the README files at the beginning of the research project.
  • Record the information in a plain text file (.txt).
  • Use a template to help guide you, but tailor it to the needs of the project and the kind of data being documented.
  • Update the file as the research progresses.
  • When the research is complete and ready to be shared, deposit the README file alongside the data in a repository.
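
One way to act on these practices is to generate a README skeleton at the start of the project and fill it in as the work progresses. The following is a minimal sketch, not an official template: the section headings, the `readme_skeleton` and `write_readme` helper names, and the `[to be completed]` placeholder are all assumptions, loosely modelled on the who/what/when/where/why/how questions.

```python
from datetime import date
from pathlib import Path

# Hypothetical section headings; adapt them to your project or to a
# template recommended by your institution or repository.
SECTIONS = [
    "Dataset title",
    "Who created the data (names, contact information)",
    "What the data files contain (file list, naming convention)",
    "When and where the data were generated",
    "Why the data were generated (project rationale)",
    "How the data were generated (methods, software and versions)",
    "Access and use conditions",
]


def readme_skeleton(title, sections=SECTIONS, today=None):
    """Return the text of an empty README with one block per section."""
    today = today or date.today()
    lines = [f"README for: {title}", f"Last updated: {today.isoformat()}", ""]
    for heading in sections:
        lines += [f"{heading}:", "[to be completed]", ""]
    return "\n".join(lines)


def write_readme(folder, title):
    """Write README.txt into `folder` and return its path."""
    path = Path(folder) / "README.txt"
    path.write_text(readme_skeleton(title), encoding="utf-8")
    return path


# Preview the skeleton without writing anything to disk.
print(readme_skeleton("Example survey dataset"))
```

Calling `write_readme("my_project", "Example survey dataset")` would save the same text as `README.txt` in an existing `my_project` folder (a hypothetical folder name), ready to be deposited alongside the data later.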

Data dictionaries & codebooks

Data dictionaries and codebooks provide variable-level metadata. These two types of documents may provide overlapping information.
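
For tabular data, a first draft of a data dictionary can be produced from the file itself and then completed by hand. The sketch below is one possible approach under simple assumptions: the data are in CSV form with a header row, the column names and values in `sample` are invented, and the type guess (integer, float, or character) is only a starting point that should be checked against the study documentation. The `infer_type` and `data_dictionary` helpers are hypothetical names.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse data type from the non-empty string values of a column."""
    values = [v for v in values if v != ""]
    if not values:
        return "unknown"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return label
        except ValueError:
            continue
    return "character"


def data_dictionary(csv_text):
    """Return one entry per column, with description and units left blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [row[name] or "" for row in rows]
        entries.append(
            {
                "variable": name,
                "type": infer_type(column),
                "description": "",  # to be written by the researcher
                "units": "",        # to be written by the researcher
            }
        )
    return entries


# Invented sample data, used only to show the shape of the output.
sample = "participant_id,age,height_cm,site\n1,34,172.5,MTL\n2,29,160.0,QC\n"
for entry in data_dictionary(sample):
    print(entry)
```

The descriptions, units, and explanations of codes (for example, what `MTL` stands for in the invented `site` column) still have to be added manually, since they cannot be recovered from the values alone.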

Lab notebooks

Lab notebooks (print or online) are also a great way to document your research. They include methodology, results, calculations, etc. They are helpful for publishing, sharing, or reproducing your research.

Metadata standards

Find out whether your discipline uses a metadata standard to describe data. Some disciplinary data repositories require a formal standard. These metadata files are often saved in a machine-readable format, such as XML. There are tools that can help with the creation of these metadata files. See the "Tools to document my data" section for more information.

Tools to document my data

Creating standardized metadata can be difficult and time-consuming, but tools can help. Some help you select controlled vocabularies to include in your documentation. Others help you fill in the elements of a metadata schema.

Stanford University Libraries provides a list of metadata tools that may be helpful.

Help and resources

Research data management consultations are available for Concordia faculty, students, and staff. Find out more about how librarians on the Library's RDM team can provide guidance. This service is part of Concordia's Institutional Research Data Management Strategy.

© Concordia University