Data documentation and description
Why document my data?
Documenting and describing your data makes it easier for you and others to reuse data at a later date. Imagine that you were taking over a project in the middle of a grant, but could not contact the principal researcher. What information would you need to continue the project? Here are some examples:
- File handling (naming convention, folder structure)
- Processing steps (how to get from point A to B)
- Protocols (what decisions were made and why)
- Field abbreviations and/or name glossary (what does ABC3130 stand for)
This information is called metadata: "data about data", or the who, what, when, where, why, and how of your research.
- Who created the data
- What the data file contains
- When the data were generated
- Where the data were generated
- Why the data were generated
- How the data were generated
What do I document and describe?
It is important to begin documenting your data at the start of your research and to continue doing so throughout the project. If you create the documentation only at the end of the project, important details may be lost or forgotten.
There are three types of documentation for a research project: study-level metadata, variable-level metadata, and catalogue metadata.
Study-level metadata
Study-level metadata provides context for understanding why the data were collected and how they were used. It could include:
- Rationale and context for data collection
- Data collection methods (protocols, sampling design, instruments or software used, etc.)
- Structure and organization of data files
- Secondary data sources used
- Data validation and quality assurance (checking, proofing, cleaning, calibration, etc.)
- Transformations of data from the raw data through analysis
- Information on confidentiality, access and use conditions
Variable-level metadata
Variable-level metadata is more granular: it explains the individual variables and values in a dataset in detail. It could include:
- Variable names, descriptions, units
- Data type (integer, Boolean, character, etc.)
- Explanation of codes and classification schemes used
- Data processing methods, software used, scripts, codes
- Data formats (.csv, .mat, .tiff, .txt, etc.) and software (including version) used
This information can be embedded in a data file. For example, variable, value and code labels can be added in an SPSS file. Interview transcripts can embed metadata in a header.
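As a rough illustration of metadata embedded in a transcript header, the Python sketch below separates header fields from the body of a transcript. The "Key: value" layout, the blank line that ends the header, the field names, and the function name `split_header` are all assumptions made for this example, not a required format.

```python
def split_header(transcript_text):
    """Separate 'Key: value' header lines from the body of a transcript.

    Assumes (for this example only) that the header ends at the first
    blank line and that every header line has the form 'Key: value'.
    """
    header_part, _, body = transcript_text.partition("\n\n")
    metadata = {}
    for line in header_part.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            metadata[key.strip()] = value.strip()
    return metadata, body


# Hypothetical transcript, invented for illustration
example_transcript = """\
Interview ID: INT-007
Date: 2024-03-02
Interviewer: A. Researcher
Location: Online (video call)

Q: Could you describe your typical workday?
A: I usually start around eight.
"""

metadata, body = split_header(example_transcript)
for key, value in metadata.items():
    print(f"{key} = {value}")
```

Keeping the header in a predictable layout like this means the who, when, and where of each interview can be read by a script as well as by a person.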
Further reading:
- Data documentation: Qualitative data (UK Data Service)
- Data documentation: Quantitative data (UK Data Service)
- Data documentation: Secondary sources (UK Data Service)
Catalogue metadata
When sharing data in a repository, the information added during data upload typically describes the content, context and provenance of the dataset(s) in a standardized and structured manner. This helps users find data and judge whether it is suitable for their research, and it provides a bibliographic record for citing the data.
The metadata in these data records often follow international standards or schemes, consisting of mandatory and optional elements. Example schemes include Dublin Core (see also: Dublin Core Metadata Schema guide) and the Data Documentation Initiative (DDI).
Example catalogue metadata could include:
- Name of the project
- Dataset title
- Project description
- Dataset abstract
- Principal investigator and collaborators
- Contact information
- Dataset handle (DOI or URL)
- Dataset citation
- Data publication date
- Geographic description
- Time period of data collection
- Subject/keywords
- Project sponsor
- Dataset usage rights
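To show how some of the elements above might look in a structured record, here is a minimal Python sketch that writes a few of them as Dublin Core elements in XML. The wrapper element `metadata`, the function name `build_dc_record`, and every value shown are placeholders chosen for this example; a repository will normally generate this kind of record from its upload form, and its exact layout may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a simple XML record from (Dublin Core element, value) pairs."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Placeholder values, invented for illustration only
example_fields = [
    ("title", "Example dataset title"),
    ("creator", "Lastname, Firstname"),
    ("description", "Short abstract describing the dataset."),
    ("date", "2024-01-15"),
    ("coverage", "Example City, Example Country"),
    ("subject", "example keyword"),
    ("rights", "CC BY 4.0"),
    ("identifier", "https://doi.org/10.xxxx/example"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In this sketch the dataset title maps to `dc:title`, the principal investigator to `dc:creator`, the geographic description to `dc:coverage`, and the usage rights to `dc:rights`; other mappings are possible depending on the repository.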
How do I document my data?
Documentation can take many forms. It can be written as free text, such as a README file, or captured in a structured, machine-readable file, often encoded in XML.
Structured, discipline-specific metadata is preferable, but if no standard exists, writing a README-style file is the simplest way to record metadata.
README files
A README file provides information about a data file. It allows you and others to understand and reuse the data at a later date.
Best practices:
Follow Cornell Data Services' guide to writing READMEs for research data.
- Start writing the README files at the beginning of the research project.
- Record the information in a text file (.txt).
- Use a template to help guide you, but tailor it to the needs of the project and kind of data that is being documented. Template examples:
- Update the file as the research progresses.
- When the research is complete and ready to be shared, deposit the README file alongside the data in a repository.
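One possible way to start a README early and keep it consistent is to fill in a plain-text template from a script. The Python sketch below is only an illustration: the section headings, the function name `render_readme`, and all example values are assumptions, and should be adapted to the template your project actually uses.

```python
from datetime import date

# Section headings are an assumption; adapt them to your chosen template.
README_TEMPLATE = """\
TITLE: {title}
AUTHOR(S): {authors}
CONTACT: {contact}
DATE CREATED: {created}
LAST UPDATED: {updated}

DESCRIPTION
{description}

FILES
{files}

METHODS
{methods}

USAGE RIGHTS
{rights}
"""


def render_readme(info, file_notes):
    """Fill the template; 'file_notes' maps each file name to a short note."""
    files = "\n".join(f"- {name}: {note}" for name, note in file_notes.items())
    return README_TEMPLATE.format(
        files=files, updated=date.today().isoformat(), **info
    )


# Hypothetical project details, invented for illustration
info = {
    "title": "Example project",
    "authors": "Lastname, Firstname",
    "contact": "name@example.org",
    "created": "2024-01-15",
    "description": "One or two sentences on what the data are and why they exist.",
    "methods": "How the data were collected and processed.",
    "rights": "CC BY 4.0",
}
file_notes = {
    "survey_raw.csv": "responses as exported from the survey tool",
    "survey_clean.csv": "responses after cleaning (see METHODS)",
}

readme_text = render_readme(info, file_notes)
print(readme_text)
```

Saving the result with `open("README.txt", "w", encoding="utf-8")` and rerunning the script whenever files are added keeps the "last updated" date and the file list current as the research progresses.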
Data dictionaries & codebooks
Data dictionaries and codebooks provide variable-level metadata. These two types of documents may provide overlapping information.
- Data dictionaries: describe the names, definitions, and attributes of the data elements in a file. Find out more:
- How to make a data dictionary (OSF)
- Describing your data with data dictionaries (Smithsonian Libraries)
- Data dictionaries (USGS)
- Codebooks: used by survey researchers to provide information about the data from a survey instrument. Further reading: Codebooks (Iowa University Libraries).
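A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The Python sketch below assumes the data are in CSV form with a header row; the column names, the sample values, and the helper names `guess_type` and `draft_data_dictionary` are invented for this example, and the type guessing is deliberately simple.

```python
import csv
import io


def guess_type(values):
    """Guess a data type from the non-empty values of one column."""
    if not values:
        return "unknown"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "character"


def draft_data_dictionary(csv_text):
    """Return one entry per column, with blanks left for the researcher."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name] not in ("", None)]
        entries.append({
            "variable": name,
            "type": guess_type(values),
            "example": values[0] if values else "",
            "description": "",  # to be written by the researcher
            "units": "",        # to be written by the researcher
        })
    return entries


# Hypothetical data, invented for illustration
sample_csv = """\
participant_id,age,height_cm,smoker
P001,34,172.5,yes
P002,41,,no
"""

dictionary = draft_data_dictionary(sample_csv)
for entry in dictionary:
    print(entry["variable"], entry["type"], entry["example"], sep="\t")
```

The script can only fill in what is visible in the file (names, a guessed type, an example value). Definitions, units, and the meaning of codes such as "yes" and "no" still have to be written by someone who knows the data.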
Lab notebooks
Lab notebooks (print or online) are also a great way to document your research. They include methodology, results, calculations, etc. They are helpful for publishing, sharing, or reproducing your research.
Information on choosing an electronic lab notebook:
- Electronic lab notebooks (Harvard University)
- Electronic research notebooks (Cambridge University)
Metadata standards
Find out if your discipline uses a metadata standard to describe data. Some disciplinary data repositories require a formal standard. These metadata files are often saved in a machine-readable format, such as XML. There are tools that can help with the creation of these metadata files. See the Tools section for more information.
To find an appropriate metadata standard for your discipline, consult the following resources:
- Disciplinary metadata guide (Digital Curation Centre)
- Open directory of metadata standards (Research Data Alliance)
- Metadata standards catalog (Research Data Alliance)
Tools to document my data
Creating standardized metadata can be difficult and time-consuming, but there are tools that can help. Some help you select controlled vocabularies to include in your documentation. Others help you complete the metadata schema.
Stanford University Libraries provides a list of metadata tools that may be helpful.
Help and resources
Research data management consultations are available for Concordia faculty, students, and staff. Find out more about how librarians on the Library's RDM team can provide guidance. This service is part of Concordia's Institutional Research Data Management Strategy.