
Data Management

Resources in documenting, storing and preserving research data

Metadata

Metadata is documentation about data. Good metadata describes data in a way that allows every member of the research team (including those not involved in data collection) to understand the material at hand. It also allows people outside the project to understand a dataset well enough to reuse or replicate it. Metadata can range from informal notes to formal, standardized records. Your staffing, budget, research domain, and available time will determine the best combination of metadata tools for your project.

Metadata should document both contextual information about the study and data-specific information (also known as a codebook):

Contextual Information
  • Principal Investigator(s) and their contact information
  • Title (of dataset)
  • Date that the dataset file was created
  • Date(s) of data collection/generation
  • Geographic location of data collection
  • Licensing (e.g., restrictions on dataset access and use, intellectual property)
  • Methods of data collection/generation
  • Methods used for data processing

Data-Specific Information
  • Variable list
  • Units of measurement
  • Codes or symbols used to record missing data
  • Abbreviations and other conventions
  • Weighting
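The data-specific items above can also be kept in a machine-readable form alongside the data. The Python sketch below is one illustrative way to do this, not a prescribed format. It stores each variable's description, unit, and missing-data codes in a small codebook, then checks a CSV file for columns the codebook does not document and counts values matching each variable's missing-data codes. The variable names, units, and codes (`site_id`, `temp_c`, `-999`, and so on) are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry: a variable and how its values are recorded."""
    name: str
    description: str
    unit: str = ""
    missing_codes: list = field(default_factory=list)


# Hypothetical codebook; names, units and missing-data codes are illustrative only.
CODEBOOK = [
    Variable("site_id", "Identifier of the sampling site"),
    Variable("temp_c", "Water temperature", unit="degrees Celsius", missing_codes=["-999"]),
    Variable("depth_m", "Sampling depth", unit="metres", missing_codes=["-999", "NA"]),
]


def undocumented_columns(header, codebook):
    """Return column names that appear in the data but not in the codebook."""
    documented = {v.name for v in codebook}
    return [col for col in header if col not in documented]


def count_missing(rows, codebook):
    """Count, per documented variable, the values that match its missing-data codes."""
    return {
        v.name: sum(1 for row in rows if row.get(v.name) in v.missing_codes)
        for v in codebook
    }


if __name__ == "__main__":
    # Small inline sample standing in for a real data file.
    sample = "site_id,temp_c,depth_m,ph\nA1,12.5,-999,7.1\nA2,-999,3.0,6.8\n"
    reader = csv.DictReader(io.StringIO(sample))
    rows = list(reader)
    print(undocumented_columns(reader.fieldnames, CODEBOOK))  # ['ph']
    print(count_missing(rows, CODEBOOK))  # {'site_id': 0, 'temp_c': 1, 'depth_m': 1}
```

Here the `ph` column is flagged because the codebook has no entry for it, which is exactly the kind of gap a codebook review is meant to catch.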

See ICPSR's Best Practices for Creating Metadata and Cornell's Guide to Writing "Readme" Style Metadata for more information.

Formal Metadata Standards

Established metadata standards, both discipline-specific and general-purpose, can be applied to your research data. These standards typically use a structured, machine-readable, and extensible syntax (such as XML) to annotate research data. The Digital Curation Centre provides a comprehensive catalog of metadata standards, along with tools for capturing and storing metadata.
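To illustrate what a structured, machine-readable record looks like, the Python sketch below uses the standard library's `xml.etree.ElementTree` to build a small XML record from elements of the Dublin Core Metadata Element Set, a widely used general-purpose standard. This is a simplified example: the `metadata` wrapper element and all field values are hypothetical, and a repository or discipline-specific standard will usually prescribe its own schema and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set (version 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element whose children are Dublin Core elements.

    `fields` maps element names (e.g. "title", "creator") to a string,
    or to a list of strings for elements that repeat.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return root


# Hypothetical dataset description.
record = dublin_core_record({
    "title": "Example stream temperature dataset",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2023-06-01",
    "coverage": "Example watershed",
    "rights": "CC BY 4.0",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because each element carries the Dublin Core namespace, software that understands the standard can locate fields such as `dc:title` without knowing anything else about your project.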

Readme Files

Because implementing a formal metadata standard can consume considerable time and staff effort, and because existing metadata formats may not fit your research data well, a less formal alternative may be attractive.

One good option, particularly for internal use, is to create a readme file for each dataset. The readme should be a plain text file (.txt) that separates important pieces of information with blank lines. As with formal metadata schemas, the readme should include both contextual and data-specific information about your study.
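If you maintain many datasets, generating readme files from a shared structure can keep them consistent. The Python sketch below is one hypothetical approach, not Cornell's template: it renders a plain-text readme with one section for contextual information and one for data-specific information, separated by a blank line. The section titles, field labels, values, and the `README.txt` file name are assumptions to adapt to your project.

```python
from pathlib import Path


def render_readme(sections):
    """Render sections as plain text, separating sections with blank lines.

    `sections` maps a section title to a dict of label -> value pairs.
    """
    blocks = []
    for title, fields in sections.items():
        lines = [title.upper()]
        lines += [f"{label}: {value}" for label, value in fields.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def write_readme(path, sections):
    """Write the rendered readme to `path` as UTF-8 text and return the text."""
    text = render_readme(sections)
    Path(path).write_text(text, encoding="utf-8")
    return text


# Hypothetical content for illustration only.
EXAMPLE = {
    "Contextual information": {
        "Title": "Example stream temperature dataset",
        "Principal investigator": "Jane Doe (jane.doe@example.edu)",
        "Dates of data collection": "2022-05-01 to 2022-09-30",
    },
    "Data-specific information": {
        "temp_c": "Water temperature, degrees Celsius; -999 = missing",
        "depth_m": "Sampling depth, metres; -999 or NA = missing",
    },
}

if __name__ == "__main__":
    print(write_readme("README.txt", EXAMPLE))
```

Keeping the content in a dictionary separate from the rendering step means the same information could later be exported to a formal metadata standard without retyping it.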

Cornell University provides a downloadable readme file template, which may be customized according to your needs.