SJSU Research Guides: Data Management: 4. Describe Data

Why Should We Describe Our Data?

Describing your data thoroughly and at multiple levels is one of the most important data management practices you can adopt, both for yourself and for others.

Personal benefits

Clear, thorough documentation helps you understand your original work and processes months or years down the road.
If you publish your work, well-described and documented data help protect your work from retraction due to missing data or errors caused by poor data management practices.
Well-documented data is easier for others to reuse, and when others reuse your data, they can also cite it.

Benefits for others

Good documentation makes your data more useful to others.
Well-described data and clear documentation of your work also help make your work reproducible and support research integrity.

Description and documentation are also sometimes referred to as metadata. Metadata is information about your data or processes that provides context for understanding your research data. It is important to have metadata at the project or folder level as well as at the item or file level. Describing your data at multiple levels gives a more complete picture of how the data was produced, gathered, cleaned, and analyzed.

Metadata Best Practices

Describe Data at the Item Level

Clear and meaningful file names are a simple way to embed key information about your data in a file.
Create data dictionaries to describe your data
- A data dictionary is a central document that describes important information about your data, such as variable names, units of measurement, the range of valid values, and anything else others may need to know to interpret the data. The Open Science Framework provides a quick how-to.
Use metadata standards common in your discipline
- A metadata standard is a defined way to describe an object. It ensures that you capture the same information for each object and that the information is structured the same way.
- In the social sciences, the Data Documentation Initiative (DDI) standard is often used as it was created to describe data produced by surveys and observational methods.
- In biology, a commonly used standard for describing biological diversity is Darwin Core.
- If you would like to view the standards used in different research domains, you can go to the Metadata Standards Catalog from the Research Data Alliance. The Digital Curation Centre also provides a comprehensive catalog of metadata standards and accompanying tools to capture and/or store the metadata.
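As a rough illustration of the data dictionary described above, the following Python sketch stores one entry per variable and writes the entries out as CSV. The variable names, units, and valid ranges are hypothetical placeholders, and the column layout is only one possible choice, not a format prescribed by any of the standards mentioned here.

```python
import csv
import io

# Columns of the data dictionary itself (an assumed layout, not a standard).
FIELDNAMES = ["variable", "description", "units", "valid_min", "valid_max"]

# Hypothetical entries for an illustrative dataset.
DATA_DICTIONARY = [
    {"variable": "age", "description": "Participant age at enrollment",
     "units": "years", "valid_min": 18, "valid_max": 120},
    {"variable": "height_cm", "description": "Standing height",
     "units": "centimeters", "valid_min": 50, "valid_max": 250},
]


def write_data_dictionary(entries, handle):
    """Write the data dictionary entries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(entries)


def out_of_range(entries, record):
    """Return the variables in `record` whose values fall outside the documented range."""
    flagged = []
    for entry in entries:
        name = entry["variable"]
        if name in record and not (entry["valid_min"] <= record[name] <= entry["valid_max"]):
            flagged.append(name)
    return flagged


# Usage: render the dictionary to text and check one record against it.
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
print(out_of_range(DATA_DICTIONARY, {"age": 130, "height_cm": 170}))  # ['age']
```

Keeping the valid ranges in the dictionary means the same document that explains the data to others can also be used to spot-check values, though a real project would likely need more fields (for example, codes for missing values).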

Describe Data at the Project or Folder Level

A ReadMe file is a plain text file that provides overarching information about the project and the files within it. ReadMe-style metadata is a simple, discipline-agnostic way to contextualize your data files. Best practice is to create a ReadMe file for each file, but this may not always be necessary. At minimum, create one ReadMe file for each project. For the sake of simplicity, ReadMe files are generally plain text (.txt) or Markdown (.md) files.

What goes in a ReadMe file?

Names and contact information for those associated with the project
Funding sources or institutional support
A list of files and folders, a description of their contents, and how to use them
Processing, analyses, or other important information to know about the data
Limitations of the data or project
Copyright and licensing information; citation preferences

Cornell University Library provides a guide on ReadMe files and a downloadable ReadMe file template, which you can customize to your needs.
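The items listed above can also be turned into a plain-text skeleton that you fill in for each project. The Python sketch below is one possible way to do that. The section headings mirror the list above, while the function name, the placeholder text, and the layout are assumptions made for illustration and are not taken from the Cornell template.

```python
from datetime import date

# Section headings follow the ReadMe checklist; the bodies are placeholders to fill in.
README_SECTIONS = [
    ("Contacts", "<name>, <role>, <email>"),
    ("Funding and institutional support", "<funder>, <award number>"),
    ("Files and folders", "<path>: <contents and how to use them>"),
    ("Processing and analysis notes", "<steps, software, versions>"),
    ("Limitations", "<known gaps or caveats>"),
    ("Licensing and citation", "<license>; <preferred citation>"),
]


def build_readme(title, sections, updated=None):
    """Return ReadMe text with a title, a last-updated date, and one block per section."""
    updated = updated or date.today()
    lines = [title, "=" * len(title), f"Last updated: {updated.isoformat()}", ""]
    for heading, body in sections:
        lines.append(heading.upper())
        lines.append(body)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Usage: print the skeleton for a hypothetical project.
print(build_readme("Example Project", README_SECTIONS))
```

To save the skeleton, the returned text could be written to a file such as README.txt in the project folder, for example with `pathlib.Path("README.txt").write_text(...)`.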