
Data Management

Resources for documenting, storing and preserving research data

File Format Recommendations

Material: Preferred File Format

  • Tabular: .csv, .tsv (ASCII or UTF-8 encoded)
  • Geospatial: formats compatible with widely adopted GIS software (e.g. ArcGIS)*
  • Database: .sqlite, .db, .db3
  • Text: .html, .pdf, .xml (ASCII or UTF-8 encoded)
  • Archiving/Compression: .tar, .gz, .zip
  • Still Images: .tiff, .jpg, .jp2, .png, .gif, .bmp, .pdf, .svg
  • Moving Images: .mov, .mpeg
  • Audio: .wav, .mp3
  • Websites: .warc

* Library of Congress, Recommended Formats Statement: http://www.loc.gov/preservation/resources/rfs/
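For tabular data, the recommendation is ASCII- or UTF-8-encoded .csv or .tsv files. If you work in Python, the standard-library csv module can write both. The sketch below is one minimal way to do it; the function name write_table and the example rows are hypothetical, chosen only for illustration.

```python
import csv

def write_table(path, header, rows, delimiter=","):
    """Write a header row plus data rows as UTF-8 plain text.

    The default comma delimiter produces CSV; pass a tab to produce TSV.
    """
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)

# Hypothetical example data; the non-ASCII site name is preserved because the file is UTF-8
header = ["site", "species", "count"]
rows = [["Zürich", "Quercus robur", 12], ["Basel", "Fagus sylvatica", 7]]

write_table("observations.csv", header, rows)
write_table("observations.tsv", header, rows, delimiter="\t")
```

Stating the encoding explicitly matters because Python's default text encoding can differ between operating systems.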

Remember: Keep a Copy of Your Raw Data

In the course of research, we manipulate our data to test hypotheses, and in the process we create new, derived data products. Even so, always keep an unaltered copy of the original raw data. A raw copy makes your research replicable and lets you backtrack if an error surfaces during processing or analysis.
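One way to protect a raw file is to copy it into a dedicated folder, make the copy read-only, and record a checksum so you can later confirm it has not changed. The Python sketch below assumes a folder named raw and a hypothetical helper preserve_raw; neither is a required convention, and on Windows the permission change only sets the read-only flag.

```python
import hashlib
import shutil
import stat
from pathlib import Path

def preserve_raw(src, raw_dir="raw"):
    """Copy src into raw_dir as a read-only file; return (path, SHA-256 hex digest)."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    dest = raw_dir / Path(src).name
    if dest.exists():
        # Never overwrite an existing raw copy
        raise FileExistsError(f"raw copy already exists: {dest}")
    shutil.copy2(src, dest)  # copy2 also preserves modification times
    # Read-only for owner, group and others, so the copy is not edited by accident
    dest.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # A checksum lets you verify later that the raw copy is unchanged
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    return dest, digest
```

For example, preserve_raw("survey.csv") (a hypothetical file name) would create raw/survey.csv and return its SHA-256 digest, which you could note in your documentation. All processing then happens on working copies, never on the file in raw.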

Versioning (saving a new copy of a file whenever significant changes are made) is a good way to track how a file changes over time. You can record version information in the file name itself:

  • Consider adding a version number to file names (e.g. “v1,” “v2” or “v2.1”)
  • It may be appropriate to append notes such as “draft” or “final,” as long as only a few files carry such labels (two different versions labelled “final” and “final2,” for instance, would be confusing)
  • You might include information about the changes that were made (e.g. “normalized” or “aggregated”)
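If you produce many versions, a small script can apply a naming convention like the one described in these tips consistently. The sketch below assumes names built from a stem, a "v" plus version number, and an optional note, joined by underscores (e.g. survey_v2_normalized.csv); the function names versioned_name and next_version are hypothetical.

```python
import re
from pathlib import Path

def versioned_name(stem, version, suffix, note=None):
    """Build a file name such as 'survey_v2_normalized.csv'."""
    parts = [stem, f"v{version}"]
    if note:
        # Replace spaces so the note stays a single file-name token
        parts.append(note.replace(" ", "-"))
    return "_".join(parts) + suffix

def next_version(directory, stem, suffix):
    """Return the next whole version number for files named stem_vN... in directory.

    Minor versions such as v2.1 count as version 2.
    """
    pattern = re.compile(
        rf"^{re.escape(stem)}_v(\d+)(?:\.\d+)?(?:_.*)?{re.escape(suffix)}$"
    )
    versions = [
        int(m.group(1))
        for p in Path(directory).iterdir()
        if (m := pattern.match(p.name))
    ]
    return max(versions, default=0) + 1
```

For example, versioned_name("survey", 2, ".csv", "normalized") returns "survey_v2_normalized.csv", and if a folder holds survey_v1.csv and survey_v2.1_normalized.csv, next_version(folder, "survey", ".csv") returns 3.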
For complex versioning needs, consider using Git, a version control system. Although Git is typically used for coding and software development projects, it can track changes to any type of file. GitHub is an online service for hosting and sharing Git repositories.