As with any other resource (e.g., books and journal articles), it’s important to cite the databases and data sets that contribute to your research. Properly citing data gives creators credit for their work, helps track the impact of the data set, and facilitates data discovery and access.
In many cases, a data repository (such as ICPSR or Dryad) will provide recommended citation(s) for its data sets. You can copy and paste the citation(s) into your reference list. However, as with citations generated by any reference management tool, always double-check them for accuracy.
If a data repository does not provide data citations, you can write your own citation. Not all style guides (e.g., MLA, Chicago) provide guidance on citing data. In such cases, it’s generally acceptable to cite data in the same way you would cite a research article according to that style guide. Regardless of the situation, try to include the following elements in your citation:
If you have your data’s DOI, you can use the DOI Citation Formatter to generate a reference.
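If you would rather script this step, a formatted reference can usually be requested straight from the DOI resolver through DOI content negotiation. The sketch below is a minimal illustration and not the formatter's own code: it assumes the DOI's registration agency (for example Crossref or DataCite) supports the `text/x-bibliography` content type, and the function names `build_citation_request` and `fetch_citation` are hypothetical helpers written for this guide.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_citation_request(doi: str, style: str = "apa", locale: str = "en-US") -> Request:
    """Build a DOI content-negotiation request for a formatted citation.

    Hypothetical helper: `style` is a citation style name such as "apa".
    """
    doi = doi.strip()
    # Accept a bare DOI, a "doi:" prefix, or a full doi.org URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    url = "https://doi.org/" + quote(doi, safe="/")
    # The Accept header asks the resolver for a ready-made bibliography entry.
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text (needs network access)."""
    with urlopen(build_citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.3886/ICPSR04725")` would request an APA-style reference for the ICPSR study cited later in this guide. As with any generated citation, check the result against your style guide before using it.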
Several style guides have specific instructions for data citation. Here are a few sample citations.
APA (7th Edition)
Use the following templates to cite data sets in APA (7th Edition):
Author, A. A., & Author, B. B. (Year). Title of data set (Version) [Data set]. Publisher Name. https://doi.org/xxxx
Name of Group. (Year). Title of data set [Unpublished raw data]. Source of Unpublished Data. https://xxxx
Note: Include a retrieval date only if the data set is designed to change over time.
Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book]. http://pewhispanic.org/datasets/
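If you keep data set metadata in a spreadsheet or script, the published data set template above can be filled in programmatically. The following is a rough sketch under stated assumptions: the `DataSet` class and the `format_apa_dataset` function are hypothetical names, author names are expected already inverted ("Author, A. A."), and the joining rule (comma before the ampersand) covers only short author lists, not APA's special handling of very long ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSet:
    """Hypothetical container for the fields used in the APA template."""
    authors: List[str]            # already inverted, e.g. "Author, A. A."
    year: str
    title: str
    publisher: str
    url: str                      # DOI link or other URL
    version: Optional[str] = None


def join_authors(authors: List[str]) -> str:
    """Join names with commas and put ', & ' before the last one."""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + ", & " + authors[-1]


def format_apa_dataset(ds: DataSet) -> str:
    """Assemble an APA (7th Edition) style reference for a published data set."""
    title = ds.title
    if ds.version:
        title += f" ({ds.version})"
    authors = join_authors(ds.authors)
    # Initials already end in a period; group names need one added.
    if not authors.endswith("."):
        authors += "."
    return f"{authors} ({ds.year}). {title} [Data set]. {ds.publisher}. {ds.url}"


example = DataSet(
    authors=["Author, A. A.", "Author, B. B."],
    year="Year",
    title="Title of data set",
    publisher="Publisher Name",
    url="https://doi.org/xxxx",
    version="Version",
)
print(format_apa_dataset(example))
```

Plain text cannot carry the italics that APA applies to the data set title, so add them by hand in your reference list.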
APSA (Revised 2006)
Purdue University. 2007. Controversial Facilities in Japan, 1955-1995 [computer file] (Study #4725). ICPSR04725-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2007. doi:10.3886/ICPSR04725.
NLM (2nd Edition)
Entrez Genome [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information. [date unknown]. Haloarcula marismortui ATCC 43049 plasmid pNG200, complete sequence; [cited 2007 Feb 27]. Available from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genome&cmd=Retrieve&dopt=Overview&list_uids=18013