As with any other resource (e.g. books or journal articles), it’s important to cite the databases and datasets that contribute to your research. Properly citing data gives creators credit for their work, helps track the impact of the dataset, and facilitates data discovery and access.
In many cases, a data repository (such as ICPSR or Dryad) will provide recommended citation(s) for its datasets. You can simply copy and paste the citation(s) into your reference list.
If a data repository does not provide data citations, you can write your own citation. Not all style guides (e.g. MLA, Chicago) provide guidance on citing data. In such cases, it’s generally acceptable to cite data in the same way you would cite a research article according to that style guide. Regardless of the situation, try to include the following elements in your citation:
If you have your data’s DOI, you can use the DOI Citation Formatter to generate a reference.
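For readers who prefer to script this step, the sketch below shows one way to request a formatted reference for a DOI using DOI content negotiation, which doi.org supports for DOIs registered with agencies such as Crossref and DataCite. It is a minimal illustration, not a description of how the DOI Citation Formatter itself is implemented. The function names build_citation_request and fetch_citation are hypothetical, and the style name "apa" and locale "en-US" are assumed defaults.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_citation_request(doi: str, style: str = "apa", locale: str = "en-US") -> Request:
    """Build (but do not send) a request asking doi.org for a formatted citation.

    Hypothetical helper: the style and locale defaults are assumptions.
    """
    # Percent-encode the DOI but keep the slash between prefix and suffix.
    url = "https://doi.org/" + quote(doi, safe="/")
    # The Accept header asks for a text bibliography entry in the given style.
    accept = f"text/x-bibliography; style={style}; locale={locale}"
    return Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text.

    Requires network access; fails if the DOI's registration agency
    does not support formatted-citation responses.
    """
    with urlopen(build_citation_request(doi, style), timeout=30) as response:
        return response.read().decode("utf-8").strip()
```

Calling print(fetch_citation("10.3886/ICPSR04725")) would request an APA-style reference for the ICPSR study cited in the APSA sample below. Always check the returned text against your style guide, since automatically generated references can omit elements such as the version or distributor.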
Several style guides have specific instructions for data citation. Here are a few sample citations.
APA (6th Edition)
Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
APSA (Revised 2006)
Purdue University. 2007. Controversial Facilities in Japan, 1955-1995 [computer file] (Study #4725). ICPSR04725-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2007. doi:10.3886/ICPSR04725.
NLM (2nd Edition)
Entrez Genome [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information. [date unknown]. Haloarcula marismortui ATCC 43049 plasmid pNG200, complete sequence; [cited 2007 Feb 27]. Available from: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genome&cmd=Retrieve&dopt=Overview&list_uids=18013