Information Science and Data Analytics

Library sources and other research tools for information science and data analytics

Why Cite Data

As with any other resource (e.g. books, journal articles, etc.), it’s important to cite the databases and data sets that contribute to your research. Properly citing data gives creators credit for their work, helps track the impact of the data set, and facilitates data discovery and access.

In many cases, a data repository (such as ICPSR or Dryad) will provide recommended citation(s) for its data sets. You can copy and paste the citation(s) into your reference list. However, as with any reference management tool, always double-check citations for accuracy.

If a data repository does not provide data citations, you can write your own citation. Not all style guides (e.g. MLA, Chicago) provide guidance on citing data. In such cases, it’s generally acceptable to cite data in the same way you would cite a research article according to that style guide. Regardless of the situation, try to include the following elements in your citation:

  • Creator or Author
  • Title of the Data Source
  • Date of Publication
  • Date of Access
  • Edition/Version Number
  • Format of the Data Source (e.g. [Computer File], [CD-ROM], [Online], etc.)
  • Distributor of the Data Source
  • Identifier or permanent URL for the Data Source

If you have your data’s DOI, you can use the DOI Citation Formatter to generate a reference.
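For readers who prefer to script this step, the following is a minimal sketch of requesting a formatted reference for a DOI directly from the DOI resolver. It assumes the DOI's registration agency supports content negotiation with the `text/x-bibliography` media type (Crossref and DataCite document this); the function names `build_citation_request` and `fetch_citation` are hypothetical, and the returned text should still be checked against your style guide.

```python
import urllib.parse
import urllib.request

# Assumption: the public DOI resolver forwards content-negotiation
# requests to the DOI's registration agency.
DOI_RESOLVER = "https://doi.org/"


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request for a formatted citation.

    `style` is a citation style name such as "apa"; which styles are
    available depends on the service that answers the request.
    """
    url = DOI_RESOLVER + urllib.parse.quote(doi, safe="/")
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.3886/ICPSR04725")` would request an APA-style reference for the ICPSR data set cited in the APSA sample on this page. Treat the result as a draft and verify it, as with any generated citation.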

Sample Data Citations

Several style guides have specific instructions for data citation. Here are a few sample citations.

APA (7th Edition)

Use the following templates to cite data sets in APA (7th Edition):

Author, A. A., & Author, B. B. (Year). Title of data set (Version) [Data set]. Publisher Name.

Group author:

Name of Group. (Year). Title of data set [Unpublished raw data]. Source of Unpublished Data. https://xxxx

*Include a retrieval date only if the data set is designed to change over time.


Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book].
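If you assemble many references by hand, the author-date template above can be sketched as a small formatter. This is an illustration only: the `DataSetCitation` class and its field names are hypothetical, authors are assumed to be supplied already in "Surname, A. A." form, the bracketed description is fixed to "[Data set]", and two authors are joined with a comma before the ampersand, as APA (7th Edition) prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSetCitation:
    """Hypothetical container for the elements of an APA data set reference."""
    authors: List[str]              # each already formatted, e.g. "Author, A. A."
    year: str
    title: str
    publisher: str
    version: Optional[str] = None   # omitted from the output when None
    url: Optional[str] = None       # DOI or permanent URL, if available


def join_authors(authors: List[str]) -> str:
    """Join author names with a comma before the final ampersand."""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + f", & {authors[-1]}"


def format_apa(citation: DataSetCitation) -> str:
    """Assemble the reference in the order: authors, year, title, publisher, URL."""
    names = join_authors(citation.authors)
    if not names.endswith("."):     # group authors need a closing period
        names += "."
    title = citation.title
    if citation.version:
        title += f" ({citation.version})"
    parts = [
        f"{names} ({citation.year}).",
        f"{title} [Data set].",
        f"{citation.publisher}.",
    ]
    if citation.url:
        parts.append(citation.url)
    return " ".join(parts)
```

Passing the template's own placeholders (two authors, "Year", "Title of data set", "Version", "Publisher Name") reproduces the template line. The sketch does not italicize the title or handle the retrieval date, so adjust those by hand where your reference requires them.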

APSA (Revised 2006)

Purdue University. 2007. Controversial Facilities in Japan, 1955-1995 [computer file] (Study #4725). ICPSR04725-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2007. doi:10.3886/ICPSR04725.

NLM (2nd Edition)

Entrez Genome [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information. [date unknown]. Haloarcula marismortui ATCC 43049 plasmid pNG200, complete sequence; [cited 2007 Feb 27]. Available from: http://www. genome&cmd=Retrieve&dopt=Overview&list_uids=18013