Contemporary data visualization refers to "the use of computer-supported, interactive, visual representations of data to amplify cognition" (Card, Mackinlay & Shneiderman, 1999, p. 6).
Data visualizations "map data values to graphical features such as position, size, shape and color" (Heer, Bostock & Ogievetsky, 2010, p. 59); these mappings are known as visual encodings. In doing so, data visualizations help us learn by appealing to the human visual system's ability to recognize patterns.
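As a minimal sketch of what a visual encoding looks like in practice, the Python snippet below maps three fields of some made-up records onto horizontal position, circle radius, and color. The records, the field names, the pixel ranges, and the `linear_scale` helper are all illustrative assumptions and do not come from the sources cited above.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function mapping a data value linearly onto a visual range."""
    span = domain_max - domain_min

    def scale(value):
        if span == 0:
            # Degenerate domain: place everything in the middle of the range.
            return (range_min + range_max) / 2
        t = (value - domain_min) / span
        return range_min + t * (range_max - range_min)

    return scale


# Hypothetical records: every value here is invented for illustration.
records = [
    {"year": 2000, "sales": 10, "region": "north"},
    {"year": 2005, "sales": 40, "region": "south"},
    {"year": 2010, "sales": 25, "region": "north"},
]

# One scale per encoding channel.
x_scale = linear_scale(2000, 2010, 0, 400)   # year   -> x position (pixels)
size_scale = linear_scale(10, 40, 4, 20)     # sales  -> circle radius (pixels)
color_for = {"north": "steelblue", "south": "darkorange"}  # region -> color

# Each record becomes one graphical mark described by its visual properties.
marks = [
    {
        "x": x_scale(r["year"]),
        "radius": size_scale(r["sales"]),
        "color": color_for[r["region"]],
    }
    for r in records
]

for mark in marks:
    print(mark)
```

Quantitative fields (year, sales) are mapped with continuous scales, while the categorical field (region) is mapped with a lookup table. A drawing library would then render each mark.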
There are two kinds of data visualization:
Image credit: NASA/Goddard Space Flight Center Scientific Visualization Studio. More information at http://svs.gsfc.nasa.gov/3251
Image credit: https://en.wikipedia.org/wiki/Charles_Joseph_Minard
Visualization provides an additional tool for exploring and analyzing datasets. Consider Anscombe's Quartet (Anscombe, 1973): four datasets that share nearly identical summary statistics and produce the same fitted linear model.
| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|------|------|-------|-------|--------|--------|-------|-------|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
Anscombe's four datasets of coordinate pairs.
| Property | Value |
|----------|-------|
| Mean of x | 9 |
| Sample variance of x | 11 |
| Mean of y | 7.50 |
| Sample variance of y | 4.125 |
| Correlation between x and y | 0.816 |
| Linear regression line | y = 3.00 + 0.500x |
| Coefficient of determination of the linear regression | 0.67 |
Summary statistics of Anscombe's four datasets.
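These figures can be checked with a few lines of Python. The sketch below uses only the standard library and the coordinate pairs listed in Anscombe's four datasets; the `summarize` helper and the variable names are assumptions made for this illustration, and the regression is an ordinary least-squares fit computed by hand.

```python
from statistics import mean, variance

# Anscombe's quartet. Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return summary statistics and the least-squares line for paired data."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)   # sum of squared deviations of x
    syy = sum((yi - my) ** 2 for yi in y)   # sum of squared deviations of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5            # Pearson correlation
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "r": r, "slope": slope,
        "intercept": my - slope * mx, "r2": r * r,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

The four printed rows agree closely with one another and with the summary table, differing only in the later decimal places. The numbers alone give no hint that the datasets have different shapes.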
But when we visualize the datasets, we see just how different they are:
Image credit: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
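As a rough stand-in for a plotting library, the sketch below draws datasets II and IV as character grids so that the difference in shape is visible even in a terminal. The `text_scatter` helper, the grid size, and the axis ranges are arbitrary choices made for this illustration; a real chart would normally be produced with a dedicated plotting tool.

```python
def text_scatter(x, y, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Render paired data as a crude character-grid scatter plot."""
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Map each data value linearly onto a column or row index.
        col = round((xi - x_range[0]) / (x_range[1] - x_range[0]) * (width - 1))
        row = round((yi - y_range[0]) / (y_range[1] - y_range[0]) * (height - 1))
        if 0 <= col < width and 0 <= row < height:
            grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Datasets II and IV from Anscombe's quartet.
x_ii = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_ii = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x_iv = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y_iv = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

plot_ii = text_scatter(x_ii, y_ii)
plot_iv = text_scatter(x_iv, y_iv)

print("Dataset II")
print(plot_ii)
print("Dataset IV")
print(plot_iv)
```

Points that fall in the same grid cell are drawn as a single mark, so the output is only an approximation. Even so, dataset II traces a curve, while dataset IV is a vertical stack of points with one distant outlier.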
Data visualizations have yielded critical insights. Dr. John Snow used visualization to show that an 1854 cholera outbreak was related to contaminated water. He mapped public wells and known cholera deaths around the Soho neighborhood, and was able to demonstrate a cluster of deaths around the well at Broad and Cambridge Streets. Snow’s map also exemplifies the communicative power of visualization: showing engages the audience in a way that telling doesn’t always accomplish.
Image credit: https://www1.udel.edu/johnmack/frec682/cholera/
Anscombe, F.J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17-21.
Card, S.K., Mackinlay, J.D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers, Inc.
Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the visualization zoo. Communications of the ACM, 53(6), 59-67.
Tufte, E. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.