Data Visualization

Tips, tricks and tools for visualizing data

What is Data Visualization?

Contemporary data visualization refers to "the use of computer-supported, interactive, visual representations of data to amplify cognition" (Card, Mackinlay & Shneiderman, 1999, p. 6).

Data visualizations "map data values to graphical features such as position, size, shape and color" (Heer, Bostock & Ogievetsky, 2010, p. 59). These mappings are known as visual encodings. By encoding data visually, visualizations help us learn by appealing to the human visual system's ability to recognize patterns.
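As a rough illustration of what a visual encoding does, the following Python sketch maps two data fields onto graphical features: one onto horizontal position and one onto marker size. The linear_scale helper, the field names, and the sample records are all hypothetical and exist only for illustration; they do not come from Heer, Bostock & Ogievetsky.

```python
def linear_scale(domain, out_range):
    """Return a function that maps values in `domain` linearly onto `out_range`."""
    d0, d1 = domain
    r0, r1 = out_range
    if d1 == d0:
        raise ValueError("domain must span more than one value")

    def scale(value):
        t = (value - d0) / (d1 - d0)  # fraction of the way through the domain
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical records: (label, income, population). Made-up values.
records = [("A", 10.0, 200), ("B", 25.0, 800), ("C", 40.0, 500)]

# Encoding 1: income -> horizontal position in a 300-pixel-wide plot.
x_pos = linear_scale((10.0, 40.0), (0, 300))
# Encoding 2: population -> marker radius between 2 and 12 pixels.
radius = linear_scale((200, 800), (2, 12))

marks = [
    {"label": label, "x": x_pos(income), "r": radius(population)}
    for label, income, population in records
]

for mark in marks:
    print(mark)
```

Each entry in marks describes how one record would be drawn. Shape and color could be handled the same way, for example by looking a category up in a table of shapes or colors.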

There are two kinds of data visualization:

  • Scientific visualization uses physical data, which often have obvious spatial mappings. For example, Hurricane Katrina's clouds can be represented as just that: clouds.

Hurricane Katrina IR clouds from GOES on 29 Aug 2005 at 00:15 GMT

Image credit: NASA/Goddard Space Flight Center Scientific Visualization Studio. More information at http://svs.gsfc.nasa.gov/3251

 

  • Information visualization uses abstract data (economic, education, etc.), for which spatial mappings are not as obvious. Consider Charles Joseph Minard's map of Napoleon's 1812 Russia campaign. The map shows the physical location of the troops and their direction of travel, as well as temperature and the number of troops. Minard's handling of abstract data helps the user understand the devastating effects of the Russian winter on Napoleon's army.

Charles Joseph Minard's Map of Napoleon's 1812 Russia Campaign

Image credit: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

Why Visualize?

Visualization provides an additional tool for exploring and analyzing datasets. Consider Anscombe's Quartet (Anscombe, 1973): four datasets that share nearly identical summary statistics and produce the same linear model.

 

I             II            III           IV
x     y       x     y       x     y       x     y
10.0  8.04    10.0  9.14    10.0  7.46    8.0   6.58
8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
13.0  7.58    13.0  8.74    13.0  12.74   8.0   7.71
9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
11.0  8.33    11.0  9.26    11.0  7.81    8.0   8.47
14.0  9.96    14.0  8.10    14.0  8.84    8.0   7.04
6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
4.0   4.26    4.0   3.10    4.0   5.39    19.0  12.50
12.0  10.84   12.0  9.13    12.0  8.15    8.0   5.56
7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89

Anscombe's four datasets of coordinate pairs.

 

Property                                               Value
Mean of x                                              9
Sample variance of x                                   11
Mean of y                                              7.50
Sample variance of y                                   4.125
Correlation between x and y                            0.816
Linear regression line                                 y = 3.00 + 0.500x
Coefficient of determination of the linear regression  0.67

Summary statistics of Anscombe's four datasets.
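The summary statistics can be checked directly from the coordinate pairs. The sketch below uses only the Python standard library and transcribes the four datasets from the table of coordinate pairs. The summarize function and the variable names are illustrative choices, not part of Anscombe's paper. The four datasets agree with the tabulated values to roughly two or three decimal places rather than exactly.

```python
import math

# Anscombe's four datasets, transcribed from the table of coordinate pairs.
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def summarize(xs, ys):
    """Compute the summary statistics listed in the table for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # sum of squared deviations of x
    syy = sum((y - mean_y) ** 2 for y in ys)  # sum of squared deviations of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": r,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r_squared": r ** 2,  # coefficient of determination for simple regression
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Printing the rounded values side by side makes the point of the table concrete: no single number in the summary separates one dataset from another.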

 

But when we visualize the datasets, we see just how different they are:

Anscombe's quartet, graphed.

Image credit: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
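Even a crude plot exposes the differences. The following sketch draws datasets II and IV as text scatter plots using only the Python standard library. The ascii_scatter function, its grid size, and its axis limits are arbitrary choices made for illustration; a real analysis would normally use a plotting library instead.

```python
def ascii_scatter(xs, ys, width=40, height=12, x_lim=(0.0, 20.0), y_lim=(0.0, 14.0)):
    """Render (x, y) points as a text grid; '*' marks a cell holding at least one point."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lim[0]) / (x_lim[1] - x_lim[0]) * (width - 1))
        row = round((y - y_lim[0]) / (y_lim[1] - y_lim[0]) * (height - 1))
        if 0 <= col < width and 0 <= row < height:
            grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Datasets II and IV of Anscombe's quartet.
x_ii = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y_ii = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x_iv = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]
y_iv = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

plot_ii = ascii_scatter(x_ii, y_ii)
plot_iv = ascii_scatter(x_iv, y_iv)

print("Dataset II")
print(plot_ii)
print("Dataset IV")
print(plot_iv)
```

At this resolution dataset II traces a curve, while dataset IV collapses into a single vertical column of points plus one outlier at x = 19.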

 

Data visualizations have yielded critical insights. Dr. John Snow used visualization to prove that an 1854 cholera outbreak was related to contaminated water. He mapped public wells and known cholera deaths around the Soho neighborhood, and was able to demonstrate a cluster of deaths around the well at Broad and Cambridge Streets. Snow's map also exemplifies the communicative power of visualization: showing engages an audience in a way that telling does not always achieve.

John Snow's Cholera Map

Image credit: https://www1.udel.edu/johnmack/frec682/cholera/

References

Anscombe, F.J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17-21.

Card, S.K., Mackinlay, J.D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers, Inc.

Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the visualization zoo. Communications of the ACM, 53(6), 59-67.

Tufte, E. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.