
Data Visualization

Tips, tricks and tools for visualizing data

What is Data Visualization?

Contemporary data visualization refers to "the use of computer-supported, interactive, visual representations of data to amplify cognition" (Card, Mackinlay & Shneiderman, 1999, p. 6).

"Data visualization is an umbrella term to cover all types of visual representations that support the exploration, examination, and communication of data. Whatever the representation, as long as it's visual, and whatever it represents, as long as it's information, this constitutes data visualization" (Few, 2009, p. 12).

There are two kinds of data visualization:

  • Information visualization uses abstract data (economic or educational data, for example), for which physical mappings are not as obvious. Consider Charles Joseph Minard's map of Napoleon's 1812 Russia campaign. The map shows the location of the troops and their direction of travel, along with the temperature and the number of troops remaining. Minard's handling of abstract data helps the viewer understand the devastating effects of the Russian winter on Napoleon's army.

Charles Joseph Minard's Map of Napoleon's 1812 Russia Campaign

Image credit: https://en.wikipedia.org/wiki/Charles_Joseph_Minard

  • Scientific visualization represents scientific data that are usually physical in nature rather than abstract, such as an MRI scan or an X-ray. Hurricane Katrina's clouds, for example, can be represented as just that: clouds.

Hurricane Katrina IR clouds from GOES on 29 Aug 2005 at 00:15 GMT

Image credit: NASA http://svs.gsfc.nasa.gov/3251

Why Visualize?

Visualization provides an additional tool for exploring and analyzing datasets. Consider Anscombe's Quartet (Anscombe, 1973): four datasets that share nearly identical summary statistics and produce the same linear model.

 

     I              II             III            IV
   x      y       x      y       x      y       x      y
10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
 8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
 9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
 6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
 4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
 7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
 5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

Table: Anscombe's four datasets of coordinate pairs.

 

Property                                               Value
Mean of x                                              9
Sample variance of x                                   11
Mean of y                                              7.50
Sample variance of y                                   4.125
Correlation between x and y                            0.816
Linear regression line                                 y = 3.00 + 0.500x
Coefficient of determination of the linear regression  0.67

Table: Summary statistics of Anscombe's four datasets.
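
The summary statistics can be checked with a short script. The following is a minimal sketch in Python, assuming only the standard library; the names QUARTET, summarize, and SUMMARY are illustrative and not part of the original guide. Note that the four datasets agree with the table values to roughly two or three decimal places, not exactly.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the data table in this guide.
# Datasets I-III share the same x values; dataset IV has its own.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_IV = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_IV, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return summary statistics and the least-squares line for one dataset."""
    mean_x, mean_y = mean(xs), mean(ys)
    var_x, var_y = variance(xs), variance(ys)  # sample variances (n - 1)
    # Sample covariance, using the same n - 1 denominator as the variances.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    r = cov_xy / (var_x * var_y) ** 0.5  # Pearson correlation
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "correlation": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r * r,  # for simple linear regression, R^2 equals r^2
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Each printed row should round to the values in the summary table, which is the point of the example: the numbers alone give no hint that the datasets differ.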

 

But when we visualize the datasets, we see just how different they are:

Anscombe's quartet, graphed.

Image credit: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

References

Anscombe, F.J. (1973). Graphs in statistical analysis. American Statistician, 27, 17-21.

Card, S.K., Mackinlay, J.D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers, Inc.

Few, S. (2009). Now you see it: Simple visualization techniques for quantitative analysis. Analytics Press.

Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the visualization zoo. Communications of the ACM, 53(6), 59-67.