
Technology Resources

What is SPSS:

The Statistical Package for Social Sciences (SPSS) is software that enables researchers in many fields to manage their data and to perform statistical analysis of them. Market researchers, health researchers, education researchers, survey companies, sociologists, psychologists and others can use SPSS to obtain:

Descriptive statistics - frequency distributions, cross tabulation

Bi-variate statistics - t-test, ANOVA

Predictions - linear regression

And more

Terms

Dataset - an organized collection of data

Attributes – characteristics (of persons or things)

Variables – logical grouping of attributes (male → sex, baker → occupation)

Independent variable – values are taken as given, the cause

Dependent variable - depends on the independent variable

Getting Started

  1. Find the SPSS icon (Figure 1). Double click and wait for SPSS to get started.

 

Figure 1 SPSS Icon

Some versions of SPSS start with a pop-up window like the one in Figure 2. You can open a dataset through that dialog, or you can close it and SPSS will open a blank dataset.

  1. Click on close to open up a blank dataset.

Figure 2 IBM SPSS Intro Dialog

Toolbar

The Toolbar consists of two lines. The lower line is a selection of some of the functions you can also access through the top line. For example, Open Data Document, which looks like a yellow paper folder and is located at the far left of the second line in the Toolbar, allows you to open a data set to work with. You can also open a data set through File → Open → Data. The File button is located on the far left of the top line of the Toolbar.

Other operations that are located on the Toolbar allow you to edit, manage and analyze data, change what you can see on your screen, and get help.

Figure 3 Toolbar

File - Here you can open a data set, import data, and save or rename your data set.

Edit - This tab allows you to edit your work. Here you can find functions such as undo, copy, paste, and replace.

View - Allows you to change the settings of what you can see on your screen.

Data - Here you can manipulate the entire data set; for example, you can split file, or sort cases.

Transform - Under this tab you will find functions that allow you to manipulate variables: recode, compute variables, or replace missing values.

Analyze - You will use this tab to create reports about your data.

Direct Marketing - This tab is used to analyze customer information and consumer data.

Graphs - You can visually display your data here by creating graphs and charts.

Utilities - This tab contains functions that help you compare datasets and perform data transformation.

Extensions - You can work with IBM SPSS extension bundles.

Window - Allows you to manipulate the settings of the SPSS window.

Help - Use this tab to access SPSS built-in help features.

Variable View and Data View

Variable View (Figure 4)

This view is useful for setting up variables and their properties.

Figure 4 Variable View

Under Name write a simple name of the variable without spaces.

Under Type define the type of the variable. Types are numeric, comma, dot, scientific notation, date, dollar, custom currency, or string. See Figure 5.

Figure 5 Variable Name and Variable Type

Width determines the number of characters allowed for the values of a variable.

If you would like to define the number of decimals for the variable, click on Decimal.

To label the variable, click on Label and type the text including spaces (up to 256 characters).

Variable values can be defined by clicking on the cell Values and then clicking on the ellipsis. A dialog box will appear. Label the value and click on Add before defining the next value. See Figure 6.

 

Figure 6 Variable Values

When you click on Missing, you can define which values to exclude from analysis. Sometimes the questionnaire contains answers that are not meaningful for analysis; answers such as ‘not applicable’ or ‘I don’t know’ are often excluded.

Measure is where you define what level of measurement you are using for the variable. You can choose Scale, Ordinal or Nominal. Nominal measures are all at the same level; we cannot tell which one is more or less (sex, place of birth, etc.). Ordinal measure tells us the relationship between the cases. We can tell which one is more and which one is less, but we cannot tell by how much (Likert scale, levels of education, etc.). Scale is a measure that tells us the relationship between cases and allows us to tell “by how much” they are different (height, weight, income, etc.).

Data View

You can switch to Data View by clicking on the Data View tab on the bottom-left of the window. This is where you manually enter, view, or change your data.

Data Entry

You can use an existing data set and import it into SPSS. There are two ways you can import data.

  1. Right after you open SPSS a window pops up asking you which data set you would like to open. You can browse through your saved datasets and open the one you want to work with.
  2. If the window does not pop up (some versions of SPSS do not have this feature) you can always open a saved data set through the following path

File → Import Data → Excel (for example) → Choose the file you would like to work with

i) Click on File

ii) Select Import Data

iii) Select the type of data you are importing, for example Excel

iv) Choose the file from your computer that you would like to work with

You can also enter all your data directly into SPSS.

            First, set up all the variables that you will be working with in the Variable View. Typically, you would determine at least the Variable Name, Type, Width, Decimal and Label. You can always go back and revise.

            Switch to Data View. Look at the bottom of your screen. You will see two tabs (Data View and Variable View) either on the left or in the middle. Click on Data View. An empty sheet will appear with the names of the variables you set up earlier as headers of the columns and numbers indicating the rows. (See Figure 7)

In Figure 7, there is a column that is labeled var and has a slightly lighter color than the rest of the columns. This indicates that there is still room for more variables but they have not been set up. This column is not going to be included in any analysis and should not contain any data.

Figure 7 Data View with Labeled Variables Age, Gender and ID

In the blank space below the variable names you can enter the data from each case. Make sure you enter data for one case on the same line (the same row). See Figure 8 for an example of a few cases  entered for variables ID, Gender and Age.

Figure 8 Data View with Data for Variables ID, Gender and Age

 

Frequency Tables and Value Labels

You are a health researcher. You conducted a survey of your patients where you asked about their marital status, highest level of education, weight and whether or not they smoke. You would like to know what proportion of your sample smokes, what proportion is married, which education level is the most frequently achieved among your respondents, and what the most likely weight of a person in this study is. You received responses from 271 people. Your assistant entered the data into Excel and left notes on what the values he entered mean.

In SPSS, you can view the distribution of your data using the Frequency Distribution command in Descriptive Statistics tab. Here is how you execute that:

Open SPSS, make sure you start off with a blank file and open the Excel dataset called Health. 

  1. A window will pop up where you can set the way the document will be transferred into SPSS.
  2. Leave all the settings the way they are. Click OK.
  3. There are five variables and 271 cases in the dataset.
  4. Make sure you are working in the Variable View and label the variables in the following way:

Marital = marital status

Edlevel = highest level of education

Weightrate = current weight

Smoke = do you smoke

ID will stay the way it is

  1. Then you will need to determine what the values within each variable stand for. You can do that under the tab Values.

The values for the variables we are working with are as follows:

Marital: 1 single, 2 married, 3 divorced, 4 widowed

Edlevel: 1 primary school, 2 secondary school, 3 trade training/post-secondary training, 4 undergraduate degree, 5 postgraduate degree

Weightrate: 1 very underweight, 10 very overweight

Smoke: 1 yes, 2 no

  1. For an example see Figure 1

Figure 1 Example of Labeling Values

  1. You will notice that when you are labeling the values for the variable Weightrate SPSS will not let you type the number 10 under the “Value.”
  2. Change the Width from 1 to 2 for this variable and you will be able to enter the label for value 10.
  3. Label the last value: 10 “very overweight”
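Value labels are a lookup from the stored code to its meaning. As a rough illustration outside SPSS, the labels listed above can be written as Python dictionaries. The lowercase variable names and the helper `label_for` are assumptions made for this sketch; they are not part of SPSS.

```python
# Value labels from the tutorial, stored as code -> label dictionaries.
VALUE_LABELS = {
    "marital": {1: "single", 2: "married", 3: "divorced", 4: "widowed"},
    "edlevel": {
        1: "primary school",
        2: "secondary school",
        3: "trade training/post-secondary training",
        4: "undergraduate degree",
        5: "postgraduate degree",
    },
    # Only the two endpoints of the weightrate scale carry labels.
    "weightrate": {1: "very underweight", 10: "very overweight"},
    "smoke": {1: "yes", 2: "no"},
}


def label_for(variable, code):
    """Return the label for a code, or the code itself as text if unlabeled."""
    return VALUE_LABELS[variable].get(code, str(code))


print(label_for("marital", 2))      # married
print(label_for("weightrate", 10))  # very overweight
print(label_for("weightrate", 6))   # 6
```

An unlabeled code, such as a weightrate of 6, is shown as the number itself, which is also how an unlabeled value appears in a frequency table.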

Frequency distribution

One of the basic functions of SPSS is Frequency distribution. This function displays the data in a Frequency table and is useful for viewing how many cases, or what percent of the sample, selected each answer.

Follow this path to create a Frequency table for your data set:

Start at the top, in the toolbar. Select: Analyze → Descriptive Statistics → Frequencies (Figure 2)

  1. In the toolbar choose Analyze.
  2. Select Descriptive Statistics.
  3. Click on Frequencies
  4. A window will pop up where you will see a list of variables on the left and an empty column on the right.
  5. Choose the variable you want to work with. You can either double-click on its name, drag it to the right column or click on it once and then click on the arrow between the columns.
    1. Sometimes SPSS displays the variable labels as default.
    2. Hover over the left column and right-click.
    3. A dialogue window will pop up where you can determine how the variables will be displayed.
    4. Click on the bubble next to Display Variable Names
  1. All the variables you are working with should now be moved from the left column to the right column in the pop-up window. Include all variables except ID.
    1. Variable ID is on the left
    2. Variables Edlevel, Smoke, Marital and Weightrate are on the right
  2. Click OK
  3. A new window will pop up, labeled Output
  4. There will be four (4) frequency tables displaying the distribution of your data in the output window.
  5. Always make sure that you save the output as well as the data set

Figure 2 Path to Frequency Distribution

Output

Once you execute an operation in SPSS a separate window will open called Output. This is where all the reports, graphs and code will be displayed.

Figure 3 shows the output created by executing the Frequency Distribution for variables “marital,” “edlevel,” “weightrate,” and “smoke.”

Right at the top of the window you can see the code, where SPSS keeps a log of actions.

            In this case it tells us where we got our dataset and that we ran a frequency distribution for the above listed variables.

Next on the Output are the frequency distributions. Under the title “Frequencies,” there is a table labeled Statistics.

Statistics show the number of valid answers and the number of missing answers.

Frequency Table shows how the frequency and percentage of answers are distributed in the sample for each variable.

Figure 3 Reading the Output

Figure 4 shows the Frequency Distribution of Respondents’ Marital Status. Under Frequency is the number of people who chose each option on the survey. Percent shows each count as a percentage of the total number of respondents. Valid Percent shows each count as a percentage of the respondents who gave a valid answer, that is, not including Missing Values. Cumulative Percent adds up the valid percentages line by line.

Figure 4, Frequency Distribution of Respondents’ Marital Status, shows that 19.9 percent of the respondents were single, 69.9 percent were married and 10.7 percent were either divorced or widowed. Most respondents were married.

Figure 4 Frequency Distribution of Respondents’ Marital Status
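To see how the four columns of a frequency table relate to each other, here is a minimal Python sketch that computes them by hand. The function name `frequency_table`, the use of `None` as the missing-value marker, and the ten sample answers are all assumptions made for illustration; they are not the Health dataset.

```python
from collections import Counter


def frequency_table(values, missing=None):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent).

    Missing answers count toward the Percent base but are left out of
    Valid Percent and Cumulative Percent.
    """
    total = len(values)
    valid = [v for v in values if v != missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        frequency = counts[value]
        percent = 100 * frequency / total
        valid_percent = 100 * frequency / len(valid)
        cumulative += valid_percent
        rows.append((value, frequency, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Hypothetical answers (1 = single, 2 = married, 3 = divorced); None = missing.
answers = [1, 2, 2, 2, 3, 2, 1, None, 2, 2]
for row in frequency_table(answers):
    print(row)
```

With one missing answer out of ten, Percent is computed out of 10 cases and Valid Percent out of 9, which is why the two columns differ.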

 

Pie Charts

  1. Open the dataset you want to work with
  2. Click on Graphs
  3. Select Legacy Dialogs
  4. Hit Pie (Figure 1)

Figure 1 Path To Creating a Pie Chart

  1. A new window will pop up (Figure 2)
  2. Click on the bubble next to Summaries for groups of cases
  3. Hit Define

Figure 2 Type of Pie Chart Dialog

  1. You will see a new window (Figure 3)

Figure 3 Define Pie Chart Window

  1. Select the variable you want to display in a pie chart
  2. Click on the arrow next to Define Slices by (Figure 4)
  3. Click on the bubble next to %of cases (Figure 4)
  4. If you want to add a title to your chart click on Titles in the top left corner (Figure 4)

Figure 4 Defined Pie Chart

  1. You can type the title, subtitle or footnotes in the window that pops up (Figure 5)

Figure 5 Pie Chart Title

  1. Click Continue
  2. The Titles window will disappear and you will see the window in Figure 4
  3. Click OK
  4. The pie chart will be displayed in the output (Figure 6)

Figure 6 Pie Chart in SPSS Output

  1. Double-click on the chart
  2. A chart editor window will show up (Figure 7)

Figure 7 Chart Editor

  1. Click on the Show Data Labels icon (Figure 8)
  2. If you hover over it the words Show Data Labels will show up

Figure 8 Show Data Labels Icon

  1. This will add labels with percentages to the pie chart (Figure 9)

Figure 9 Pie Chart with Labeled Data

  1. It will also open the Properties window (Figure 10)

Figure 10 Properties Window

  1. This window is for the data labels only. You can change the way the label is displayed, the size and style of the letters, the color of the field with the label (Figure 11)

Figure 11 Text Properties for Data Labels

  1. Click Apply
  2. The appearance of the chart will update in the chart editor (Figure 12)

Figure 12 Pie Chart with Customized Labels

  1. You can close the Properties window and double-click on other elements of the chart. This will open properties window for each element. You can play around with your chart and customize the colors and style (Figure 13)

Figure 13 Finished Pie Chart

  1. When you are happy with your pie chart close the chart editor
  2. The image will update in SPSS output

Histogram

  1. Open the dataset you want to work with
  2. Click on Graphs
  3. Select Legacy Dialogs
  4. Hit Histogram (Figure 1)

Figure 1 Path to Creating a Histogram

  1. A new window will pop up (Figure 2)
  2. Select the variable you want to display in a histogram (make sure it is a continuous variable)
  3. Click on the arrow next to Variable

Figure 2 Histogram Popup Window

  1. If you want to display the Normal Curve on your histogram, don’t forget to check the box next to Display normal curve (Figure 3)

Figure 3 Display Normal Curve on Histogram

  1. If you wish to add a title to your histogram, click on Titles in the top right corner of the histogram popup window (Figure 2)
  2. A new window like the one in Figure 4 will appear
  3. Fill out the lines you want to have on your histogram and click Continue

Figure 4 Histogram Titles Window

  1. This will take you back to the Histogram window
  2. Click OK and your Histogram will appear in the Output (Figure 5)

  1. You can change the design of the histogram in the same way you would change the design of a pie chart; for instructions, please see the Pie Charts tab, steps 18 - 29.

 

Creating an Index in SPSS

  • Choose variables that measure the same concept
  • Recode variables that measure the concept in the opposite direction from the one you need
  • Transform --> Compute Variable
    • Name Target Variable
    • Add variables you want in the index using the + sign
    • Put parentheses around all terms
    • Divide by the number of terms
  1. Choose Transform from the toolbar
  2. Click on Compute Variable (Figure 1)

Figure 1 Path to Compute Variable

  1. A window like the one in Figure 2 will appear
  2. Type the name of the new variable that you are creating on the line labeled Target Variable in the top left corner (Figure 2)
  3. Switch to Display Variable names
  4. Double click on the variable you want to work with
  5. It will appear in the white box labeled Numeric Expression on the right (Figure 2)

Figure 2 Compute Variable Window

  1. Click on the + sign in the gray box below the Numeric Expression
  2. Double click on the next variable and so on until you have all variables you want in your index in the right box with + signs between them (Figure 3)

Figure 3 Addition of All Variables in the Index

 

  1. Put parentheses around all the terms – use your keyboard to do that (Figure 4)

Figure 4 Parentheses Around All Terms in Addition

  1. Divide all the terms by 5 (the number of items in the index). See Figure 5.

Figure 5 Division of All Terms

  1. Click OK
  2. The text “COMPUTE depressionINDEX=(CESD1+CESD2+CESD3R+CESD4+CESD5)/5.
    EXECUTE.” will appear in the Output window
      
  3. A new variable (depressionINDEX) will be listed at the end of the list of variables in the Variable View in the Data window
  4. Run a frequency distribution for the index (Figure 6)
  5. Check that the values of the index fall within the range of the original variables (no smaller than their lowest value and no larger than their highest value)

Figure 6 Frequency Distribution of depressionINDEX
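The index is simply the mean of its items. The Python sketch below repeats the arithmetic of the COMPUTE statement for one respondent. The item scores, the 1 to 4 answer scale, and the helper names are hypothetical, and the sketch assumes that the R in CESD3R marks an item that was reverse-coded before it went into the index.

```python
def reverse_code(score, low=1, high=4):
    """Flip an item on a low..high scale so that high scores become low."""
    return low + high - score


def mean_index(items):
    """Add the items, then divide by the number of items."""
    return sum(items) / len(items)


# One hypothetical respondent; the third item is worded in the opposite
# direction, so it is reverse-coded before it enters the index.
cesd1, cesd2, cesd3, cesd4, cesd5 = 2, 3, 1, 2, 4
cesd3r = reverse_code(cesd3)
depression_index = mean_index([cesd1, cesd2, cesd3r, cesd4, cesd5])
print(depression_index)  # 3.0
```

Because the index is a mean, it can never fall below the lowest possible item score or rise above the highest one, which is what the final check on the frequency distribution confirms.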

T-test: Definition

T-test helps us test our hypotheses about means. It determines whether there is a statistically significant difference between the means of two groups (Independent Samples T-test), between the means of the same sample at two different times (Paired Sample T-test), or between a mean of a group and a predicted value of that mean (One Sample T-test).

Assumptions

There are several assumptions we have to consider before we run a t-test:

Level of measurement:  The dependent variable must be continuous (interval/ratio).

Independence:               The observations are independent of one another.

Normality:                       The dependent variable should be approximately normally distributed.

Outliers:                          The dependent variable should not contain any outliers.

Check LINO:

It is always good to check if the data fits the assumptions outlined above.

Level of measurement:

The variable has to be measured continuously, that means that it can have any value within a certain range (weight, height, age). Non-continuous values are categorical values (nominal - place of birth; or ordinal - level of education) where the variable can assume only some values.

Independence:

It is hard to test for independence; however, if the sample was chosen randomly, there is very small chance that the data is biased (not independent).

Normality:

One of the assumptions we need to consider before running a t-test is that the dependent variable is approximately normally distributed. That means that the data distribution roughly follows the bell curve.

How to check normality in SPSS:

In SPSS, we can check if the data is normally distributed using a histogram with the normal curve. To do that follow this path:

Graphs --> Legacy Dialogs --> Histogram

  1. Once you have opened the data file you want to work with (t-test), click on Graphs in the toolbar.
  2. Click on Legacy Dialogs.
  3. Select Histogram (see Figure 1).

 Figure 1 Path to creating a histogram

  1. A window such as the one in Figure 2 will appear.
  2. To check the distribution of a variable, select the variable you want to work with (in this case weightrate) and click on it once.
  3. Click on the blue arrow next to the word Variable. Weightrate will appear in the field under it.

Figure 2 Histogram dialog

  1. Make sure to check the box next to Display normal curve.
  2. Press OK
  3. Figure 3 shows the SPSS output displaying the histogram representing the distribution of the data for the variable weightrate, including the outline of normal curve. The data is approximately normally distributed if the shape of the histogram roughly follows the normal curve. In this case, the data is normally distributed.


Figure 3 Distribution of weightrate data

Outliers

When performing a t-test it is important to identify and remove outliers. Outliers are data points that are way out of the expected range of responses, or abnormally far from the rest of the data.

Checking for outliers in SPSS:

  1. From the toolbar choose Analyze.
  2. Hover over Descriptive Statistics and click on Explore. (See Figure 4)

Figure 4 First steps to checking for outliers

  1. A window as the one in Figure 5 will appear.
  2. Choose the variable you are working with and click on it. We are using the variable weightrate in this example.
  3. Click on the blue arrow next to the window described as Dependent List. The name of the variable will appear in the box below the description. (See Figure 5)
  4. At the bottom of the window check the bubble next to Plots.
  5. Click on the button Plots on the right-hand side of the window in Figure 5.

Figure 5 Explore window

  1. A new window will pop up (see Figure 6).
  2. Under Boxplots check the bubble next to Factor levels together (see Figure 6).
  3. Under Descriptive check the box next to Histogram (see Figure 6).
  4. Click Continue. Explore: Plots window (Figure 6) will disappear.

Figure 6 Explore: Plots window

  1. You will see Explore window (Figure 5). Click OK.
  2. In your output window, you will see the same histogram as in Figure 3, only without the normal curve. Additionally, you will see a boxplot showing any outliers in the data (Figure 7).
  3. There are no outliers in the weightrate data. See figure 7.

Figure 7 Boxplot for weightrate

  1. Figure 8 shows a boxplot for another variable (from a different dataset) that does contain outliers.
  2. Outliers are signified by a circle and a number. In Figure 8 the number of the outlier is 52 and it is located at the level of 3 on the y-axis.

Figure 8 Boxplot with outliers
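A boxplot flags a point as an outlier when it lies far outside the box. The Python sketch below applies the common rule of 1.5 box lengths (interquartile ranges) beyond the quartiles. The data are hypothetical, and the quartile method used by Python's `statistics.quantiles` is an assumption that may give slightly different quartiles than SPSS does.

```python
import statistics


def boxplot_outliers(values):
    """Return values more than 1.5 box lengths (IQR) outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


ratings = [5, 6, 6, 7, 7, 7, 8, 8, 9, 20]  # hypothetical data
print(boxplot_outliers(ratings))  # [20]
```

Here the quartiles are 6.25 and 8.0, so anything below 3.625 or above 10.625 is flagged; only the value 20 falls outside that range.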

One Sample T-test

If we want to compare the mean of a sample against an assumed value, we use One Sample T-test.

The null hypothesis (H0) is that the sample mean (ȳ) and the assumed value (x) are equal.

The test hypothesis (H1) is that the sample mean (ȳ) and the assumed value (x) are not equal.

H0: ȳ = x

H1: ȳ ≠ x

Example:

A random sample of 25 eighth grade students has a GPA of 3.5 in English. The marks range from 1 (worst) to 5 (excellent). The GPA of all eighth grade students of the last five years is 3.7. Is the GPA of the 25 students different from the population’s GPA?

In this case, students’ GPA, or sample mean, (ȳ) is compared with the assumed value of mean GPA from the past five years (x).

H0: ȳ = x

H1: ȳ ≠ x

Practice problem and procedure:

Problem:

You are a health science researcher and you want to know if the data your assistant collected are representative of the population you are working with. You know that the mean weightrate for your population is 6.5. How can you determine if the mean of the sample (6.31) is statistically different from the mean commonly cited in the peer reviewed literature (6.5)?

(Hint: You will run a one sample t-test)

Procedure in SPSS:

  1. Check the variable you want to work with for level of measurement, independence, normality and outliers (LINO); see steps to do so above. We will work with the variable weightrate. It is the same variable that I used as an example for checking for LINO. Once you know that the variable is continuous, approximately normally distributed, and that there are no outliers you can proceed with the t-test.
  2. From the toolbar choose Analyze.
  3. Select Compare Means and click on One-Sample T-Test (see Figure 1).

Figure 1 Path to One-Sample T-Test

  1. A new window will appear (Figure 2).
  2. Select the variable you want to work with (weightrate, in this case) and either double click on it or press the blue arrow pointing to Test Variable(s). The name of the variable will appear in the right-hand box. See Figure 2.
  3. Type the value you want to compare the mean on the variable to (in this example it is 6.5) in the box on the bottom of the window that says Test Value. See Figure 2.

Figure 2 One-Sample T-Test window

  1.  Click OK to execute the operation.
  2. The results of the test will be displayed in the output window (Figure 3)

Figure 3 One-Sample T-Test output

Reading T-Test Output:

  1. Remember the hypotheses from the beginning:

The null hypothesis (H0) is that the sample mean (ȳ) and the assumed value (x) are equal.

The test hypothesis (H1) is that the sample mean (ȳ) and the assumed value (x) are not equal.

H0: ȳ = x

H1: ȳ ≠ x

                  In this example:

                                    H0: ȳ = 6.5

H1: ȳ ≠ 6.5      

  1. Look at the t statistic (t).
  2. For a large sample, the critical value for this statistic is approximately ±1.96. If the obtained t statistic is smaller than -1.96 or larger than 1.96 (t < -1.96 or t > 1.96), the difference of means is statistically significant at p = 0.05.
  3. You can also look at the significance level (Sig.).
  4. This value is the probability of getting a result at least this extreme by chance if the null hypothesis were true. If this number is low, the probability of getting the result by chance is also low. Typically researchers work with p values of 0.05 or 0.01. That means that they accept a 5% or 1% risk of rejecting a null hypothesis that is actually true.
  5. In this example, we can conclude that, based on One-Sample T-Test, the sample mean is statistically different from the mean found in literature. We reject the null hypothesis.
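The t statistic in the output is the difference between the sample mean and the test value, divided by the standard error of the mean. A minimal Python sketch of that calculation follows; the function name `one_sample_t` and the ten ratings are hypothetical and are not the Health dataset.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1


ratings = [6, 7, 5, 6, 8, 6, 7, 5, 6, 7]  # hypothetical weightrate values
t, df = one_sample_t(ratings, 6.5)
print(round(t, 3), df)
```

For these made-up ratings the mean is 6.3 and the standard error is 0.3, so t is about -0.667. That is well inside ±1.96, so this small sample would not lead us to reject the null hypothesis.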

Independent Samples t-Test

The Independent Samples T-Test compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different.

SPSS Procedure

Figure 1 Path To Independent Samples T-test in SPSS

  1. Select Analyze from the toolbar
  2. Hover over Compare Means
  3. Click on Independent-Samples T test in the list that appears
  4. A window such as the one in Figure 2 will pop up

Figure 2 Independent Samples T-Test Popup

  1. Select the variable you want to test (dependent variable) - make sure this is a continuous variable
  2. Click on the arrow next to the window labeled Test Variables (Figure 3)

Figure 3 Test Variable

  1. The dependent variable will appear in the box on the right
  2. Choose the grouping variable (sex, for example) and click on the arrow next to the label Grouping Variable (Figure 4)

Figure 4 Choosing Grouping Variable

  1. The variable name and two question marks will appear in the line under the label
  2. Click Define Groups 
  3. A new window such as the one in Figure 5 will pop up

Figure 5 T-Test Define Groups

  1.  For Group 1 enter the number 1 (Figure 5)
  2. For Group 2 enter the number 2 (Figure 5)
  3. Click continue; the window will disappear
  4. Click OK in the next window
  5. The result of the T-test will be displayed in the output

Figure 6 SPSS Output for Independent T-Test

  1. In the output focus on the t statistic (t) and significance level (Sig.)
  2. If Levene's Test For Equality of Variances is significant, read the second line 
  3. In this case Levene's Test is not significant, so we can read the first line of the T-Test
  4. The t-statistic is -0.574 and p-value is 0.566; therefore, there is no statistically significant difference between men and women in total years of education. 
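The two lines of the output correspond to two ways of computing t: the first line pools the variances of both groups (equal variances assumed), the second keeps them separate (equal variances not assumed). The Python sketch below computes both; the function name and the years-of-education values are hypothetical.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (pooled-variance t, separate-variance t) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # First output line: equal variances assumed (pooled variance).
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Second output line: equal variances not assumed.
    t_separate = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pooled, t_separate


men = [12, 14, 16, 12, 13]    # hypothetical years of education
women = [13, 15, 16, 14, 12]
print(independent_t(men, women))
```

When both groups have the same size, as in this made-up example, the two t values are identical; they differ only when group sizes are unequal, and the two lines can also differ in their degrees of freedom.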

ANOVA Assumptions

There are several assumptions we have to consider before we run an ANOVA:

Level of measurement:  The dependent variable must be continuous (interval/ratio)

Independence:               The observations are independent of one another

Normality:                       The dependent variable should be approximately normally distributed.

Outliers:                          The dependent variable should not contain any outliers

Variance:                           The variances of the samples should be homogeneous

Check:

It is always good to check if the data fits the assumptions outlined above.

Level of measurement:

The variable has to be measured continuously, that means that it can have any value within a certain range (weight, height, age). Non-continuous values are categorical values (nominal - place of birth; or ordinal - level of education) where the variable can assume only some values.

Independence:

It is hard to test for independence; however, if the sample was chosen randomly, there is very small chance that the data is biased (not independent).

Normality:

One of the assumptions we need to consider before running an ANOVA is that the dependent variable is approximately normally distributed. That means that the data distribution roughly follows the bell curve.

How to check normality in SPSS:

In SPSS, we can check if the data is normally distributed using a histogram with the normal curve. To do that follow this path:

Graphs --> Legacy Dialogs --> Histogram

  1. Once you have opened the data file you want to work with (t-test), click on Graphs in the toolbar.
  2. Click on Legacy Dialogs.
  3. Select Histogram (see Figure 1).

 Figure 1 Path to creating a histogram

  1. A window such as the one in Figure 2 will appear.
  2. To check the distribution of a variable, select the variable you want to work with (in this case weightrate) and click on it once.
  3. Click on the blue arrow next to the word Variable. Weightrate will appear in the field under it.

Figure 2 Histogram dialogue

  1. Make sure to check the box next to Display normal curve.
  2. Press OK
  3. Figure 3 shows the SPSS output displaying the histogram representing the distribution of the data for the variable weightrate, including the outline of normal curve. The data is approximately normally distributed if the shape of the histogram roughly follows the normal curve. In this case, the data is normally distributed.


Figure 3 Distribution of weightrate data

Outliers

When performing an ANOVA it is important to identify and remove outliers. Outliers are data points that are way out of the expected range of responses, or abnormally far from the rest of the data.

Checking for outliers in SPSS:

  1. From the toolbar choose Analyze.
  2. Hover over Descriptive Statistics and click on Explore. (See Figure 4)

Figure 4 First steps to checking for outliers

  1. A window as the one in Figure 5 will appear.
  2. Choose the variable you are working with and click on it. We are using the variable weightrate in this example.
  3. Click on the blue arrow next to the window described as Dependent List. The name of the variable will appear in the box below the description. (See Figure 5)
  4. At the bottom of the window check the bubble next to Plots.
  5. Click on the button Plots on the right-hand side of the window in Figure 5.

Figure 5 Explore window

  1. A new window will pop up (see Figure 6).
  2. Under Boxplots check the bubble next to Factor levels together (see Figure 6).
  3. Under Descriptive check the box next to Histogram (see Figure 6).
  4. Click Continue. Explore: Plots window (Figure 6) will disappear.

Figure 6 Explore: Plots window

  1. You will be back in the Explore window (Figure 5). Click OK.
  2. In your output window, you will see the same histogram as in Figure 3, only without the normal curve. Additionally, you will see a boxplot showing any outliers in the data (Figure 7).
  3. There are no outliers in the weightrate data. See Figure 7.

Figure 7 Boxplot for weightrate

  1. Figure 8 shows a boxplot for another variable (from a different dataset) that does contain outliers.
  2. Outliers are marked by a circle and a number. The number identifies the case (row) in the dataset. In Figure 8 the outlier is case number 52 and it is located at the level of 3 on the y-axis.

Figure 8 Boxplot with outliers
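
The boxplot rule can also be written out in code. The sketch below flags values that lie more than 1.5 box lengths (interquartile ranges) beyond the quartiles, which is the usual boxplot convention. The data and the function name find_outliers are made up for illustration, and the quartile method used here (the default of Python's statistics.quantiles) may differ slightly from the one SPSS uses, so borderline cases could be flagged differently.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return (case_number, value) pairs lying more than k IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # first, second and third quartile
    iqr = q3 - q1                                  # the "box length" of the boxplot
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Case numbers start at 1, like row numbers in a data file
    return [(case, v) for case, v in enumerate(values, start=1)
            if v < lower or v > upper]

# Hypothetical ratings; the last case is deliberately extreme
ratings = [4, 5, 5, 6, 6, 6, 7, 7, 8, 20]
outliers = find_outliers(ratings)
print(outliers)
```

Each flagged pair gives the case number first, which corresponds to the number printed next to the circle in a boxplot such as Figure 8.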

Variance

To test whether the variances of the samples are homogeneous, you need to conduct Levene’s Test. The test of homogeneity can be included in One-Way ANOVA. See the One-Way ANOVA section for instructions on how to do that.

One-Way ANOVA in SPSS

  1. Choose a nominal or ordinal variable that has 2 or more categories (independent variable)
  2. Choose a continuous variable as a dependent variable
  3. Go to Analyze
  4. Hover over Compare Means
  5. Select One-Way ANOVA (Figure 1)

Figure 1 Path to One-Way ANOVA

  1. A window like the one in Figure 2 will pop up

Figure 2 One-Way ANOVA Window

  1. Locate the dependent variable and click on the arrow next to Dependent List (Figure 2)
  2. Find the independent variable and click on the arrow next to Factor (Figure 2)
  3. Click on Options (Figure 2)
  4. A new window will show up (Figure 3)

Figure 3 One-Way ANOVA Options Window

  1. Check the box next to Homogeneity of variance test
    1. This is the test of equality of variances – Levene’s Test
  2.  Hit Continue and the Options window will disappear

Figure 4 One-Way ANOVA Dialog

  1. When you are back on the previous window, click on Post Hoc (Figure 4)
  2. A new window will pop up (Figure 5)

Figure 5 One-Way ANOVA Post Hoc Multiple Comparisons Window

  1. Check the box next to Tukey (Figure 5)
  2. Hit Continue and the window will disappear
  3. When you are back in the One-Way ANOVA dialog (Figure 4), click OK

Reading ANOVA Output

Once you execute the One-Way ANOVA command, SPSS will produce the results of the analysis in the Output window.

  1. The first table in the output is the analysis of homogeneity of variances – Levene’s Test (Figure 6)
  2. If the test is significant (the value in the Sig. column is smaller than or equal to 0.05), the variances are not equal

Figure 6 Test of Homogeneity

  1. The next table is ANOVA, which contains the results of the analysis of variance (Figure 7)
  2. If the significance of the test is smaller than or equal to 0.05 (p <= 0.05), there is a statistically significant difference in the means of the dependent variable among the categories of the independent variable.

Figure 7 ANOVA 

The Post Hoc Test (Figure 8) compares the categories pair by pair and shows which pairs of means are significantly different from each other.

Figure 8 Post Hoc Test - Tukey
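
To see where the F value in the ANOVA table comes from, the following Python sketch computes it by hand as the between-groups mean square divided by the within-groups mean square. The three groups are made-up numbers, and the function name one_way_anova is only illustrative. The sketch does not compute Levene’s Test or the Tukey comparisons.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three categories of the independent variable
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(groups)
print(round(f_stat, 2), df_b, df_w)
```

For these numbers F is far above the critical value listed in a standard F table for 2 and 6 degrees of freedom at the 0.05 level (about 5.14), so the means would be judged significantly different. SPSS reports the exact p-value in the Sig. column instead of requiring a table lookup.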

Contingency Tables

A crosstab, also known as a cross tabulation or contingency table, is a way to display data. It is a frequency distribution of two variables, one in the rows and the other in the columns of a matrix. When we display data this way we can see the basic relationship between the variables.

Example

Table 1 Recoded Shotgun in Home and Respondents’ Sex Crosstabulation (a_shotgun, sex)

                          Respondents’ Sex
Shotgun in Home      Male             Female           Total
Yes                  193 (24.1%)      153 (15.0%)      346 (19.0%)
No                   609 (75.9%)      868 (85.0%)      1477 (81.0%)
Total                802 (100%)       1021 (100%)      1823 (100%)

Source: General Social Survey 2016

n = 2867

Table 1 shows the frequency distribution of the variables sex and a_shotgun. The attributes of the variable sex are displayed in columns and the attributes of the variable a_shotgun (the answers to the question: “Do you happen to have in your home any guns or revolvers?”) are in rows. If we want to know whether there is any relationship between respondents’ sex and having a shotgun in the home, we can compare the percentage in the cell of interest to the percentage of the corresponding total (marginal). Notice that there are two Totals, one in the last row and one in the last column. Each total sums the row or column it ends. The total of all respondents who answered the question is in the bottom right cell (1823). This total is lower than the total number of respondents in the survey (2867). That means that the remaining respondents either refused to answer the question, didn’t know, or the question was not applicable to them.

Respondents’ sex is the independent variable and shotgun is the dependent variable. That is why sex is displayed in columns and shotgun in rows.

If there were no relationship between sex and shotgun in home, the percentages in the same row would be approximately the same. In Table 1 the percentage of male respondents who have a shotgun in their home (24.1) is around 5 percentage points higher than the corresponding marginal (19). The percentage of female respondents who have a shotgun in their homes (15) is 4 percentage points lower than the corresponding marginal (19). Looking at the crosstab, we can suspect that sex influences the likelihood of respondents having a shotgun in their homes.
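
The column percentages in Table 1 can be reproduced from the counts alone. The Python sketch below divides each cell by its column total and each row total by the grand total; the counts are taken from Table 1, while the function name column_percentages is only an illustrative choice.

```python
# Cell counts from Table 1 (rows: shotgun in home, columns: respondents' sex)
counts = {
    "Yes": {"Male": 193, "Female": 153},
    "No": {"Male": 609, "Female": 868},
}

def column_percentages(counts):
    """Return column percentages per row, plus the row marginal under 'Total'."""
    columns = list(next(iter(counts.values())))
    col_totals = {c: sum(row[c] for row in counts.values()) for c in columns}
    grand_total = sum(col_totals.values())
    result = {}
    for label, row in counts.items():
        pct = {c: round(100 * row[c] / col_totals[c], 1) for c in columns}
        # Marginal: share of all respondents that fall in this row
        pct["Total"] = round(100 * sum(row.values()) / grand_total, 1)
        result[label] = pct
    return result

percentages = column_percentages(counts)
for label, pct in percentages.items():
    print(label, pct)
```

Comparing the Male and Female entries in a row with that row's Total entry is the comparison described above.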

Crosstabs in SPSS

To create a crosstab in SPSS, follow the path Analyze --> Descriptive Statistics --> Crosstabs. Put the independent variable in columns and the dependent variable in rows. Click on Cells and under Percentages select Column. Hit Continue and OK.

  1. Click on Analyze in the Toolbar
  2. Select Descriptive Statistics
  3. Under Descriptive Statistics choose Crosstabs
  4. A window like the one in Figure 1 will show up

Figure 1 Crosstabs Window

  1. You will notice that the variables are hard to recognize because SPSS displays variable labels rather than variable names by default. If you already know how to display variable names and sort variables alphabetically, skip ahead to the step where you select the independent variable (after Figure 2)
  2. Hover over the white field with the list of variables and right-click anywhere in the field
  3. A submenu like the one in Figure 2 will pop up
  4. Click on the bubble next to Display Variable Names
  5. Right-click in the list of variables again to reopen the submenu
  6. Click on the bubble next to Sort Alphabetically

Figure 2 Display Variable Names and Sort Alphabetically Function

  1. Select the independent variable and click on the blue arrow next to the panel labeled Columns
  2. Select the dependent variable and click on the blue arrow next to the panel labeled Rows (Figure 3)

Figure 3 Independent and Dependent Variable in Crosstabs Window

  1. If you click OK now, your table will only display frequencies and not percentages. It will be difficult to draw any conclusions about your data
  2. Find the button that says Cells and click on it
  3. A window like the one in Figure 4 will pop up

Figure 4 Crosstabs – Cell Display

  1. Check the square next to the word Column in the middle panel on the left titled Percentages (Figure 4)
  2. Click Continue and the window will disappear
  3. You are back on the previous window; click OK
  4. You will see a table in the output window (Figure 5)

Figure 5 SPSS Output – Crosstabulation of sex and a_shotgun

Chi Square

If we want to find out whether there is a statistically significant relationship between two variables, we conduct a statistical test for association – Chi Square.

Assumptions:

  1. Both variables are measured at nominal or ordinal level
  2. Both variables consist of two or more independent groups

Chi Square in SPSS:

To create a crosstab and run the Chi Square test in SPSS, follow the path Analyze --> Descriptive Statistics --> Crosstabs. Put the independent variable in columns and the dependent variable in rows. Click on Cells and under Percentages select Column. Hit Continue. Click on Statistics and select Chi Square. Click Continue and in the next window hit OK.

  1. Open the dataset you want to work with in SPSS
  2. Click on Analyze in the Toolbar
  3. Select Descriptive Statistics
  4. Under Descriptive Statistics choose Crosstabs
  5. Select the independent variable and click on the blue arrow next to the panel labeled Columns
  6. Select the dependent variable and click on the blue arrow next to the panel labeled Rows
  7. Find the button that says Cells and click on it
  8. Check the square next to the word Column in the middle panel on the left titled Percentages
  9. Click Continue and the window will disappear
  10. You are back on the previous window; click Statistics
  11. A window like the one in Figure 1 will show up

Figure 1 Crosstabs – Statistics Window

  1. Check the square next to Chi Square
  2. Click Continue and the window will disappear
  3. You are back on the previous window; click OK
  4. Two tables will be produced in the output window (Figure 2)

Figure 2 SPSS Output for Chi Square Including a Crosstab

SPSS Output – Chi Square

Figure 2 shows the output that will be produced by SPSS when a Chi Square test is conducted.

The top table is the crosstabulation of the two variables that are in the analysis. The independent variable is in columns and the dependent variable is in rows.

The table on the bottom of Figure 2 is the information related to the Chi Square test.

Figure 3 is a close up of the table showing the Chi Square statistic, degrees of freedom and level of significance P.

Figure 3 Chi Square Test Statistic

In rows:

Pearson Chi Square – Statistical test for association

In columns:

Value – the value of Chi Square; if this value is larger than the critical value of Chi Square for the particular degrees of freedom, there is support for the alternative hypothesis that there is an association between the independent and dependent variable. If the value is lower than the critical value, we fail to reject the null hypothesis (H0): the data do not provide evidence of an association between the independent and dependent variable.

df – degrees of freedom, the value of degrees of freedom is calculated as (the number of columns – 1) times (the number of rows – 1)

df = (C-1) (R-1)

Asymptotic Significance (2-sided) – this is the p-value that is compared to α (alpha); if the p-value is smaller than alpha we reject the null hypothesis. If the p-value is larger than alpha we fail to reject the null hypothesis.

                  p < α  --> reject H0

                  p > α  --> fail to reject H0
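
As a worked illustration of the quantities in this table, the Python sketch below computes the Pearson Chi Square statistic and the degrees of freedom for the counts in Table 1 (shotgun in home by sex). The function name pearson_chi_square is illustrative. The p-value shortcut using math.erfc is exact only when df = 1, as it is for a 2 x 2 table, and the critical value 3.841 (df = 1, alpha = 0.05) is taken from a standard Chi Square table.

```python
import math

def pearson_chi_square(table):
    """Return (chi_square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(col_totals) - 1) * (len(row_totals) - 1)  # df = (C-1)(R-1)
    return chi2, df

# Rows: Yes / No (shotgun in home); columns: Male / Female (Table 1)
table = [[193, 153], [609, 868]]
chi2, df = pearson_chi_square(table)

alpha = 0.05
critical_value = 3.841                    # df = 1, alpha = 0.05, from a Chi Square table
p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for df == 1 only

print(round(chi2, 2), df)
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Both decision rules agree here: the statistic exceeds the critical value and the p-value is below alpha, so H0 is rejected for these counts.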

Linear Regression

Assumptions

Linear relationship between the continuous dependent and independent variables in the model.

How to check

  1. Select the two variables that you want to work with (MNTLHLTH - days of poor mental health in past 30 days, AGE – respondent’s age)
  2. From the toolbar select Graphs and click on Scatter/Dot (Figure 1)

Figure 1 Path to Scatter/Dot Plot

  1. Click on Simple Scatter (Figure 2)
  2. Hit Define (Figure 2)

Figure 2 Scatter/Dot Plot Options Window

  1. Put one variable on the Y Axis (Days of poor mental health). See Figure 3.
  2. Put the other variable on the X Axis (age)
  3. Click on OK

Figure 3 Simple Scatterplot Settings Window

  1. SPSS will produce a graph in the Output window that shows the relationship between the two variables
  2. It is often hard to determine the relationship from this graph (Figure 4), so you can ask SPSS to add a line of best fit

Figure 4 Scatter Dot Plot in SPSS Output

  1. Double click on the graph field in SPSS Output
  2. A Chart Editor will pop up
  3. Click on the icon that says Add Fit Line at Total when you hover over it (Figure 5)

Figure 5 Add Line of Best Fit Icon

  1. SPSS will fit a line through the dots (Figure 6)
  2. Close the Chart Editor and SPSS Output will update to reflect the changes

Figure 6 Scatter/Dot Plot with Line of Best Fit in Chart Editor

SPSS Procedure

  1. Think about the variables in your dataset that may influence the dependent variable
    1. Dependent variable must be continuous
    2. Independent variables should be also continuous; however, it is possible to include nominal variables such as sex or race in the regression model
      1. If you want to do that, you have to make sure these are dummy variables
  2. From the toolbar select Analyze
  3. Hover over Regression
  4. Click on Linear (Figure 7)

Figure 7 Path to Linear Regression

  1. A Linear Regression window will pop up (Figure 8)

Figure 8 Linear Regression Settings Window

  1. Select the dependent variable and click on the blue arrow next to the field titled Dependent
  2. Choose the independent variables and put them in the Independent(s) list by clicking on the blue arrow next to that field (Figure 9)
  3. Make sure Method is set to Enter (Figure 9)
  4. Click OK

Figure 9 Linear Regression with Dependent Variable and a List of Independent Variables
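
The procedure above mentions that nominal variables must be entered as dummy variables. A dummy variable takes the value 1 for one category and 0 otherwise, and one category is left out as the reference. The Python sketch below shows the idea; the category names, the choice of White as the reference category, and the function name make_dummies are assumptions made for illustration and are not taken from the dataset.

```python
def make_dummies(values, reference):
    """Return one 0/1 indicator list per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical race values for five respondents
race = ["White", "Black", "Other", "White", "Black"]
dummies = make_dummies(race, reference="White")
print(dummies)
```

A respondent in the reference category has 0 on every dummy, so the coefficients of the dummies are interpreted relative to that category.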

SPSS Output

The variables used in this analysis are:

Dependent Variable:

                  MNTLHLTH – Number of work days missed in the past 30 days due to poor mental health

Independent Variables:

                  AGE – Respondent’s age in years

                  s_2 - Dummy for Sex – Male

                  r_2 - Dummy for Race – Black

                  r_3 - Dummy for Race – Other

                  MOREDAYS – Number of days worked extra in the past 30 days

Variables Entered/Removed Box

This box specifies the number of the model and the variables that were entered in the model or removed from it. Variables are typically removed when you choose another method than Enter. You can specify criteria for keeping/removing variables if you want to run a stepwise regression, for example. 

Table 1 Linear Regression Output - Variables Entered and Removed

Model Summary

The Model Summary box again specifies the number of the model and provides R, R square, adjusted R square and the standard error of the estimate (root mean square error) for each model.

Table 2 Linear Regression Output – Model Summary

R is the square root of R square and shows the correlation between the observed and predicted values of the dependent variable. 

R square shows how much of the variance of the dependent variable can be explained by the independent variables in the model. 

Adjusted R square reduces inflation of R square when adding more variables in the model. By adding more variables R square can increase simply due to chance.

Standard Error of the Estimate is the standard deviation of the error. It measures the accuracy of the prediction. Smaller standard error indicates more accurate prediction.
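
To make the Model Summary statistics concrete, the Python sketch below fits a regression with a single predictor by least squares and computes R, R square, adjusted R square and the standard error of the estimate. The data are made up and the function name model_summary is illustrative. With several predictors the fit itself needs matrix algebra, but the formulas for the summary statistics stay the same, with k set to the number of predictors.

```python
import math

def model_summary(x, y):
    """Least-squares fit of y on one predictor x, with Model Summary statistics."""
    n, k = len(x), 1                      # k = number of predictors
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r_square = 1 - ss_resid / ss_total    # share of variance explained
    return {
        "slope": slope,
        "intercept": intercept,
        "R": math.sqrt(r_square),
        "R square": r_square,
        "Adjusted R square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "Std. Error of the Estimate": math.sqrt(ss_resid / (n - k - 1)),
    }

# Hypothetical data: five observations of one predictor and one outcome
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
summary = model_summary(x, y)
for name, value in summary.items():
    print(name, round(value, 4))
```

Note how adjusted R square comes out lower than R square: the adjustment penalizes the model for the number of predictors relative to the number of cases.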

ANOVA

This table shows the sum of squares, degrees of freedom, mean square, F, and p-value.

Table 3 Linear Regression Output – ANOVA

F and Sig. help answer the question: “Do the independent variables reliably predict the dependent variable?” If Sig. (p-value) is smaller than 0.05, the answer is: “Yes, they do”.

Parameter Estimates

For each model, there are several statistics in the Coefficients table.

Table 4 Linear Regression Output - Coefficients

Model column lists the predictor variables for each model, including the constant (Y-intercept). The Y-intercept shows the value of the dependent variable when all the predictors are held at 0.

Unstandardized Coefficients - B - show the expected change in the dependent variable when the independent variable increases by one unit and the other predictors are held constant; the coefficient is unstandardized because it is measured in the natural units of the variable. Because the predictors are measured in different units, we cannot use B to tell which predictor is more influential.

- Std. Error – shows the standard error of the coefficient and helps determine if the coefficient is significantly different from 0, form a confidence interval, and calculate the t value.

Standardized Coefficients Beta can be compared to each other because all the variables in the model have been standardized before running the regression.

Sig. shows the p-value for the null hypothesis that a coefficient is not different from 0 (2-tailed test). If the value in the Sig. column is lower than alpha (typically 0.05), the coefficient is significantly different from 0.

Transforming Variables

Sometimes variables have more attributes than we need for our analysis or are assigned a numeric value that does not make sense for our analysis. In such cases, we can reduce the number of attributes or assign a different numeric value to the attributes by recoding the variable.

Check Attributes

If we want to recode the attributes of the variable, we need to know how they are coded in the first place. You can either check the coding scheme in the codebook that should be available with the dataset you are using, or you can see how the attributes were coded under Values in the Variable View in SPSS.

SPSS Procedure:

  1. Make sure that you are in Variable View of your Data window
  2. Find the variable you want to work with
  3. Click into the Values cell on the same line as your variable (see Figure 1)

Figure 1 Where to Check the Coding Scheme of a Variable

  1. Click on the ellipsis (…) next to the text in the cell
  2. A window like the one in Figure 2 will pop up

Figure 2 Coding Scheme for Respondents’ Marital Status

  1. Read and record the coding scheme
    1. In this case, the possible answers to the question asked, “What is your marital status?”, are: single, married, divorced or widowed. Whenever there is number 1 in the data it represents the attribute “Single”. Number two represents the attribute “Married”, and so on.

Recoding Variables

Let’s say that for our analysis, we are only interested in knowing whether or not people are married. The marital variable contains “too much” information. We need to make sure that all those who answered single, divorced or widowed are grouped together in a new category “not married”. The Recode into Different Variables command is useful here.

SPSS Procedure:

Follow the path: Transform -> Recode into Different Variables

  1. Select Transform from the Tool Bar
  2. Click on Recode into Different Variables (Figure 3)
    1. If you click on Recode into Same Variables you will lose the original data and that is often not a good idea. You might want to retain the original (unrecoded) variable in case you made a mistake and need to repeat the process, or if you need it for a different kind of analysis later. Once you recode the variable, you cannot reverse what you have done.

Figure 3 Path to Recode into Different Variables Command

  1.  A window like the one in Figure 4 will show up

Figure 4 Recode into Different Variables Pop-Up Window

Setting up a new variable

  1. You will notice that the variables are hard to recognize because SPSS displays variable labels rather than variable names as default.
  2. Hover over the white field with the list of variables and right-click anywhere in the field.
  3. A submenu like the one in Figure 5 will pop up.
  4. Click on the bubble next to Display Variable Names
  5. Right-click in the list of variables again to reopen the submenu
  6. Click on the bubble next to Sort Alphabetically

Figure 5 Display Variable Names and Sort Alphabetically Function

  1. Choose the variable you want to recode (marital)
  2. Double-click on the variable name or click on the blue arrow
  3. The variable name will appear in the white window on the right (Figure 6)

Figure 6 Name and Label New Variable

  1. Type the name of the new variable (the one you are creating) under Name in the blue panel all the way on the right labeled Output Variable
    1. I typically use the original variable name with R, signifying that the variable is recoded (maritalR)
    2. Naming of the output variable follows the same rules as naming variables in the Variable View of the dataset
      1. No spaces
      2. Restrictions on special characters
  2. Under Label, you can give the new variable a label
    1. I usually use the same label as the original variable but start or end with the word recoded to signify that it is not the same variable as the original one
    2. If you are recoding the same variable multiple different ways, it is good practice to indicate which one of the versions of recoding this is
  3. Click on Change
  4. Click on Old and New Values
  5. A new window will pop up (Figure 7)

Figure 7 Recode into Different Variables - Old and New Values Pop-Up Window

Recoding values

In this window (Figure 7) you indicate how you want the values to be transformed. The left panel is where you will input the old, or original, values. The top right panel is where you input what you want the new values to be. You can see what SPSS will do in the white panel on bottom right after you have set one command up and clicked Add.

  1. Decide how you want to change the values
    1. Recall the coding scheme of the marital variable:

1 = single, 2 = married, 3 = divorced and 4 = widowed

  1. We are only interested in knowing whether people are married or not married. That means that values 1, 3 and 4 need to be grouped together.
  2. In this case it is the easiest to code not married as 1 and married as 2:

Old value                             New value

1 single                              1 not married

2 married                           2 married

3 divorced                          1 not married

4 widowed                          1 not married

  1. We need to change the values 3 and 4 into 1. To do that use the Range option
  2. Click on the bubble next to the word Range in the left-hand panel labeled Old Value
  3. Enter the lowest number (3) in the first line and the highest number (4) in the line below (Figure 8)
  4. Click on the bubble next to the word Value in the top right panel labeled New Value (Figure 8) and type the number 1 on the line
  5. Click Add
  6. 3 thru 4 --> 1 will appear in the white pane labeled Old --> New (Figure 9)

Figure 8 Assigning New Values in Recode into Different Variables - Old and New Values Pop-Up Window

  1. To make sure that the unchanged values are included in the new variable select All other values in the panel on the left (Old Value) and Copy old value(s) in the top right panel (New Value)
  2. Click Add; ELSE --> Copy will appear in the Old --> New panel (Figure 9)
  3. Click Continue; the window will disappear and you will be back on Recode into Different Variables window (Figure 6)
  4. Click OK

Figure 9 Copy All Unchanged Values

SPSS Output

Once you hit OK it will seem as though nothing happened. If you look at the output window, the only change is a few lines of code that were not there before (Figure 10). This code is very important. It tells us the names of the original and new variables and the changes we made.

Figure 10 SPSS Output After Executing Recode into Different Variables Command

Let’s break down how to read the output:

  1. RECODE – this is the operation SPSS is performing
  2. marital – the name of the original variable
  3. (3 thru 4=1) – original values 3 through 4 are recoded to have value 1
  4. (ELSE=Copy) – all other values of the recoded variable are the same as in the original variable and are also included in the new variable
  5. INTO – another operation SPSS is performing; this means that another variable will be created at the end of the process
  6. maritalR – the name of the new, recoded, variable
    1. You can read items 1 – 6 as a sentence: “Recode variable marital such that values 3 through 4 equal 1 and everything else stays the same into a variable maritalR.”
  7. VARIABLE LABELS – SPSS operation telling SPSS to label a variable
  8. maritalR – the variable that will be labeled
  9. ‘recoded marital status’ – label of the variable
  10. EXECUTE – this tells SPSS to perform the above operations
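
The logic of the recode can also be expressed outside SPSS. The Python sketch below mirrors the rule read off the output above (values 3 through 4 become 1, everything else is copied) and writes the result into a new list, leaving the original untouched. The six data values and the function name recode_marital are made up for illustration.

```python
def recode_marital(value):
    """Mirror the rule (3 thru 4=1) (ELSE=Copy)."""
    if 3 <= value <= 4:
        return 1      # divorced and widowed join single as "not married"
    return value      # all other values are copied unchanged

# Hypothetical original values: 1 = single, 2 = married, 3 = divorced, 4 = widowed
marital = [1, 2, 3, 4, 2, 3]

# Like Recode into Different Variables, this creates a new variable
maritalR = [recode_marital(v) for v in marital]
print(maritalR)
```

Because the result is stored under a new name, the original values remain available if the recoding needs to be repeated or done differently.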

Another very important thing that will happen once you execute the operation is that there will be a new variable in the Variable View in the Data window. The new variable will be listed all the way at the end, after all the other variables and its values will not be labeled.

Labeling values of a recoded variable

It is useful to label the variable values to always know what the numeric values mean. This is especially true when you create a new variable because this variable will not be in the codebook so later there will be nowhere to look to find out what the values represent.

  1. Locate the new variable in the Variable View in the Data window
  2. Click into the cell in the Values column on the same line as the variable (line 6 for maritalR), see Figure 11

Figure 11 Recoded Variable Without Value Labels in Variable View

  1. Click on the ellipsis (…) next to the word None
  2. A window like the one in Figure 12 will pop up

Figure 12 Labeling Attributes of Recoded Variable

  1. Recall the coding scheme for the new variable (see the Recoding values section)
    1. not married = 1, married = 2
  2. Type the number 1 in the blank field next to Value
  3. Type the words not married in the field next to Label
  4. Click Add
  5. Type the number 2 in the blank field next to Value (Figure 12)
  6. Type the word married in the field next to Label
  7. Click Add
  8. Click OK

Check Your Work

To check if recoding was done correctly we can run a frequency distribution of the original and recoded variable and compare them.

  1. Run a frequency distribution for the original variable (Figure 13)
  2. Count the number of respondents who are married (188)
  3. Count the number of people who are not married
    1. Add the frequencies of those who answered Single, Divorced and Widowed (83)
  4. Look at the total number of respondents
  5. Run a frequency distribution for the recoded variable (Figure 14)
  6. Compare the number of respondents who are married and not married, and the total number of respondents, with the numbers for the original variable
  7. If the numbers are the same the recoding was likely done right

Figure 13 Frequency Distribution of Respondents’ Marital Status

Figure 14 Frequency Distribution of Recoded Respondents’ Marital Status
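
The same check can be sketched in Python by counting frequencies before and after recoding. The totals of 188 married and 83 not married come from the steps above; the split of the 83 into single, divorced and widowed is made up for illustration, since only their sum is given.

```python
from collections import Counter

# 1 = single, 2 = married, 3 = divorced, 4 = widowed
# Hypothetical split of the 83 not-married respondents: 40 + 30 + 13
marital = [1] * 40 + [2] * 188 + [3] * 30 + [4] * 13

# Recoded variable: 3 and 4 become 1, everything else is copied
maritalR = [1 if 3 <= v <= 4 else v for v in marital]

original = Counter(marital)
recoded = Counter(maritalR)

not_married_original = original[1] + original[3] + original[4]
checks_pass = (
    recoded[2] == original[2]                # married count unchanged
    and recoded[1] == not_married_original   # not married = single + divorced + widowed
    and len(maritalR) == len(marital)        # same total number of respondents
)
print(dict(recoded), checks_pass)
```

If any of the three comparisons fails, the recode rule should be reviewed before the new variable is used in an analysis.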