Social Sciences Research and Instructional Council
Teaching Resources Depository

SPSS for Windows, Version 9.0: A Brief Tutorial
(Hypertext Version)

Chapter Seven: Regression and Correlation

© The Authors, 2000; Last modified 12 May 2001
Regression and correlation analysis (also called "least squares" analysis) helps us examine relationships among interval or ratio variables. In this chapter, we'll explore techniques for doing bivariate regression and correlation. Chapter 8 will include a look at multiple regression and correlation.

To illustrate these techniques, we'll use a data set on your diskette called "MACRO.SAV" that contains data about the U.S. economy for the years 1929 through 1998. This file was derived from data that were provided to the authors by Professor James Gerber of San Diego State University, and that are revised and updated from his "Exploring the Macroeconomy," [15 August, 1998], available on the Internet. (The codebook is in Appendix C.)

Open the "MACRO.SAV" file following the instructions in Chapter 1 under "Getting a Data File."

We'll begin by looking at annual percentage changes in average hourly earnings (AHE_D). Data for this particular variable are included for the years 1948 through 1998. We'll test the hypotheses that workers' average hourly earnings will be influenced by change in:

  1. investment ("I_D"),
  2. the money supply ("M2_D"), and
  3. nonfarm labor productivity ("PROD1_D").
To test these hypotheses, click on "Analyze," "Correlate," and "Bivariate." The dialog box shown in Figure 7-1 will appear on your screen.

Figure 7-1

Move the following variables to the box under "Variables:" "AHE_D," "I_D," "M2_D" and "PROD1_D." Since our variables are measured at least at the interval level, we will accept the default of calculating Pearson correlation coefficients. If our data were ordinal, or badly skewed, we would compute the Kendall's tau-b or Spearman coefficients instead. Since we are predicting the direction of the relationships in question (with all variables expected to be positively correlated), click on "One-tailed" under "Test of significance." (Note: since we have population data rather than a sample, many researchers would not bother with measures of significance.) Now click on "Options." This will open up the dialog box shown in Figure 7-2.

Figure 7-2

For our purposes, we won't need to change any of the defaults, so click on "Continue," then on "OK." Figure 7-3 shows the resulting correlation matrix.

Figure 7-3

Our hypotheses are all supported. The weakest relationship is with investment, but even this relationship is moderately strong (.337) and statistically significant at the .01 level (that is, it would occur by chance less than one time in a hundred). The other two relationships tested are stronger; change in worker earnings has a .782 correlation with change in the money supply and .578 with growth in productivity. Note that the relationship with money supply is based on only 39 cases. This is because our data set does not include this variable prior to 1959.
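
For readers who want to see what lies behind the numbers in the correlation matrix, here is a minimal sketch in Python (not part of SPSS). It computes a Pearson coefficient from scratch and the t statistic that is conventionally used to test a correlation against zero. The function names and the five data points are hypothetical and are not taken from MACRO.SAV; only the values .782 and 39 come from the text above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical illustration data, NOT values from MACRO.SAV.
money_growth = [2.0, 4.0, 6.0, 8.0, 10.0]
earnings_growth = [1.0, 3.0, 2.0, 5.0, 4.0]

r = pearson_r(money_growth, earnings_growth)
print(round(r, 3))  # 0.8

# Correlation and number of cases reported in the text for AHE_D and M2_D.
t = t_statistic(0.782, 39)
print(round(t, 2))
```

With 37 degrees of freedom, a one-tailed test at the .01 level needs a t of roughly 2.4, so a correlation of .782 based on 39 cases clears that threshold by a wide margin.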

Let's look more closely at the relationship between earnings and the money supply. Click on "Graphs," "Scatter," and "Define." This will open up the dialog box shown in Figure 7-4.

Figure 7-4

In the box on the left, click on "AHE_D," then on the arrow key that is pointing toward the box labeled "Y Axis:." Now scroll down and click on "M2_D," then on the arrow key that is pointing toward the box labeled "X Axis:." Finally (for a reason that will become clear a bit later), click on "YEAR," then on the arrow pointing toward the box labeled "Label Cases by:," then on "OK." This produces the result shown in Figure 7-5.

Figure 7-5

OK, but not very fancy (or easy to interpret). Place your mouse arrow somewhere on the graph, and click on your right mouse button. Now click on "SPSS Chart Object," and on "Open." Figure 7-6 shows the SPSS Chart Editor that now appears.

Figure 7-6

You might need to maximize it to get a better view. Go to the menu bar, click on "Chart," then on "Options," producing the dialog box shown in Figure 7-7.

Figure 7-7

Under "Fit Line" in the center of the screen, click on the box next to "Total." This will cause the regression (least squares) line to be added to your graph. Now click on "Fit Options" near the middle of the screen. This opens up the dialog box shown in Figure 7-8.

Figure 7-8

In the lower right-hand portion of the screen, click on the box next to "Display R-square in legend." Now click on "Continue," then on "OK." The chart (see Figure 7-9) now shows the regression line and the R² coefficient (.6122).

Figure 7-9

Notice that most of the points fall fairly close to the line, but that a few do not. The vertical distance between a point and the line is called the residual: the difference between the actual value of the dependent variable and the value "predicted" by the least squares equation represented by the line. To identify some of the "outliers" (cases with high residuals), go to the menu bar, click on "Chart," then on "Options." In the resulting dialog box (the same as Figure 7-7 except for the change you made under "Fit Options"), click on "Off" next to "Case Labels:," then on "On," and then on "OK." The chart (Figure 7-10) now shows the year for each point on the scatterplot.

Figure 7-10

Most of these run together, producing a meaningless jumble, but a few years stand out. In 1969, earnings grew more than we would have expected given the growth in the money supply that year. In 1975 and 1980 the opposite happened, with earnings lower than we would have predicted.
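
The idea of a residual, and of picking out the cases that sit farthest from the line, can also be sketched outside SPSS. The Python example below is only an illustration under stated assumptions: the years and values are hypothetical (not from MACRO.SAV), and `fit_line` and `residuals` are helper names invented here. It fits a least squares line, computes each case's residual (actual minus predicted), derives R-square from the residuals, and ranks the cases by the size of their residuals.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Actual minus predicted value of the dependent variable, case by case."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical cases labeled by year, NOT values from MACRO.SAV.
years = [1991, 1992, 1993, 1994, 1995]
x = [2.0, 4.0, 6.0, 8.0, 10.0]   # stand-in for the independent variable
y = [1.0, 3.0, 2.0, 5.0, 4.0]    # stand-in for the dependent variable

a, b = fit_line(x, y)
res = residuals(x, y, a, b)

# R-square: share of the variation in y accounted for by the line.
mean_y = sum(y) / len(y)
ss_res = sum(e ** 2 for e in res)
ss_tot = sum((v - mean_y) ** 2 for v in y)
r_square = 1 - ss_res / ss_tot

# Rank cases from largest to smallest absolute residual.
ranked = sorted(zip(years, res), key=lambda pair: abs(pair[1]), reverse=True)
for year, e in ranked:
    print(year, round(e, 2))
```

A positive residual means the actual value was higher than the line predicts, and a negative residual means it was lower, which is the same reading applied to the labeled years on the scatterplot.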

We can get more information about the regression line. Minimize the SPSS Chart Editor. Click on "Analyze," "Regression," and "Linear." This opens up the dialog box shown as Figure 7-11.

Figure 7-11

Move "AHE_D" to the "Dependent" box, and "M2_D" to the "Independent(s)" box. Click on "OK." In the left window, click on "Coefficients." The results should look like Figure 7-12.

Figure 7-12

From this information, we can construct the equation for the regression line as follows:

Y' = -.609 + .361X
where:
Y' = predicted annual percentage change in average hourly earnings,
-.609 = the constant (i.e., the Y intercept) shown in Figure 7-12,
.361 = the unstandardized regression coefficient, and
X = annual percentage change in the money supply.
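
To see how the equation is used, here is a small Python sketch. The constant and coefficient are the ones reported above; the function name and the trial values of X are hypothetical choices for illustration, and X is taken to be M2_D, the variable entered in the "Independent(s)" box.

```python
INTERCEPT = -0.609  # the constant (Y intercept) from the regression output
SLOPE = 0.361       # the unstandardized regression coefficient

def predicted_earnings_change(money_supply_change):
    """Predicted annual % change in average hourly earnings (Y')
    for a given annual % change in the money supply (X)."""
    return INTERCEPT + SLOPE * money_supply_change

# Hypothetical trial values of X, chosen only for illustration.
for x in (0.0, 5.0, 10.0):
    print(x, round(predicted_earnings_change(x), 3))
```

Each additional percentage point of growth in the money supply raises the predicted growth in earnings by .361 percentage points, and the constant is the predicted value when the money supply does not change at all.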

Summary

In the last three chapters we have used various SPSS for Windows procedures to examine relationships between two variables. In Chapter 8, we will look at some very powerful procedures for analyzing multivariate relationships.

Chapter Seven Exercises

  1. The "Phillips Curve" predicts that there is a trade-off between unemployment and inflation. That is, the lower the rate of unemployment, the more inflation there will be. Calculate the correlation coefficient between UR and CPI_D and look at the scatterplot between these two variables. Repeat, but this time selecting for analysis only the years 1948-1969. (See Chapter 3 for instructions.) Do the same for the years 1970-1998. You should find a substantially different pattern. (See Prof. Gerber's explanation on the Internet by going to macrhome.html and clicking on section 6.3.)
  2. What other annual change measures (see Appendix C) best explain annual change in disposable personal income?
  3. (This exercise might make you rich!) What other annual change measures best explain annual change in the stock market?
  4. Most of the variables in the General Social Survey subset (GSS98A.SAV) are nominal or ordinal, and hence not appropriate for regression analysis (though you can do correlational analysis with ordinal data by choosing the Kendall's tau-b or Spearman correlation coefficient instead of the Pearson coefficient). There are some variables that are measured at least at the interval level. In Chapter 1, we briefly examined the relationships among years of education (EDUC), father's years of education (PAEDUC), and mother's years of education (MAEDUC). Try also looking at the relationship between years of education (EDUC) and income (INCOME98). Note, because the General Social Survey includes over 2,800 respondents, the scatterplots will be too dense to be very meaningful. You can still examine the regression (best fit) lines, but don't try to label individual cases.