Simple crosstabs (also called cross tabulations or contingency tables), which examine the influence of one variable on another, should be only the first step in the analysis of social science data. It is fun to hypothesize that the more conservative a person's political orientation, the more likely they are to oppose abortion, run the crosstabs, and then conclude you were right. However, this one-step method of hypothesis testing is very limited. What if all the Republicans in your sample are religiously conservative and all the Democrats are atheists? Is it political party that best explains your findings, or is it religious orientation? Or what if the political conservatives as a group are much older than the liberals? Would age then be the real causal factor? Or is it some combination of all these variables that explains the varying opinions of your respondents?

Or suppose you hypothesize that men and women differ significantly in their belief that the ability to think for oneself (GSS98A.SAV variable name = THNKSELF) is an important value to instill in children. The crosstabs for THNKSELF and SEX show that while a slight majority (51%) of all respondents reported that this was the most important value among those listed (to be popular, to obey, to help others, to work hard), only 45% of the men surveyed agreed, compared with 57% of the women (see Figure 8-1).
![]()
Figure 8-1

This percentage point difference (epsilon) of 12 is "interesting," even if you don't yet know whether it is statistically significant. Can you conclude that gender is the causal factor here? While it may indeed be true that gender is explanatory, you won't really know this until you have failed to account for this variation in any other way. To do this, run crosstabs of (i.e., "control for") other independent variables to see if something else might account for this variation among respondents.
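The epsilon calculation can be sketched outside SPSS in a few lines of Python. This is only an illustration: the function names are hypothetical, and the counts are invented so that they reproduce the 45% and 57% figures quoted in the text. They are not the actual GSS cell counts.

```python
def column_percentages(counts):
    """Convert {column: {row: count}} into {column: {row: percent}}."""
    result = {}
    for column, rows in counts.items():
        total = sum(rows.values())
        result[column] = {row: 100.0 * n / total for row, n in rows.items()}
    return result


def epsilon(percentages, row, col_a, col_b):
    """Percentage-point difference between two columns for one row category."""
    return percentages[col_b][row] - percentages[col_a][row]


# Hypothetical counts (100 respondents per column), chosen only to match
# the percentages reported in the text; NOT real GSS data.
counts = {
    "male": {"think for self": 45, "other value": 55},
    "female": {"think for self": 57, "other value": 43},
}

pct = column_percentages(counts)
diff = epsilon(pct, "think for self", "male", "female")
print(round(diff))  # 12
```

Percentages are computed within each column (each category of the independent variable), which is why the comparison runs across columns for a single row.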
Recall that your original crosstabs procedure produces one contingency table, with as many rows as there are categories (or values) of the dependent variable, and as many columns as there are categories of the independent variable. So in Figure 8-1, we have a 5 by 2 table. When you start using control (sometimes called test) variables, you will get as many separate tables as there are categories of the control variable. For instance, if you want to control for levels of education and simply use EDUC as the control variable, you end up with 20 separate tables. This is NOT a good idea. Try doing this to see what we mean. Notice how difficult it is to compare across this many tables. So before you do any further analysis, recode your variables into the smallest number of categories that are still logically useful.
In this next example EDUC was recoded as EDUC2 into three categories (0-11 years, 12 years, more than 12 years). After you have done these recodes, let's see what happens when we do crosstabs again, this time controlling for education. To do the appropriate crosstabs, go to the Analyze, Descriptive Statistics, Crosstabs menu and double-click. Enter THNKSELF into the Row box and SEX into the Column box. (Recall that this is how you generate one contingency table.) Now you are ready for the next step, the addition of a control variable. Choose EDUC2 from your variables list and enter it into the empty box at the bottom of the Crosstabs screen. Figure 8-2 shows you what this will look like.
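For readers who want to see the logic behind the menu steps, here is a minimal Python sketch of the same idea: recode education into three categories, then build one THNKSELF-by-SEX table per category. The function names and the respondent records are invented for illustration, and the sketch assumes missing-value codes have already been removed.

```python
from collections import Counter, defaultdict


def recode_educ(years):
    """Collapse years of schooling into the three EDUC2 categories."""
    if years < 12:
        return "0-11 years"
    if years == 12:
        return "12 years"
    return "more than 12 years"


def partial_tables(records, row_var, col_var, control_var):
    """Return one Counter of (row, column) cells per control category."""
    tables = defaultdict(Counter)
    for rec in records:
        tables[rec[control_var]][(rec[row_var], rec[col_var])] += 1
    return tables


# Made-up respondents for illustration only; not GSS data.
respondents = [
    {"THNKSELF": "most important", "SEX": "female", "EDUC": 16},
    {"THNKSELF": "most important", "SEX": "male", "EDUC": 12},
    {"THNKSELF": "not most important", "SEX": "male", "EDUC": 9},
    {"THNKSELF": "most important", "SEX": "female", "EDUC": 12},
    {"THNKSELF": "not most important", "SEX": "female", "EDUC": 10},
]
for rec in respondents:
    rec["EDUC2"] = recode_educ(rec["EDUC"])

tables = partial_tables(respondents, "THNKSELF", "SEX", "EDUC2")
for level, cells in sorted(tables.items()):
    print(level, dict(cells))
```

Because the control variable has three categories after recoding, the result holds three separate tables, which mirrors what the third box in the Crosstabs dialog produces.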
![]()
Figure 8-2

The SPSS output for this procedure is shown in Figure 8-3.
![]()
Figure 8-3

Note that there are now three tables, one for each value of EDUC2. If you want to produce more three-way tables, just move the variables from the variable list into that third box.
If you want to produce 4-way or more tables, click on the Next box, just to the right of "Layer 1 of 1." The box that had previously shown EDUC2 would now be empty, and you could add in your fourth variable (perhaps RACE, recoded as White-Nonwhite). Your first table would show THNKSELF by SEX for whites with 0-11 years of education, then for 12 years, then 12+ years, then non-whites with 0-11 years, etc., for a total of six tables.
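The count of six tables is simply the product of the number of categories in each control variable. A short Python sketch, using the category labels from the recodes described here (the variable names in the code are only illustrative), makes the arithmetic and the ordering explicit:

```python
from itertools import product

# Category labels follow the recodes described in the text.
layers = {
    "RACE": ["White", "Nonwhite"],
    "EDUC2": ["0-11 years", "12 years", "more than 12 years"],
}

# Each combination of control categories gets its own THNKSELF-by-SEX table.
combinations = list(product(*layers.values()))
for race, educ in combinations:
    print(f"THNKSELF by SEX: {race}, {educ}")

print(len(combinations))  # 6
```

Adding another layer multiplies the number of tables again, which is why control variables should have as few categories as is logically useful.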
Figure 8-1 shows the original, or zero-order, contingency table of the relationship between THNKSELF (unrecoded) and SEX. Figure 8-3 shows the three partial tables that resulted from THNKSELF crosstabbed by SEX, controlling for EDUC2.
First, note that there is a big difference among respondents at each of the three educational levels. Only a third (34%) of the respondents with less than a high school education thought that thinking for oneself was the most important value to instill in children. Compare this with the three out of five (61%) with 13 or more years of education who did think this was most important. Also note that as education increases, women are more likely than men to say that thinking for oneself is the most important value. It appears that educational level explains more than gender does. Try other variables as a control to see what happens. As a general rule, here is how to interpret what you find from this elaboration analysis:
- If the partial tables are similar to the zero-order table, you have replicated your original findings, which means that in spite of the introduction of a particular control variable, the original relationship persists. The only way to convince us that this is indeed a strong, or even causal, relationship is if you control for all the other logical independent variables you can think of, and still find essentially no differences between the zero-order table and its partials.
- If all the partials are significantly weaker than the original relationship AND IF your control variable is antecedent (occurs prior in time) to both the other variables, you have found a spurious relationship and explained away the original. This is referred to as explanation. In other words, the original relationship was due to the influence of that other variable, not the one you hypothesized.
- If the partials are weaker AND IF your control variable is intervening, you have interpreted the relationship. This is referred to as interpretation. If the time sequence between the independent and control variable is not determinable (or otherwise unclear), you don't know whether you have explanation or interpretation, but you do know that the control variable is important.
- If one or more partials is stronger than the original relationship and one or more is weaker, you have discovered the conditions under which the original relationship is strongest. This is referred to as specification, or the interaction effect.
- If the zero-order table showed weak association between the variables, you might still find strong associations in the partials (which is a good argument for keeping on with your initial analysis of the data even if you didn't "find" anything with bivariate analysis). The addition of your control variable showed it to have been acting as a suppressor in the original table.
- Last, if a zero-order table shows only a weak or moderate association, the partials might show the opposite relationship, due to the presence of a distorter variable.

Try some of your own three-way (or higher) tables using some of the data sets we have provided you with. Recall that for this procedure, there should be few categories for each variable, particularly your control variables (so you might need to recode), and you are limited to variables measured at, or recoded to, nominal or ordinal levels.
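The interpretation rules for elaboration can be summarized in a rough Python sketch. The function is hypothetical and deliberately crude. It assumes that the zero-order table and each partial table have already been reduced to a single signed measure of association (for example, epsilon expressed as a proportion), and it uses an arbitrary `tolerance` as a stand-in for your own judgment about whether two values really differ. It is a memory aid, not a substitute for reading the tables.

```python
def classify_elaboration(zero_order, partials, control_position="unknown",
                         tolerance=0.05):
    """Label an elaboration outcome from signed association measures.

    control_position is "antecedent", "intervening", or "unknown".
    tolerance is an assumed cutoff for "essentially the same strength".
    """
    # Distorter: every partial runs in the opposite direction to the original.
    if zero_order != 0 and all(p * zero_order < 0 for p in partials):
        return "distorter"

    original = abs(zero_order)
    strengths = [abs(p) for p in partials]
    stronger = [s > original + tolerance for s in strengths]
    weaker = [s < original - tolerance for s in strengths]

    if not any(stronger) and not any(weaker):
        return "replication"
    if all(stronger):
        return "suppressor"
    if all(weaker):
        if control_position == "antecedent":
            return "explanation"
        if control_position == "intervening":
            return "interpretation"
        return "explanation or interpretation"
    if any(stronger) and any(weaker):
        return "specification"
    return "unclear"


# Illustrative values only; these are not results from the GSS data.
print(classify_elaboration(0.12, [0.11, 0.13, 0.12]))
print(classify_elaboration(0.30, [0.02, 0.05], "antecedent"))
print(classify_elaboration(0.20, [0.45, 0.03]))
```

The final `"unclear"` branch covers mixed patterns that the rules do not name, such as some partials unchanged and others weaker.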
Multiple Regression
Once you have discovered that several of your independent variables are related to your dependent variable, you might want to try multiple regression (multiple linear regression analysis). The three-or-more-way crosstabs shown previously are more an exploratory technique, whereas multiple regression is more explanatory. With multiple regression you can generate beta values (standardized partial regression coefficients), which give you an idea of the relative impact of each independent variable on the dependent variable. You will also generate the R-squared value, which is a summary statistic of the impact of all the independent variables taken together. Remember the important assumptions for using regression: a linear relationship between each independent variable and the dependent variable, a normal distribution of your variables, and variables measured at interval or ratio levels.
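To make beta and R-squared more concrete, here is a minimal pure-Python sketch of an ordinary least squares fit on z-scored variables. It is not how SPSS computes its output, and the helper names and the data are invented. The outcome is constructed as an exact linear combination of the two predictors so that the fit is perfect and easy to check. The sketch assumes that no predictor is constant and that the predictors are not perfectly collinear.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (assumes the values are not constant)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def standardized_regression(predictors, outcome):
    """Return (betas, r_squared) for an OLS fit on z-scored variables."""
    zx = [standardize(col) for col in predictors]
    zy = standardize(outcome)
    n, k = len(zy), len(zx)
    # Normal equations (X'X) beta = X'y; no intercept is needed after z-scoring.
    xtx = [[sum(zx[i][t] * zx[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(zx[i][t] * zy[t] for t in range(n)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[i] * zx[i][t] for i in range(k)) for t in range(n)]
    sse = sum((zy[t] - fitted[t]) ** 2 for t in range(n))
    sst = sum(v ** 2 for v in zy)
    return betas, 1 - sse / sst


# Made-up scores for illustration only; not GSS data.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
attend = [0, 5, 2, 7, 1, 6, 3, 8]
outcome = [2 * e + a for e, a in zip(educ, attend)]  # exact linear combination

betas, r_squared = standardized_regression([educ, attend], outcome)
print([round(b, 3) for b in betas], round(r_squared, 3))
```

Because every variable is put on the same scale before fitting, the betas can be compared directly: the larger the absolute beta, the larger that predictor's relative impact in this model.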
Go to the Analyze, Regression, Linear menu. For your dependent variable, choose THNKSELF from the variable list. For the independent variables choose EDUC (unrecoded), ATTEND, BIBLE, and SEX (see Figure 8-4).
![]()
Figure 8-4

Note that EDUC doesn't show up in the list of independent variables, but you could use the scroll bar to find it. Now choose the "Statistics" button at the bottom of the dialog box and a new dialog box will appear, shown here in Figure 8-5 with the default options.
![]()
Figure 8-5

Click on the "Continue" button to return to the Linear Regression dialog box (Figure 8-4), then click on the "Plots" button. Your screen should now look like Figure 8-6.
![]()
Figure 8-6

Click on "Continue" and look at your next choice, which is "Save." A dialog box like Figure 8-7 appears.
![]()
Figure 8-7

Click on "Continue" and then "Options" and your screen should look like Figure 8-8, which shows the default options (then click "Continue" to return to the Linear Regression dialog box).
![]()
Figure 8-8

Your last task is to choose your method of analysis. In Figure 8-4 you will see the "Method:" button right under the "Independent(s):" box. You have several choices here, and you can use the scroll button to see what they are. "Stepwise" is the one we chose for this example, and the one that you will probably use most often. (See Figure 8-9.) For an in-depth discussion of all the possible choices for Multiple Regression, you will need to consult the SPSS manuals.
![]()
Figure 8-9

Figure 8-10 shows you the first screen of the results in the Output window when you finally click "OK" in the Linear Regression dialog box after having chosen stepwise regression using all the default options.
![]()
Figure 8-10