Cross tabulation is a useful way of exploring the relationship between variables that contain only a few categories. For example, in the last chapter we looked at the relationship between how people voted in the 1992 presidential election and how they voted four years later. Each variable consisted of three categories: Clinton, Perot, and Bush (in 1992) or Dole (in 1996). But what if we wanted to find out whether women are, on average, younger than men at the birth of their first child? Here our dependent variable is a continuous variable consisting of many values. We could recode it so that it had only a few categories (e.g., under 20, 20 to 24, 25 to 29, 30 to 34, 35 to 39, 40 and older), but that would result in the loss of a lot of information. A better way to do this would be to compare the mean age at birth of first child for men and women.
We're going to use the subset from the 1998 General Social Survey to answer this question. Open GSS98A.SAV. (Follow the instructions in Chapter 1 if you have forgotten.) Now click on "Analyze," point your mouse at "Compare Means," and then click on "Means." We want to put age at birth of first child (AGEKDBRN) in the Dependent List and sex in the Independent List. Highlight "AGEKDBRN" in the list of variables on the left of your screen, then click on the arrow next to the Dependent List box. Now click on the list of variables on the left and use the scroll bar to find the variable SEX. Click on it to highlight it and then click on the arrow next to the Independent List box. Your screen should look like Figure 6-1.
![]()
Figure 6-1
Click on "OK" and the Output Window should look like Figure 6-2. On average, women are a little more than two years younger than men at the birth of their first child.
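To see what the Means procedure is doing, here is a minimal sketch in Python rather than SPSS. The function name `group_means` and the handful of ages are made up for illustration; they are not values from GSS98A. Missing answers are represented as `None` and skipped, on the assumption that cases with no valid answer are left out of the mean.

```python
from collections import defaultdict
from statistics import mean


def group_means(values, groups):
    """Return the mean of `values` within each category of `groups`."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:  # skip cases with no valid answer
            buckets[group].append(value)
    return {group: mean(vals) for group, vals in buckets.items()}


# Made-up illustration data (NOT GSS values): 1 = male, 2 = female
agekdbrn = [24, 26, 28, 21, 23, 25, None]
sex = [1, 1, 1, 2, 2, 2, 2]

means = group_means(agekdbrn, sex)
for code, value in sorted(means.items()):
    print(code, value)
```

The dependent variable supplies the values being averaged and the independent variable supplies the groups, which mirrors the Dependent List and Independent List boxes.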
![]()
Figure 6-2
Independent-Samples T Test
If women in our sample are, on average, more than two years younger than men at the birth of their first child, can we conclude that this is also true in the population? Can we make an inference about the population (all people) from our sample (about 2,800 people selected from the population)? To answer this question we need to do a t test. This will test the hypothesis that men and women in the population do not differ in terms of their mean age at birth of first child. This is called the null hypothesis. The particular version of the t test that we will be using is called the independent-samples t test, since our two samples are completely independent of each other. In other words, the selection of cases in one of the samples does not influence the selection of cases in the other sample. We'll look later at a situation where this is not true.
We want to compare our sample of men with our sample of women and then use this information to make an inference about the population. Click on "Analyze," then point your mouse at "Compare Means" and then click on "Independent-Samples T Test." Find "AGEKDBRN" in the list of variables on the left and click on it to highlight it, then click on the arrow to the left of the Test Variable box. This is the variable we want to test, so it will go in the Test Variable box. Now click on the list of variables on the left and use the scroll bar to find the variable SEX. Click on it to highlight it and then click on the arrow to the left of the Grouping Variable box. SEX defines the two groups we want to compare, so it will go in the Grouping Variable box. Your screen should look like Figure 6-3.
![]()
Figure 6-3
Now we want to define the groups, so click on the "Define Groups" button. This will open the Define Groups box. Since males are coded 1 and females 2, type 1 in the Group 1 box and 2 in the Group 2 box. (You will have to click in each box before typing the value.) This tells SPSS which two groups we want to compare. (If you don't know how males and females are coded, click on "Utilities," then on "Variables" and scroll down until you find the variable "SEX" and click on it. The box to the right will tell you the values for males and females. Be sure to close this box.) Now click on "Continue" and on "OK" in the Independent-Samples T Test box. Your screen should look like Figure 6-4.
![]()
Figure 6-4
This table shows you the mean age at birth of first child for men (25.21) and women (22.52), which is a mean difference of 2.69 years. It also shows you the results of two t tests. Remember that the t test tests the null hypothesis that men and women have the same mean age at birth of first child in the population. There are two versions of this test. One assumes that the populations of men and women have equal variances (for AGEKDBRN), while the other doesn't make any assumption about the variances of the populations. The table also gives you the values for the degrees of freedom and the observed significance level. The significance value is .000 for both versions of the t test. Actually, just as in Chapter 5, this means less than .0005, since SPSS rounds to three decimal places. This significance value is the probability of getting a t value this large or larger (in absolute value) simply by chance if the null hypothesis were true. Since this probability is so small (less than five in 10,000), we will reject the null hypothesis and conclude that there probably is a difference between men and women in terms of average age at birth of first child in the population. Notice that this is a two-tailed significance value. If you wanted the one-tailed significance value, just divide the two-tailed value in half, provided the difference is in the direction you predicted.
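The two versions of the t test described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not SPSS's own code: the function names (`pooled_t`, `welch_t`, `two_tailed_p`) are hypothetical, the ages are made up rather than taken from GSS98A, and the unequal-variances version is written as the standard Welch test with Welch-Satterthwaite degrees of freedom. The significance value is approximated by numerically integrating the t distribution.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """t and df for the version that assumes equal population variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


def welch_t(a, b):
    """t and df for the version that does not assume equal variances."""
    n1, n2 = len(a), len(b)
    q1, q2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed significance using Simpson's rule.

    Integrates the t density from 0 to |t|; `steps` must be even.
    Adequate for illustration, less precise for very large |t|.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central_area = total * h / 3  # probability that 0 < T < |t|
    return max(0.0, 1.0 - 2.0 * central_area)


# Made-up illustration data (NOT GSS values)
men = [24, 26, 28, 30, 27]
women = [21, 23, 25, 22, 24]

t_pooled, df_pooled = pooled_t(men, women)
t_welch, df_welch = welch_t(men, women)
p_pooled = two_tailed_p(t_pooled, df_pooled)
p_welch = two_tailed_p(t_welch, df_welch)
print(t_pooled, df_pooled, p_pooled)
print(t_welch, df_welch, p_welch)
```

With equal group sizes the two versions give the same t value and differ only in their degrees of freedom, so the two significance values are close but not identical.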
Let's work another example. This time we will compare males and females in terms of average years of school completed (EDUC). Click on "Analyze," point your mouse at "Compare Means," and click on "Independent-Samples T Test." Click on "Reset" to get rid of the information you entered previously. Move EDUC into the Test Variable box and SEX into the Grouping Variable box. Click on "Define Groups" and define males and females as you did before. Click on "Continue" and then on "OK" to get the output window. Your screen should look like Figure 6-5. There isn't much of a difference between men and women in terms of years of school completed. This time we do not reject the null hypothesis that men and women have the same mean education.
![]()
Figure 6-5
Paired-Samples T Test
We said we would look at an example where the samples are not independent. (SPSS calls these paired samples. Sometimes they are called matched samples.) Let's say we wanted to compare the educational level of the respondent's father and mother. PAEDUC is the years of school completed by the father and MAEDUC is the years of school completed by the mother. Clearly our samples of fathers and mothers are not independent of each other. If the respondent's father is in one sample, then his or her mother will be in the other sample. One sample determines the other sample. Another example of paired samples is before and after measurements. We might have a person's weight before they started to exercise and their weight after exercising for two months. Since both measures are for the same person, we clearly do not have independent samples. This requires a different type of t test for paired samples.
Click on "Analyze," then point your mouse at "Compare Means," and then click on "Paired-Samples T Test." Scroll down to "MAEDUC" in the list of variables on the left and click on it to move it to the Current Selections box as Variable 1. Now click on "PAEDUC" to move it to the Current Selections box as Variable 2. Click on the arrow to the left of the Paired Variables box to move this pair of variables into the box in the middle of the window. Your screen should look like Figure 6-6.
![]()
Figure 6-6
Click on "OK" and your screen should look like Figure 6-7.
![]()
Figure 6-7
This table shows the mean years of school completed by mothers (11.54) and by fathers (11.44), as well as the standard deviations. The t-value for the paired-samples t test is 1.342 and the 2-tailed significance value is 0.180. (You will have to scroll down to see these values.) This is the probability of getting a t-value this large or larger just by chance if the null hypothesis is true. Since this probability is quite large, we won't reject the null hypothesis. There is no statistical basis for saying that the respondents' fathers and mothers have different educational levels.
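The paired-samples t test works on the difference within each pair rather than on two separate groups. The sketch below shows the arithmetic in Python under stated assumptions: the function name `paired_t` is hypothetical, the years of schooling are made up rather than taken from GSS98A, and a pair is dropped when either value is missing (shown as `None`).

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t, df) for paired measurements."""
    # Keep only the pairs in which both values are present
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / se, n - 1


# Made-up illustration data (NOT GSS values); the last pair is incomplete
maeduc = [12, 10, 14, 16, 12, None]
paeduc = [11, 10, 12, 13, 12, 12]

mean_diff, t, df = paired_t(maeduc, paeduc)
print(mean_diff, t, df)
```

Because each mother is matched with a father from the same respondent, the test reduces to asking whether the mean of the differences is far enough from zero.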
One-Way Analysis of Variance
In this chapter we have compared two groups (males and females). What if we wanted to compare more than two groups? For example, we might want to see if age at birth of first child (AGEKDBRN) varies by educational level. This time let's use the respondent's highest degree (DEGREE) as our measure of education. To do this we will use One-Way Analysis of Variance (often abbreviated as ANOVA). Click on "Analyze," then point your mouse at "Compare Means," and then click on "Means." Click on "Reset" to get rid of what is already in the box. Click on "AGEKDBRN" to highlight it and then move it to the Dependent List box by clicking on the arrow to the left of the box. Then scroll down the list of variables on the left and find "DEGREE." Click on it to highlight it and move it to the Independent List box by clicking on the arrow to the left of this box. Your screen should look like Figure 6-8.
![]()
Figure 6-8
Click on the "Options" button and this will open the Means: Options box. Click on the box labeled "ANOVA table and eta." This should put an X in this box indicating that you want SPSS to do a One-Way Analysis of Variance. Your screen should look like Figure 6-9.
![]()
Figure 6-9
Click on "Continue" and then on "OK" in the Means box and your screen should look like Figure 6-10.
![]()
Figure 6-10
In this example, the independent variable has five categories: less than high school, high school, junior college, bachelor, and graduate. Figure 6-10 shows the mean age at birth of first child for each of these groups and their standard deviations, as well as the Analysis of Variance table including the sum of squares, degrees of freedom, mean squares, the F-value and the observed significance value. (You will have to scroll down to see the Analysis of Variance table.) The significance value for this example is the probability of getting an F-value of 81.009 or higher if the null hypothesis is true. Here the null hypothesis is that the mean age at birth of first child is the same for all five population groups. In other words, it states that the mean age at birth of first child for all people with less than a high school degree is equal to the mean age for all those with a high school degree, all those with a junior college degree, all those with a bachelor's degree, and all those with a graduate degree. Since this probability is so low (<.0005 or less than 5 out of 10,000), we would reject the null hypothesis and conclude that these population means are probably not all the same.
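The entries in an Analysis of Variance table can be reproduced by hand. The Python sketch below computes the sums of squares, degrees of freedom, mean squares, F-value and eta squared for a one-way design. The function name `one_way_anova`, the three group labels, and the ages are assumptions made for illustration; they are not values from GSS98A, and the significance value is left out because it requires the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute the main entries of a one-way ANOVA table.

    `groups` maps each category label to its list of values.
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)

    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    # Within groups: how far each value lies from its own group mean
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Made-up illustration data (NOT GSS values)
ages_by_degree = {
    "less than high school": [19, 21, 20],
    "high school": [22, 24, 23],
    "bachelor": [26, 27, 28],
}

table = one_way_anova(ages_by_degree)
for name, value in table.items():
    print(name, value)
```

A large F-value means the group means are spread far apart relative to the variation inside each group, which is what leads us to reject the null hypothesis of equal population means.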
SPSS has another procedure that does One-Way Analysis of Variance, called One-Way ANOVA. This procedure offers several multiple comparison procedures that can be used to determine which groups have means that are significantly different from one another. If you want to use these procedures, consult the SPSS Base 9.0 for Windows User's Guide (SPSS, Inc., 1999).
Summary
This chapter has explored ways to compare the means of two or more groups and statistical tests to determine if these means differ significantly. These procedures would be useful if your dependent variable was continuous and your independent variable contained a few categories. The next chapter looks at ways to explore the relationship between pairs of variables that are both continuous.
Chapter Six Exercises
Use the GSS98A data set on your data disk for all these exercises.
- Compute the mean age (AGE) of respondents who voted for Clinton, Dole, and Perot (PRES96). Which group had the youngest mean age and which had the oldest mean age?
- Use the independent-samples t test to compare the mean family income (INCOME98) of men and women (SEX). Which group had the higher mean income? Was the difference statistically significant (i.e., was the significance value less than .05)?
- Use the independent-samples t test to compare the mean age (AGE) of respondents who believe and do not believe in life after death (POSTLIFE). Which group had the higher mean age? Was the difference statistically significant (i.e., was the significance value less than .05)?
- Use One-Way Analysis of Variance to compare the mean years of school completed (EDUC) of respondents who voted for Clinton, Dole, and Perot (PRES96). Which group had the most education and which had the least education? Was the F-value statistically significant (i.e., was the significance value less than .05)?