Starting the Cases with an ID Number
So, let's say you were recently hired at a big salary because the boss thought that you could conduct employee surveys, among other things. And let's presume that your boss was correct, and that you have created and administered a questionnaire to a random sample of employees. Now, you are looking at a large box of completed questionnaires. How do you get SPSS to help you with the analysis? Where do you start?
That is what this chapter intends to accomplish. After finishing this chapter, you should be able to create an SPSS data file. It will have (1) the data and (2) some labeling so you can have a feel for what each question was about without always referring to the questionnaire. You may also have indicated that some answers, such as "Don't know," should be excluded from most analyses.
To help illustrate this process we will use a shortened version of a questionnaire made up of questions from the General Social Survey (GSS) conducted by the National Opinion Research Center (NORC). For this example, the students in a social research class wanted to see if their opinions on social issues were similar to the national sample polled in the original survey. (For more detail on the General Social Survey, see Davis and Smith 1996.)
The students knew they were not a representative sample, even of college students, but it was an interesting way to learn how to create a new data file. They decided to use the following questions: (The questionnaire and codebook are in Appendix B.)
What is your age?
Are you male or female?
What is your religious preference?
Generally speaking, in politics do you consider yourself as conservative, liberal, middle of the road?
What kind of marriage do you think is the more satisfying way of life: one where the husband provides for the family and the wife takes care of the house and children or one where both the husband and wife have jobs and both take care of the house and children?
Do you think it should be possible for a pregnant woman to obtain a legal abortion:
- If there is a strong chance of a serious defect in the baby?
- If she is married and does not want any more children?
- If the woman's own health is seriously endangered by pregnancy?
- If the family has a very low income and cannot afford any more children?
- If she became pregnant as a result of rape?
- If she is not married and does not want to marry the man?
- If the woman wants it for any reason?
Basic Steps in Creating a Data File
There are a few things that always need to be done to create a data file:
- The file itself needs a name. The rules for this are the same as for any Windows 95/98 program, but the extension (the part of the file name after the period) is always .SAV for an SPSS for Windows data file.
- Each question or item needs a name, called a "variable name." The variable name can be no longer than 8 characters and usually is created in a way that gives the user a hint of what the item is about. For example, AGE would be a good variable name for the variable dealing with the respondent's age.
- Then, obviously, the data need to be entered. If the data deal with the responses to a question in a questionnaire, the data will be codes that represent the answers. So, for example, in a variable with the variable name of "SEX," a 1 might be entered if the respondent is male, or a 2 might be entered if the respondent is female.
| There are other extensions, such as .POR for portable files. These are SPSS files designed to run on other systems such as Unix or Mac. You would start SPSS first, then open the .POR data file from within SPSS. |
There are other, optional things that can be done with the SPSS data file. There are many ways we can customize our data file, but these are the most common ones:
- Create an extended label for each variable. While the variable name may give the user a hint of what the item is about, a Variable Label will elaborate on this, thus helping the user even more. So for example, a variable name might be ABANY, and the variable label associated with this variable name might be "Should abortions be allowed for any reason?"
- Create a label for each value or response. SPSS works best with numeric codes representing the answers, such as 1 for "male," and these codes become difficult to interpret without some help. The help comes in the form of a label for the value. With a value label, the code 1 would be accompanied by the word "male" whenever that category is listed in our results.
- SPSS is capable of looking at a variety of types of information. For example, it can look at whole numbers, numbers with decimal points, numbers given in a currency format, etc. We can tell SPSS which of these to use, how large the number is and, if a decimal point is used, how many digits there are on the right side of the decimal.
- We can also tell SPSS to exclude certain values. For example, if we have a set of responses that go from "Strongly Approve, Approve, Uncertain, Disapprove, Strongly Disapprove" and we do not want "Uncertain" to be included in the analysis, we can tell SPSS to exclude that response.
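The pieces listed above (a variable name, a variable label, value labels, missing values, and a number format) can be pictured as one small record per variable. The following Python sketch is only an illustration of that idea and is not how SPSS stores a data file. The class name `VariableDef`, the wording of the variable label, and the "No answer" label are assumptions made for the example; the name CL and its codes are the ones this chapter uses for the political orientation question.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDef:
    """One variable (column) of a data file, described the way the list above does."""
    name: str                                          # short variable name
    label: str = ""                                    # extended variable label
    value_labels: dict = field(default_factory=dict)   # code -> value label
    missing: tuple = ()                                # codes to exclude from analysis
    decimals: int = 0                                  # digits right of the decimal point

    def describe(self, code):
        """Return a code together with its value label, flagging missing codes."""
        text = self.value_labels.get(code, "(no label)")
        if code in self.missing:
            text += " [missing]"
        return f"{code} {text}"


# Hypothetical definition of the political orientation item.
cl = VariableDef(
    name="CL",
    label="Political orientation",   # assumed wording
    value_labels={1: "Conservative", 2: "Liberal", 3: "Middle of the road",
                  9: "No answer"},   # the label for 9 is assumed
    missing=(9,),
)

print(cl.describe(2))
print(cl.describe(9))
```

Keeping the labels and the missing codes next to the variable name is the point of the sketch: the raw data stay numeric, and everything needed to read them travels with the variable.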
Starting the Cases with an ID Number
The first thing we do in the creation of a data file is to give each case, e.g., each questionnaire, an identification number. This is not so individuals can be identified, but so we can keep track of each case when we check the accuracy of our data entry. If we later run a frequency distribution to see how many respondents gave each answer to a question, and we find respondents in a category that is not legitimate, we need to be able to find out which questionnaire was entered incorrectly so we can correct the error. In other words, if we have used a 1 to refer to males and a 2 to refer to females, then a respondent with a code of 5 must be an error. We can tell SPSS to give us the ID number for that person, go get the questionnaire, and make the correction.
Next, we need a variable name for each question. The variable name should be simple, but we want it to express the main idea of the variable in some way. Variable names must be eight characters or fewer and must start with a letter. They can contain letters and numbers but not spaces, and only a few special characters are allowed, so it is best not to include any odd symbols. AGE and SEX are easy variable names for the first two questions in our sample questionnaire. (You might want to look at Table 2.1 while reading the rest of this paragraph.) Next we used RELIG to refer to religion, CL to refer to political orientation (conservative or liberal), and MARRIG to refer to the respondents' attitude about marriage. Finally, we used AB followed by a unique set of up to 6 letters to refer to each of the abortion questions, so ABDEFECT refers to abortion if there is a defect in the baby, ABSINGLE refers to an abortion if the woman is not married, and so forth.
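The naming rules just described can be written down as a small check. This Python sketch is an illustration under stated assumptions, not SPSS's own validation: the function name `is_valid_name` is made up, the sketch rejects every special character (SPSS accepts a few), and the reserved-word list holds only ALL and AND, the two examples this chapter mentions; the real list is longer.

```python
import re

# Only the two reserved words named in this chapter; the full list is longer.
RESERVED = {"ALL", "AND"}

# A letter first, then up to seven more letters or digits (eight characters in all).
# Assumption: special characters are rejected outright to stay on the safe side.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def is_valid_name(name):
    """Return True if the name follows the conservative rules sketched above."""
    if name.upper() in RESERVED:   # names are not case sensitive
        return False
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["AGE", "ABDEFECT", "MARRIAGE1", "2AGE", "AB ANY", "all"]:
    print(candidate, is_valid_name(candidate))
```

A check like this is handy when planning names on paper for a long questionnaire, before any of them are typed into the Data Editor.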
ABDEFECT tells us a little about the question that was asked, but if you had a large number of variables, or you wanted to present your analysis to someone else, say your boss, it would be nice to have something a little more complete. SPSS allows you to add variable labels. The labels can be up to 120 characters long, but few of the SPSS procedures will allow that much space so they become cut off, or truncated, if made that long. For ABDEFECT we could add a variable label of "Allow abortion if strong chance of serious defect?"
We can do the same with the value codes that correspond to the answers. To follow through on our earlier example of the variable SEX, where we used the codes 1 and 2 for males and females, without labels the results of our SPSS procedures would show only 1s and 2s. That isn't bad for something simple like SEX, but how about something more complicated like religious denomination?
After we have given each variable a name, we give each possible response a code, called a value, that is often the number corresponding to the order of the answers. (Although we could use letters, using numbers only will avoid some possible problems in statistical analysis.) For example, SEX could be coded 1 for male and 2 for female; political orientation could be 1 for conservative, 2 for liberal, and 3 for middle of the road. These also could be given extended value labels such as Male, Female, Conservative, Liberal, Middle of the Road.
| We could have also used letters instead of numbers, so that M would stand for male and F for female. This approach, however, is not recommended. |
Sometimes respondents don't answer a question, give more than one answer, or do something else that makes their answers unusable. For example, respondent 02 marked both yes and no on the last question, respondent 03 wrote "none" on question 4 on political orientation, and respondent 13 did not answer the marriage question. We often use 9 to code this "missing data" or 99 if it is a two-digit value. Note that this would cause problems if 9 or 99 were real codes, for example, if 9 was an actual response to a question or if age at last birthday included some ninety-nine-year-olds.
Generally if a questionnaire is set up correctly (e.g., has the responses precoded) the person entering the data can do so right from the questionnaire. Sometimes, however, it is a good idea to plan all this and to put the data in a matrix like Table 2.1 before entering them into the computer.
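To see why a missing-data code has to be declared rather than just typed in, consider what happens to an average when the code 99 is treated as a real age. The Python sketch below uses a handful of made-up ages; the variable names are assumptions for the illustration, and SPSS does this exclusion for you once the missing value is defined.

```python
from statistics import mean

AGE_MISSING = 99  # missing-data code for a two-digit variable, as described above

# Made-up ages for five respondents; the third did not answer.
ages = [19, 22, 99, 20, 21]

# Drop the missing code before analysis.
valid_ages = [age for age in ages if age != AGE_MISSING]

print(mean(ages))        # 36.2, distorted by counting 99 as a real age
print(mean(valid_ages))  # 20.5, based on the four real answers
```

The same logic explains the warning about real ninety-nine-year-olds: once 99 is declared missing, a genuine age of 99 would be thrown out along with the blanks, so a different code would be needed.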
To start creating the new data file, go into SPSS as you were taught in Chapter 1. The first thing you see is the Data Editor. It is set up like a spreadsheet (much like Table 2.1), with the upper-left cell outlined. See Figure 2-1.
![]()
Figure 2-1
The rows are for the cases, e.g., the respondents or the questionnaires, and the columns are for the variables, e.g., the questions. The upper-left cell will usually contain the identification number for the first case and the cells across that row will contain data about that case. To replace the default data definitions with your own or to edit later, double click on the "var" on top of the column to get the Define Variables dialog box. See Figure 2-2.
![]()
Figure 2-2
The first variable will be the identification number for the first case. Type in a variable name of eight characters or less, e.g., id. (Variable names are not case sensitive, so ID and Id and id are the same.)
| Actually, SPSS has reserved a few words that cannot be used, e.g., ALL and AND--see SPSS Inc., 1999. |
Click on "Labels" to add descriptive labels. See Figure 2-3. These can be up to 120 characters but are usually less. They are case sensitive, so type them exactly as you want them. Use brief, but descriptive, phrases that will be easy to recognize later.
![]()
Figure 2-3
After naming and labeling the variable, give each possible response a value and label the values in a way that would be useful. For example, using the variable SEX again, 1 would be for male, so type 1 as the value and "Male" as the value label, and then click "Add." Then type 2 and "Female" and click "Add" again. (When you want to modify a value label, click "Change," and click "Remove" if you want to delete one.)
For a variable with values you do not want included in the analysis, open the "Define Missing Values" dialog box by clicking on "Missing Values" in the Define Variables dialog box. See Figure 2-4.
![]()
Figure 2-4
The default is no missing values. You can enter up to three missing values for a variable, so type in 9 and click on "Continue" to go back to the Define Variables dialog box. Since we want to use whole numbers for our data, click on "Type" and change "Decimal Places" to 0 and click on "Continue."
Once the variable names and labels and the value names and labels are set up, you can enter the data into the matrix on the screen. Give your new data a file name and save it before you exit or go on to using the file. On the first screen, you can save by clicking on "File" and using "Save As" to enter your file name. After you have named the file and saved the data the first time, you can save changes with "File" and "Save" or by clicking the little disk icon near the upper-left corner of the screen. It is important to save data frequently.
Check the accuracy of your data by skimming down each column for codes that are impossible with these value labels. For example, SEX can have only three possible codes, since males are 1, females are 2, and missing information is 9. You could do this on the screen or with a frequency distribution from SPSS (see Chapter 4). Next, check the accuracy of the coding by having one person read the codes while another checks the entries in SPSS. These instructions are very simple. If you want more detail see SPSS Inc. (1999).
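The check described above, together with the reason for giving every case an ID number, can be sketched in a few lines of Python. This is an illustration only: the function name `find_bad_entries` and the four cases are made up, and in practice you would spot the bad code in an SPSS frequency distribution and then look up the ID.

```python
# Legitimate codes for each variable: 1 = male, 2 = female, 9 = missing.
VALID_CODES = {"SEX": {1, 2, 9}}

# Made-up data as entered; case 3 contains a typing error.
cases = [
    {"ID": 1, "SEX": 1},
    {"ID": 2, "SEX": 2},
    {"ID": 3, "SEX": 5},
    {"ID": 4, "SEX": 9},
]


def find_bad_entries(cases, valid_codes):
    """Return (ID, variable, code) for every code that is not legitimate."""
    bad = []
    for case in cases:
        for variable, allowed in valid_codes.items():
            if case[variable] not in allowed:
                bad.append((case["ID"], variable, case[variable]))
    return bad


for case_id, variable, code in find_bad_entries(cases, VALID_CODES):
    print(f"Check questionnaire {case_id}: {variable} was entered as {code}")
```

Without the ID column, the impossible code 5 could be detected but not traced back to a particular questionnaire.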
![]()
Table 2-1
Templates: Using the Same Value Labels Over and Over
What if you have twenty variables that would use the same value labels? Using templates you can enter the value labels once and then use the value labels for as many variables as you wish. The template function allows you to create a "master" variable and copy the characteristics of the master variable to other new or old variables. First, move the mouse to the menu bar at the top of the screen and click on "Data." Next, click on the choice "Templates." You should get something that looks like Figure 2-5.
![]()
Figure 2-5
Now click on the "Define" button, then highlight the "Name" section under "Template Description." It should look something like Figure 2-6.
![]()
Figure 2-6
We are going to make the template using a modified Likert scale, so we will call this template "AGREE." Type that name into the name section.
Now we want to define the characteristics of the template we are calling AGREE. We are really only going to take care of the "Type" and the "Value Labels" because the other two items we could define are OK the way they are. So click on "Type" and then make sure "numeric" is selected, then click on "Continue" (see Figure 2-7).
![]()
Figure 2-7
Next, click on "Value Labels" and you should see a screen that looks like Figure 2-8, which is just like what you saw earlier in the chapter when we showed you how to label values.
![]()
Figure 2-8
Here we will type in the "1" for the "Value," and then tab down to Value Labels and type "Strongly Agree." Then click on "Add." (Be sure to do this. It's an easy step to forget. Pressing the enter key gets you something else. If you press the enter key by mistake, press cancel.) Figure 2-9 shows you a screen in the process of having the labels typed.
![]()
Figure 2-9
Now, do this again, only this time the value you should type in is "2" and the label is "Agree." Again, be sure to press the "Add" button. Continue to associate the value numbers with the value labels until you are finished. When you are finished, click on "Continue."
The next step is to associate our template "AGREE" with the variables. In our case, we have checked all four items under the heading "Apply" (type, value labels, missing values, column format) even though we have only worked with the first two (the default for the others is OK). Now click on "Add" and then on "OK."
Now we need to apply this to our variables. First, highlight the columns where you want the new variables. See Figure 2-10 for an example of what your screen should look like now.
![]()
Figure 2-10
Next, click on "Data" on the menu bar, then "Templates." Make sure the template name is "AGREE," then click on "OK." This transfers the information we saved in the AGREE template into those variables, and the variables will also be given the default variable names VAR00001, VAR00002, etc. You can now click on the columns and rename the variables as you were shown earlier.
One last thing: this works for variables that have not been named as well as those that have already been named.
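The idea behind a template, one "master" set of characteristics copied to many variables, can be sketched in Python as follows. This is an illustration of the concept and not of how SPSS implements templates. The function name `apply_template` and the variable names Q1 to Q3 are made up, and only the labels for codes 1 and 2 come from the text; the labels for 3 and 4 are assumed to round out the example.

```python
import copy

# The AGREE template. Labels 1 and 2 are from the text; 3 and 4 are assumed.
AGREE = {
    "type": "numeric",
    "value_labels": {
        1: "Strongly Agree",
        2: "Agree",
        3: "Disagree",            # assumed
        4: "Strongly Disagree",   # assumed
    },
}


def apply_template(template, variables, names):
    """Copy the template's characteristics to each named variable.

    Variables that do not exist yet are created. Existing variables keep
    any settings the template does not cover, such as a variable label.
    """
    for name in names:
        settings = variables.setdefault(name, {})
        # Each variable gets its own copy, so a later change to one
        # variable's labels does not leak into the others.
        settings.update(copy.deepcopy(template))
    return variables


# Q1 already exists and has a variable label; Q2 and Q3 are new.
variables = {"Q1": {"label": "Hypothetical first item"}}
apply_template(AGREE, variables, ["Q1", "Q2", "Q3"])

print(variables["Q1"]["label"])
print(variables["Q3"]["value_labels"][1])
```

As in the last point above, the same call handles variables that already have names and variables that are created on the spot.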
Chapter Two Exercises
At California State University, Fresno, the Friendly Visitor Service hires college students to do in-home care for elderly people so they can remain independent and stay in their homes as long as possible. The students do cleaning, yard work, shopping, etc. The staff begins by interviewing clients in their homes and assessing their need for services. The following information about clients is used to match the seniors with the students who want employment:
- Year of birth
- Sex: Male or Female
- Lives alone: Yes or No
- Low income: Yes = less than $7,360 for single persons or less than $9,840 for married couples or No = more than those amounts
- Assistance needed for the activities of daily living (ADLS): bathing, dressing, toileting, transferring in/out of bed, eating
- Total number of ADLS needing help
- Assistance needed for instrumental activities of daily living (IADLS): using telephone, shopping, preparing food, light housework, heavy housework, finances
- Total number of IADLS needing help
To keep track of the needs of potential clients, the program could create a data set for SPSS beginning with information from the applications in early fall 1995, shown in Table 2.2. For this example, we just used the number of activities for which the seniors need help, but we could have included the yes/no responses for each of the activities of daily living. (Code the values numerically using 1 for male and 2 for female for SEX, etc.)
![]()
Table 2-2