URBS/PLSI 492 "Research Methods"
Spring, 1999
Richard LeGates
Office: HSS 137
Office Hours: M 1 – 3 and by appointment
Phone: (415) 338-2875
FAX: (415) 338-2391
e-mail: dlegates@sfsu.edu

Project # 2: Data Collection, Entry, and Analysis

    The purpose of this project is to help students understand and apply material from the March 1 and 15 and April 5 class meetings concerning data collection, entry, and analysis. All students should purchase a ZIP disk and bring it to class on March 1, because it is needed for this project. Students should work on Project 2 as we proceed through this material. We will discuss several intermediate products that students should bring to class. The final project is due April 12.

    For March 1, review the questions that you designed for Project 1 and modify them as necessary so that they include questions that require numeric data (like "How often do you drive on Highway 101 each week?"). All of the attitude and opinion questions should have a coding scheme to measure the responses. Prepare a matrix that arrays hypothetical responses from ten respondents, with each respondent as one row and each question as one column. The rows are often referred to as cases or records. For example, suppose respondents # 15 and # 16 answered the following questions:

1. Respondent ID

        Respondent # 15:  15          Respondent # 16:  16

2. Where do you live?

        1--San Francisco
        2--Oakland
        3--Berkeley
        4--Somewhere else

        Respondent # 15:  2           Respondent # 16:  1

3. What is your political party affiliation?

        1--Republican
        2--Democrat
        3--Other

        Respondent # 15:  2           Respondent # 16:  1

4. Which of the following best represents your opinion about increased bicycle use?

        1--Increased bicycle use should be strongly encouraged
        2--Increased bicycle use should be somewhat encouraged
        3--Increased bicycle use should be neither encouraged nor discouraged
        4--Increased bicycle use should be somewhat discouraged
        5--Increased bicycle use should be strongly discouraged

        Respondent # 15:  1           Respondent # 16:  5

5. What is your age?

        Respondent # 15:  27          Respondent # 16:  63

The matrix would look like this:

      ID       City       Party      Bicycle Use     Age

      15       2          2          1               27
      16       1          1          5               63
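
    As an illustration only (the course itself uses SPSS, and this is not part of the assignment), the same matrix can be sketched in Python as a list of rows, one per case, with one entry per question. The column names come from the matrix heading; the helper name `column` is hypothetical.

```python
# Each row is one case (respondent); each key is one column (question).
# The values are the numeric codes from the example matrix for respondents 15 and 16.
data = [
    {"ID": 15, "City": 2, "Party": 2, "Bicycle Use": 1, "Age": 27},
    {"ID": 16, "City": 1, "Party": 1, "Bicycle Use": 5, "Age": 63},
]


def column(rows, name):
    """Return every respondent's value for one question (one column)."""
    return [row[name] for row in rows]


if __name__ == "__main__":
    print(column(data, "Age"))  # the Age column, in respondent order
```

    Reading across a row gives one respondent's answers; reading down a column gives every respondent's answer to one question.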

    Complete a matrix for ten questions and at least 15 respondents. You can collect real data by interviewing students in the class, spouses, friends, or others, or you can make up plausible data. We will discuss coding numeric data and how data is entered into computers for analysis on March 1. If your data needs to be corrected, improved, or augmented based on the March 1 discussion, you should make the necessary changes by March 8.

    Bring your revised data to class on March 8. Each question is one variable. Assign a name to each variable. These are called variable names. Your variable names should consist only of letters and contain no spaces. They should not be longer than eight characters and they should be as descriptive as possible. Avoid obscure and fancy naming. For example, variable names for the above survey might be:

ID
CITY
PARTY
BIKEUSE
AGE
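
    The naming rules above are simple enough to check mechanically. The following is a minimal Python sketch, assuming "letters" means the unaccented letters A to Z in either case; the function name `is_valid_name` is hypothetical and is not an SPSS feature.

```python
import re

# Rules from the text: letters only, no spaces, at most eight characters.
# Assumption: "letters" means A-Z or a-z.
NAME_PATTERN = re.compile(r"[A-Za-z]{1,8}")


def is_valid_name(name):
    """Return True if the name follows the variable naming rules above."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["ID", "CITY", "PARTY", "BIKEUSE", "AGE", "BICYCLE USE"]:
        print(candidate, is_valid_name(candidate))
```

    A name such as "BICYCLE USE" fails twice over: it contains a space and it is longer than eight characters.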

    On March 8 you should also label each of the numeric codes attached to your nominal and ordinal level variables where a name would be useful. (Labels are not needed for the variable ID and are generally not needed for numeric data like AGE.) These are called value labels. For example, the value labels associated with the above variables might be:

ID [None needed]

CITY

                San Francisco          1
                Oakland                2
                Berkeley               3
                Other city             4

PARTY

                Republican             1
                Democrat               2
                Other                  3

BIKEUSE

                Strongly encourage     1
                Encourage somewhat     2
                Neutral                3
                Discourage somewhat    4
                Strongly discourage    5

AGE [None Needed]
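
    Value labels are in effect a lookup table from numeric code to text. As a rough Python illustration (not SPSS syntax), the CITY and BIKEUSE labels above could be stored as follows; the names `VALUE_LABELS` and `label_for` are hypothetical.

```python
# Codes and labels copied from the CITY and BIKEUSE lists above.
# Variables without labels (ID, AGE) are simply left out of the table.
VALUE_LABELS = {
    "CITY": {1: "San Francisco", 2: "Oakland", 3: "Berkeley", 4: "Other city"},
    "BIKEUSE": {
        1: "Strongly encourage",
        2: "Encourage somewhat",
        3: "Neutral",
        4: "Discourage somewhat",
        5: "Strongly discourage",
    },
}


def label_for(variable, code):
    """Return the label for a code, or the code as text when no label is defined."""
    return VALUE_LABELS.get(variable, {}).get(code, str(code))


if __name__ == "__main__":
    print(label_for("CITY", 2))
    print(label_for("AGE", 27))
```

    The data file still stores only the numbers; the labels are attached when results are displayed.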

    During the class sessions on March 1 and 15 we will learn how to create, name, and save computer data files onto the ZIP disk and open them for subsequent use. We will also learn how to enter and clean data and assign names to the variables and value labels to the values of the data in the data file. We will see how data entry errors can be corrected, additional data added, and unwanted data removed. We will learn to find individual cases and print out file information on all the variables.

    During the March 15 and April 5 class sessions we will learn how quantitative data analysis is done using a computerized statistical package. We will be using the Statistical Package for the Social Sciences (SPSS), which is the statistical package most commonly used for social science and public policy data analysis. The emphasis in these sessions is on understanding how statistical packages work and learning to work with them. We will learn how to do some conceptually easy, but very powerful and very useful, analytic procedures. URBS 493 "Data Analysis" will build on this foundation.

    We will learn how to compute new variables and recode data. We will also learn how to summarize nominal level data using SPSS’s FREQUENCIES command and how to compute the minimum, maximum, and mean for a ratio level variable using SPSS’s DESCRIPTIVES command. We will see how social scientists do bivariate analysis using SPSS’s CROSSTABS command. We will also learn how to generate visual representations of quantitative data using SPSS’s PIE and BAR chart commands.
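
    To make these procedures concrete, here is a minimal Python sketch of the ideas behind them, using only the two example cases from this handout. It is not SPSS syntax and does not reproduce SPSS output. The function names, the new variable AGEGRP, and the age cutoff of 40 are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# The two example cases from this handout; a real file would have 15 or more.
cases = [
    {"ID": 15, "CITY": 2, "PARTY": 2, "BIKEUSE": 1, "AGE": 27},
    {"ID": 16, "CITY": 1, "PARTY": 1, "BIKEUSE": 5, "AGE": 63},
]


def frequencies(rows, var):
    """Count how many cases have each code (the idea behind FREQUENCIES)."""
    return Counter(row[var] for row in rows)


def descriptives(rows, var):
    """Minimum, maximum, and mean of a ratio level variable (the idea behind DESCRIPTIVES)."""
    values = [row[var] for row in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def crosstab(rows, row_var, col_var):
    """Count cases for each pair of codes on two variables (the idea behind CROSSTABS)."""
    return Counter((row[row_var], row[col_var]) for row in rows)


def recode_age(age):
    """Recode AGE into two groups; the cutoff of 40 is a hypothetical choice."""
    return 1 if age < 40 else 2


# Compute a new variable from an existing one.
for row in cases:
    row["AGEGRP"] = recode_age(row["AGE"])

if __name__ == "__main__":
    print(frequencies(cases, "CITY"))
    print(descriptives(cases, "AGE"))
    print(crosstab(cases, "CITY", "PARTY"))
```

    Frequencies suit nominal variables such as CITY, while the mean is meaningful only for a ratio level variable such as AGE; averaging the CITY codes would produce a number with no interpretation.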

    Project 2 is due on April 12. It should consist of the following things: