URBS/PLSI 492 "Research Methods"
Richard LeGates Spring, 1999
Office HSS 137
Office Hours M 1-3 and by appointment
Phone (415) 338-2875
FAX (415) 338-2391
e-mail dlegates@sfsu.edu
Project # 2: Data Collection, Entry, and Analysis
The purpose of this project is to help students understand and apply material from the March 1, March 15, and April 5 class meetings on data collection, entry, and analysis. All students should purchase a ZIP disk and bring it to class on March 1; you will need it for this project. Work on Project 2 as we proceed through this material. We will discuss several intermediate products, which students should bring to class. The final project is due April 12.
For March 1, review the questions you designed for Project 1 and modify them as necessary so that some questions require numeric data (for example, "How often do you drive on Highway 101 each week?"). All of the attitude and opinion questions should have a coding scheme to measure the responses. Then prepare a matrix that arrays hypothetical responses to your questions from ten respondents, with each respondent as one row and each question as one column. The rows are often referred to as cases or records. For example, suppose respondents #15 and #16 answered the questions as follows:
| Question | Respondent 15 | Respondent 16 |
|---|---|---|
| 1. Respondent ID | 15 | 16 |
| 2. Where do you live? 1--San Francisco 2--Oakland 3--Berkeley 4--Somewhere else | 2 | 1 |
| 3. What is your political party affiliation? 1--Republican 2--Democrat 3--Other | 2 | 1 |
| 4. Which of the following best represents your opinion about increased bicycle use? 1--Increased bicycle use should be strongly encouraged 2--Increased bicycle use should be somewhat encouraged 3--Increased bicycle use should be neither encouraged nor discouraged 4--Increased bicycle use should be somewhat discouraged 5--Increased bicycle use should be strongly discouraged | 1 | 5 |
| 5. What is your age? | 27 | 63 |
The matrix would look like this:
| ID | City | Party | Bicycle Use | Age |
|---|---|---|---|---|
| 15 | 2 | 2 | 1 | 27 |
| 16 | 1 | 1 | 5 | 63 |
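To show the same row-and-column structure outside SPSS, here is a minimal Python sketch. Python is an assumption; the course itself uses SPSS. It stores the two example cases as rows and the five questions as columns. The column names follow the variable names suggested for this survey, and the `column` helper is hypothetical.

```python
# Column order: one column per question (variable).
VARIABLES = ["ID", "CITY", "PARTY", "BIKEUSE", "AGE"]

# One row per respondent (a case or record); each cell holds a numeric code.
matrix = [
    [15, 2, 2, 1, 27],
    [16, 1, 1, 5, 63],
]

def column(name):
    """Return every case's value for one variable, i.e. one column of the matrix."""
    j = VARIABLES.index(name)
    return [row[j] for row in matrix]

# Every row must have exactly one cell per variable.
assert all(len(row) == len(VARIABLES) for row in matrix)
print(column("AGE"))  # [27, 63]
```

With ten respondents the matrix simply grows to ten rows. The number of columns still equals the number of questions.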
Bring your revised data to class on March 8. Each question is one variable. Assign a name to each variable; these are called variable names. Variable names should consist only of letters, contain no spaces, be no longer than eight characters, and be as descriptive as possible. Avoid obscure or fancy names. For example, variable names for the survey above might be:
ID
CITY
PARTY
BIKEUSE
AGE
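The naming rules above can be checked mechanically. The following minimal Python sketch encodes only this handout's rules: letters only, no spaces, and at most eight characters. SPSS's own rules for variable names may differ. The function name `is_valid_name` is hypothetical.

```python
def is_valid_name(name: str) -> bool:
    """Apply this handout's rules: letters only, no spaces, at most eight characters."""
    return 1 <= len(name) <= 8 and name.isascii() and name.isalpha()

names = ["ID", "CITY", "PARTY", "BIKEUSE", "AGE"]
print([n for n in names if not is_valid_name(n)])  # []
print(is_valid_name("BIKE USE"), is_valid_name("BICYCLEUSE"))  # False False
```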
On March 8 you should also label each of the numeric codes attached to your nominal and ordinal level variables wherever a name would be useful. (Labels are not needed for the variable ID and generally not needed for numeric data like AGE.) These are called value labels. For example, the value labels for the variables above might be:
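ID [None needed]
CITY
  1 San Francisco
  2 Oakland
  3 Berkeley
  4 Other city
PARTY
  1 Republican
  2 Democrat
  3 Other
BIKEUSE
  1 Strongly encourage
  2 Somewhat encourage
  3 Neither encourage nor discourage
  4 Somewhat discourage
  5 Strongly discourage
AGE [None needed]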
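Value labels amount to a lookup from a numeric code to a name. The minimal Python sketch below stores them that way. The codes follow the questionnaire (for PARTY, 1 is Republican and 2 is Democrat). The labels for BIKEUSE codes 2 through 5 are assumptions filled in from the questionnaire's five-point scale, and the `label` helper is hypothetical.

```python
# Value labels keyed by variable name, then by numeric code.
# ID and AGE are deliberately absent: they need no labels.
VALUE_LABELS = {
    "CITY": {1: "San Francisco", 2: "Oakland", 3: "Berkeley", 4: "Other city"},
    "PARTY": {1: "Republican", 2: "Democrat", 3: "Other"},
    "BIKEUSE": {
        1: "Strongly encourage",
        2: "Somewhat encourage",
        3: "Neither encourage nor discourage",
        4: "Somewhat discourage",
        5: "Strongly discourage",
    },
}

def label(variable, code):
    """Return the label for a code, or the code itself if the variable has no label for it."""
    return VALUE_LABELS.get(variable, {}).get(code, code)

case_15 = {"ID": 15, "CITY": 2, "PARTY": 2, "BIKEUSE": 1, "AGE": 27}
print({var: label(var, code) for var, code in case_15.items()})
# {'ID': 15, 'CITY': 'Oakland', 'PARTY': 'Democrat', 'BIKEUSE': 'Strongly encourage', 'AGE': 27}
```

Only the labels are stored this way. The data file itself still holds the numeric codes.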
During the class sessions on March 1 and 15 we will learn how to create, name, and save computer data files on the ZIP disk and open them for later use. We will also learn how to enter and clean data, assign names to the variables, and attach value labels to the values in the data file. We will see how data entry errors can be corrected, additional data added, and unwanted data removed, and we will learn to find individual cases and print out file information on all the variables.
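One common cleaning step is to look for codes that fall outside a variable's coding scheme. The minimal Python sketch below does this for a handful of cases; it is not how SPSS does it. Respondent 17 and the correction applied to that case are hypothetical, invented only to show an error being found and fixed. The valid code ranges are taken from the questionnaire.

```python
# Valid codes for each coded variable, taken from the questionnaire.
VALID_CODES = {"CITY": {1, 2, 3, 4}, "PARTY": {1, 2, 3}, "BIKEUSE": {1, 2, 3, 4, 5}}

cases = [
    {"ID": 15, "CITY": 2, "PARTY": 2, "BIKEUSE": 1, "AGE": 27},
    {"ID": 16, "CITY": 1, "PARTY": 1, "BIKEUSE": 5, "AGE": 63},
    {"ID": 17, "CITY": 7, "PARTY": 3, "BIKEUSE": 2, "AGE": 41},  # hypothetical case with a typo: CITY=7
]

def find_errors(cases):
    """Return (ID, variable, value) for every code outside its variable's valid codes."""
    errors = []
    for case in cases:
        for var, valid in VALID_CODES.items():
            if case[var] not in valid:
                errors.append((case["ID"], var, case[var]))
    return errors

def find_case(cases, case_id):
    """Find an individual case by its ID, or return None if no case has that ID."""
    for case in cases:
        if case["ID"] == case_id:
            return case
    return None

print(find_errors(cases))  # [(17, 'CITY', 7)]
# Hypothetical correction: suppose the paper questionnaire shows 3 (Berkeley).
find_case(cases, 17)["CITY"] = 3
print(find_errors(cases))  # []
```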
During the March 15 and April 5 class sessions we will learn to do quantitative data analysis using a computerized statistical package. We will be using the Statistical Package for the Social Sciences (SPSS), which is the statistical package most commonly used for social science and public policy data analysis. The emphasis in these sessions is on understanding how statistical packages work and learning to work with them. We will learn some conceptually easy but very powerful and useful analytic procedures. URBS 493 "Data Analysis" will build on this foundation.
We will learn how to compute new variables and recode data. We will also learn how to summarize nominal level data using SPSS's FREQUENCIES command and how to compute the minimum, maximum, and mean of a ratio level variable using SPSS's DESCRIPTIVES command. We will see how social scientists do bivariate analysis using SPSS's CROSSTABS command. Finally, we will learn how to generate visual representations of quantitative data using SPSS's PIE and BAR chart commands.
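To make concrete what FREQUENCIES, DESCRIPTIVES, and CROSSTABS compute, here is a minimal Python sketch applied to the two example cases. It is not SPSS syntax and does not reproduce SPSS's output format. The functions `frequencies`, `descriptives`, and `crosstab` are hypothetical stand-ins that show only the underlying counts and statistics.

```python
from collections import Counter
from statistics import mean

# The two example cases from the data matrix.
cases = [
    {"ID": 15, "CITY": 2, "PARTY": 2, "BIKEUSE": 1, "AGE": 27},
    {"ID": 16, "CITY": 1, "PARTY": 1, "BIKEUSE": 5, "AGE": 63},
]

def frequencies(cases, var):
    """Count and percent of cases for each code of a coded variable (what FREQUENCIES reports)."""
    counts = Counter(case[var] for case in cases)
    n = len(cases)
    return {code: (count, 100 * count / n) for code, count in sorted(counts.items())}

def descriptives(cases, var):
    """Minimum, maximum, and mean of a ratio level variable (what DESCRIPTIVES reports)."""
    values = [case[var] for case in cases]
    return min(values), max(values), mean(values)

def crosstab(cases, row_var, col_var):
    """Number of cases in each (row code, column code) cell (the counts CROSSTABS tabulates)."""
    return Counter((case[row_var], case[col_var]) for case in cases)

print(frequencies(cases, "CITY"))  # {1: (1, 50.0), 2: (1, 50.0)}
print(descriptives(cases, "AGE"))
print(crosstab(cases, "PARTY", "BIKEUSE"))
```

With your ten hypothetical cases the same functions yield fuller distributions. A crosstab conventionally places the independent variable's categories across one dimension and the dependent variable's across the other.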
Project 2 is due on April 12. It should consist of the following things:
- A two-page description of your experience collecting, coding, and analyzing data;
- A printout of the variable names and value labels for your data file using the SPSS File Information command;
- A list of the values of all your data generated by the SPSS LIST command;
- Frequency distributions for all of your coded nominal level variables;
- A frequency distribution of one new variable you create and recode from one of the original variables using the SPSS COMPUTE and RECODE commands;
- A pie chart for one of your coded nominal level variables;
- A bar chart for one of your ratio level variables;
- The MIN, MAX, and MEAN for all of your ratio level variables generated using the SPSS DESCRIPTIVES command;
- A CROSSTABULATION using one of your coded nominal level variables as the dependent variable and another as the independent variable.
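The COMPUTE and RECODE item asks for a new variable derived from an original one. The minimal Python sketch below derives a hypothetical AGEGROUP variable from AGE and then prints its frequency distribution. The variable name AGEGROUP, its cut points, and its labels are all assumptions chosen for illustration. This is not SPSS syntax.

```python
from collections import Counter

cases = [
    {"ID": 15, "AGE": 27},
    {"ID": 16, "AGE": 63},
]

def recode_age(age):
    """Hypothetical recode of AGE into three ordinal groups."""
    if age < 30:
        return 1  # under 30
    elif age < 60:
        return 2  # 30 to 59
    return 3      # 60 and over

# Create the new variable AGEGROUP for every case.
for case in cases:
    case["AGEGROUP"] = recode_age(case["AGE"])

AGEGROUP_LABELS = {1: "Under 30", 2: "30 to 59", 3: "60 and over"}

# Frequency distribution of the new variable, including empty categories.
freq = Counter(case["AGEGROUP"] for case in cases)
for code in sorted(AGEGROUP_LABELS):
    print(AGEGROUP_LABELS[code], freq[code])
```

The recoded variable is ordinal, so it can be summarized with a frequency distribution or used in a crosstabulation, just like the originally coded variables.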