Quantitative Analysis: Managing the data. Coding of questionnaires and production of the codebook. Quantitative measurement: nominal, ordinal and interval variables. Simple statistics: frequencies (percentages) and descriptives (mean, median, mode). You can write codes on the questionnaire if doing it by hand – use a column. Always number questions!
Data Analysis: Having spent time designing your survey and collecting data, it is important to make a good job of your data analysis. Your analysis is guided by the hypotheses that you are exploring and relies on your ingenuity to relate the data to the theory. Data analysis is helped by efficient management of data, particularly coding. Class handout on data matrix and null hypotheses.
Managing Data: Keep all questionnaires somewhere safe, where you cannot break your promise of anonymity and confidentiality (cabinet and key!). Check your data input regularly on the computer. In SPSS, for example, run frequencies and cross-tabulations of the responses to each variable to make sure there are no strange entries. Also check the frequencies of case numbers (the numbers representing your respondents) so that no case (respondent) is entered twice. When you have finished processing the data, add the frequencies of the variables to the codebook.
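The two checks described above can be sketched in a few lines of Python, standing in here for a frequencies run in SPSS. This is only an illustration: the case numbers, the variable name `sex` and the set of valid codes are assumptions made up for the example, not data from a real survey.

```python
from collections import Counter

# Hypothetical data matrix: one dict per respondent (case).
rows = [
    {"id": 3125, "sex": 1},
    {"id": 3768, "sex": 2},
    {"id": 4448, "sex": 2},
    {"id": 3768, "sex": 2},   # same case entered twice
    {"id": 5012, "sex": 7},   # 7 is not a code in the coding frame
]

# Assumed valid codes for sex: 1 = male, 2 = female, 99 = missing.
VALID_SEX_CODES = {1, 2, 99}

# Frequencies of case numbers: any count above 1 is a duplicate entry.
id_counts = Counter(row["id"] for row in rows)
duplicate_ids = sorted(case for case, n in id_counts.items() if n > 1)

# Frequencies of the variable: any code outside the frame is an input error.
sex_counts = Counter(row["sex"] for row in rows)
strange_codes = sorted(code for code in sex_counts if code not in VALID_SEX_CODES)

print("Duplicate case numbers:", duplicate_ids)
print("Codes not in the coding frame:", strange_codes)
```

Running checks like these after each data-entry session keeps errors easy to trace back to the paper questionnaire.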
Coding the Data: You need to extract the data from the questionnaires and put them into a form that is easier to refer to and to manipulate: you code the data and create a data file. Draw up a coding frame. This lists all the alternative values for a given variable and allocates a number to each possible answer. The coding frame for closed questions can be taken from the questionnaire. Additional codes for open questions need to be created when inputting the data. The coding frame is usually drawn up before you attempt to code the data and input them into the matrix. The job of analysis is made easier if you use a computer to create data files, e.g. SPSS, STATA, R, SAS or even Excel.
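A coding frame for a closed question is essentially a lookup table from answers to numbers. The sketch below shows one possible frame for a marital status question. The numbering, the name `MARITAL_FRAME` and the helper `code_answer` are assumptions for illustration, and the codes may differ from those used in any particular data file.

```python
# Assumed coding frame for a marital status question.
# The numbering is illustrative; 99 is reserved for missing data.
MARITAL_FRAME = {
    "Never married": 1,
    "Married": 2,
    "Cohabiting": 3,
    "Separated": 4,
    "Divorced": 5,
    "Widowed": 6,
}
MISSING = 99

def code_answer(answer, frame, missing=MISSING):
    """Return the numeric code for a ticked answer, or the missing code."""
    if answer is None or answer.strip() == "":
        return missing
    # A KeyError here flags an answer that is not in the coding frame.
    return frame[answer.strip()]

answers = ["Married", "Never married", "", "Widowed"]
codes = [code_answer(a, MARITAL_FRAME) for a in answers]
print(codes)
```

Letting an unknown answer raise an error, rather than silently coding it, is a deliberate choice in this sketch: it forces the coding frame to be extended before the data are entered.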
Coding Questions
What sex are you? Male ☐ Female ☐
How old are you? ________
What is your marital status? Never married ☐ Married ☐ Cohabiting ☐ Separated ☐ Divorced ☐ Widowed ☐
Data File: This involves producing a grid on which to record the appropriate codes for each respondent. Normally each row represents a respondent and each column a variable.

ID no   Sex   Age   Marital   EdLevel
3125    1     26              8
3768    2     57              7
4448          39    3         99
Data File: In some computer programs you can click on a data label to reveal what each code means. 99 or 999 are the codes usually used for missing data (item non-response).

ID no   Sex      Age   Marital         EdLevel
3125    Male     26    Never married   Degree
3768    Female   57    Married         Diploma
4448    Female   39    Separated       Missing
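The idea of value labels and missing-data codes can be sketched as follows. The label dictionary and the helper names are hypothetical. The sex codes (1 = male, 2 = female) follow the data file example, while the age values with 999 as the missing code are an assumption for illustration.

```python
# Assumed value labels for the sex variable.
SEX_LABELS = {1: "male", 2: "female", 99: "missing"}
MISSING_CODES = {99, 999}

def label(code, labels):
    """Show the label behind a code, as a data-label view would."""
    return labels.get(code, f"unlabelled code {code}")

def valid_values(values, missing_codes=MISSING_CODES):
    """Drop missing-data codes so they are not treated as real values."""
    return [v for v in values if v not in missing_codes]

sex = [1, 2, 2, 99]
age = [26, 57, 39, 999]   # 999 marks a respondent who gave no age

sex_labels = [label(code, SEX_LABELS) for code in sex]
valid_ages = valid_values(age)

print(sex_labels)
print(valid_ages)
```

Note that the filter above is only safe for variables where 99 cannot be a real value. For a variable such as age, 99 could be a genuine answer, which is one reason a code like 999 is used instead.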
Notes on the Codebook: Start creating the codebook from the questionnaire. For each question, write down: the actual question; the name(s) of the variable(s) representing the question [e.g. Q1walk]; the label of the variable [e.g. whether or not respondent walks to work]; the values (codes) and value labels for each attribute [e.g. 1 (value) = walks to work (label); 0 (value) = does not walk to work (label); 99 = missing data]; the variable type (nominal, ordinal or continuous). Code open questions (string variables) and ‘other’ categories afterwards, manually.
Example of Tricky Question to Code: Q1. Which mode of transport do you use most often to work? (please tick all those that apply) Walk ☐ Bicycle ☐ Rail ☐ Bus ☐ Car as driver ☐ Car as passenger ☐ Motorbike as driver ☐ Motorbike as passenger ☐
Coding of ‘Tricky’ Question: Split the question into separate variables, i.e. create multi-dichotomies. Walk variable (Q1walk): 1 = walks to work; 0 = does not walk to work; 99 = missing data. Bicycle variable (Q1bicycle): etc.
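The split into multi-dichotomies can be sketched like this. Q1walk and Q1bicycle are the variable names given above. The remaining names (Q1rail, Q1bus and so on) and the helper `code_q1` are assumptions that follow the same pattern.

```python
# One 0/1 variable per tick box in Q1.
MODES = ["walk", "bicycle", "rail", "bus", "car_driver", "car_passenger",
         "motorbike_driver", "motorbike_passenger"]
MISSING = 99

def code_q1(ticked):
    """Turn the set of ticked boxes into one 0/1 variable per mode.

    `ticked` is None when the respondent skipped the question entirely,
    in which case every variable gets the missing-data code.
    """
    if ticked is None:
        return {f"Q1{mode}": MISSING for mode in MODES}
    return {f"Q1{mode}": int(mode in ticked) for mode in MODES}

respondent = code_q1({"walk", "bus"})   # ticked two boxes
skipped = code_q1(None)                 # left the question blank

print(respondent)
print(skipped)
```

A respondent who ticks no box at all is hard to distinguish from one who skipped the question. This sketch treats only an explicit `None` as missing, which is a design choice rather than a rule.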
Quantitative Measurement: Types of variables: binary, nominal, ordinal, interval, discrete count.
Categorical Variables
Binary: 2 categories, usually coded as 1/2 or 0/1. Examples: male/female; employed/unemployed; supports/opposes the Euro. Descriptive statistics: frequencies; descriptives of a 0/1 variable; crosstabs.
Nominal: more than 2 unordered categories, usually coded as 1, 2, 3, but these numbers are only labels. Examples: social class; region of residence; political party vote.
Continuous Variables
Interval/ratio: differences have the same meaning at different points on the scale. Examples: calendar year; income; weight; height. Descriptive statistics: grouped frequencies; scatter plots.
Discrete counts: the number of events in a given area or period of time. Examples: number of children in a family; number of heart attacks in Oxford in 2005. Descriptive statistics: frequencies (if few values); descriptives.
Ranked Variable
Ordinal: categorical with ordered categories; numeric codes 1, 2, 3… whose numeric order corresponds to the ordering of the categories. Examples: class of university degree; strength of agreement or disagreement about an issue; level of job satisfaction. Descriptive statistics: frequencies; crosstabs.
Counting Responses: After compiling your data in your data matrix, the first step in data analysis is to summarise your data. You can do this by: tabulating the data (‘Frequencies’); calculating the ‘Descriptives’, i.e. summaries and variability of the data; drawing graphs. This initial step is sometimes called univariate analysis, i.e. the description of one variable.
Frequencies: A frequency table tells you how many people selected each response to a question. Frequencies can also be used to check codes: if a code appears in the frequency table that was not used in the coding scheme, you know that an error was made when inputting the data.
IMPJOB – Importance to respondent of having a fulfilling job.

Value Label            Value   Frequency   Percent   Valid Percent   Cum Percent
One of the most imp.   1       316         21.1      21.4            21.4
Very imp.              2       833         55.5      56.3            77.7
Somewhat               3       238         15.9      16.1            93.8
Not too import         4       62          4.1       4.2             98.0
Not at all             5       30          2.0       2.0             100.0
DK                     8       7           0.5       Missing
NA                     9       14          0.9       Missing
Total                          1500
Frequencies: A frequency count alone is not a very good summary of the data, so report both counts and percentages. Percentages are easier to visualise and, unlike counts, can be compared across surveys with different numbers of cases. Use valid percentages, i.e. percentages that exclude the missing data.
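As a worked illustration, the percent, valid percent and cumulative percent columns can be recomputed from the frequency counts in the IMPJOB table. This is a minimal sketch, assuming that codes 8 (DK) and 9 (NA) are both treated as missing, as that table does. The variable names are made up for the example.

```python
# Counts taken from the IMPJOB frequency table (value -> frequency).
counts = {1: 316, 2: 833, 3: 238, 4: 62, 5: 30, 8: 7, 9: 14}
MISSING_CODES = {8, 9}   # DK and NA are excluded from valid percentages

total = sum(counts.values())
valid_total = sum(n for code, n in counts.items() if code not in MISSING_CODES)

table = []          # rows of (value, frequency, percent, valid %, cumulative %)
cumulative = 0.0
for code, n in sorted(counts.items()):
    percent = 100 * n / total
    if code in MISSING_CODES:
        # Missing codes get a percent but no valid or cumulative percent.
        table.append((code, n, round(percent, 1), None, None))
        continue
    valid_percent = 100 * n / valid_total
    cumulative += valid_percent
    table.append((code, n, round(percent, 1),
                  round(valid_percent, 1), round(cumulative, 1)))

for row in table:
    print(row)
```

The percent column divides by all 1500 cases, while the valid percent column divides by the 1479 cases that gave a substantive answer, which is why the two differ slightly.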
Simple Statistics: Descriptives: Mode (the most frequently occurring response). Mean (the average). Median (the middle value when all responses are laid out in a row from smallest to largest). The median is used instead of the mean for ordinal variables, i.e. it is the mid-rank. It is also much less affected by ‘outliers’, so it is often a better measure than the mean for continuous variables like age and income.
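The three descriptives, and the effect of an outlier on the mean compared with the median, can be illustrated with Python's standard `statistics` module. The income figures below are made up for the example.

```python
import statistics

# Made-up incomes (in thousands); the last value is an outlier.
incomes = [18, 21, 21, 24, 27, 30, 250]

mode_value = statistics.mode(incomes)       # most frequent value
median_value = statistics.median(incomes)   # middle value when sorted
mean_value = statistics.mean(incomes)       # arithmetic average

# Recompute without the outlier to see which measure moves more.
without_outlier = incomes[:-1]
median_without = statistics.median(without_outlier)
mean_without = statistics.mean(without_outlier)

print("mode:", mode_value)
print("median:", median_value, "without outlier:", median_without)
print("mean:", round(mean_value, 1), "without outlier:", mean_without)
```

In this example a single extreme income more than doubles the mean but shifts the median only slightly.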
Graphs: You can use a number of graphs to illustrate your findings, such as a pie chart, bar chart or histogram. It depends on the type of variable: pie and bar charts suit categorical variables, while histograms suit continuous variables.
Simple Statistics: Bivariate analysis involves the relationship between two variables. The simplest form is a cross-tabulation. You need categorical variables for this. For continuous variables, use a scatter plot to graph the relationship.
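A cross-tabulation simply counts how many cases fall into each combination of two categorical variables. The sketch below uses made-up coded data for sex (1 = male, 2 = female) and a walks-to-work variable (1 = walks, 0 = does not). The data and layout are assumptions for illustration only.

```python
from collections import Counter

# Made-up coded cases: (sex, walks_to_work) for each respondent.
cases = [(1, 1), (1, 0), (2, 1), (2, 1), (1, 0), (2, 0)]

# Count each combination of the two variables.
cell_counts = Counter(cases)
sex_codes = sorted({sex for sex, _ in cases})
walk_codes = sorted({walk for _, walk in cases})

# Print a small table: one row per sex code, one column per walk code.
print("sex  " + "  ".join(f"walk={w}" for w in walk_codes))
for s in sex_codes:
    row = [cell_counts[(s, w)] for w in walk_codes]
    print(f"{s:<5}" + "  ".join(f"{n:>6}" for n in row))
```

Statistical packages add row and column percentages to such a table, which makes groups of different sizes easier to compare.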