Lecture 4 Statistical analysis


Lecture 4 Statistical analysis Heidi Hogset September, 2014

Lecture aim and objectives
Aim: Investigate methods of statistical analysis
Objectives:
- Research questions and hypotheses
- Statistical tests
SCM300 21.09.2018

Research questions
- Survey research is all about asking questions
- Descriptive questions, univariate (covered in lecture 3)
  - E.g., how has the gender composition of enrolled students changed over the last 20 years?
- Causal relationships, multivariate (topic today)
  - E.g., what is the relationship between unemployment rates and applications for admission to higher education?

Research questions
- The research question is:
  - Does the unemployment rate influence people’s propensity to seek higher education?
- The variables required are:
  - Unemployment rate each year in Norway (choose period)
  - Number of applications for admission submitted to colleges and universities in Norway each year (same period)

Research questions
- Suggested causal relationship?
- Source for left image: MS clipart
- Source for right image: http://www.businessinsider.com/initial-jobless-claims-march-29-2014-4

Research questions
- Hypothesis: a statement to test a particular proposition
- Example: In periods with higher levels of unemployment, colleges and universities receive more applications for admission

Research questions
- Observed trends
- Variables:
  - Share of population in each age bracket that is enrolled in higher education (%)
  - Share of labor force in each age bracket that is registered as totally unemployed (%)

Research questions
- We assume the causal relationship goes from unemployment to school applications, not vice versa
  - “Applications” is the dependent variable (DV)
  - “Unemployment” is the independent variable (IV)

Research questions
- Null hypothesis
  - There is NO relationship between unemployment rates and school applications

Research questions
- Alternative hypotheses
  - Non-directional (two-tailed): There is a significant relationship between unemployment rates and school applications
  - Directional (one-tailed): There is a significant and positive relationship between unemployment and school applications

Research questions
- Hypothesis testing: one-tailed or two-tailed
- Use 2-tailed unless you have a good reason to choose 1-tailed

Your survey
- You are expected to develop (a) research question(s) based on theory developed in your discipline of interest
- Example: Green Taxing
  - Does it work?
  - How strong is the effect?

Your survey
- Variables needed?
- How might you create the variables using a survey?
- What hypothesis might you use?

Statistical analysis
- The significance of each hypothesis is tested using statistical analysis
- The objective is to reject or accept each hypothesis
  - “Accept” means you have not disproved it
  - “Accept” does not mean the hypothesis has been proved

Statistical analysis
Relationship between two variables:
- One-Sample T Test
- Paired Samples T Test
- Independent Samples T Test
- Chi-square Test
- One-Way ANOVA
- Correlation analysis
- Simple Regression Analysis

Statistical analysis
Relationship between more than two variables:
- Multiple Regression Analysis
- Logistic Regression Analysis

Compare two variables: example 1
Number of students aged 19-29 enrolled in higher education in Norway, by sex, 2000-2010
- “Male” is the number of male students
- “Female” is the number of female students

Year    Male      Female
2000    27 826    31 721
2001    27 103    31 827
2002    28 040    33 669
2003    28 151    33 204
2004    28 612    33 392
2005    30 817    37 062
2006    30 099    37 122
2007    31 478    39 690
2008    37 564    46 188
2009    40 345    49 131
2010    42 544    51 446

One Sample T Test
- Compute the difference: Female – Male
- Test: Is the difference significantly different from zero?


One Sample T Test
- If our test is directional, we should use the one-tailed significance, which is half of the 2-tailed significance
- Here, however, the reported 2-tailed significance is 0,000, and 0,000/2 = 0,000, so the choice makes no difference to the conclusion
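
The calculation behind this test can be sketched by hand. The following is a minimal illustration in Python (the lecture itself uses SPSS), using the enrolment figures from example 1; the variable names are chosen for illustration only. It computes the t statistic and degrees of freedom, not the significance value.

```python
import math
import statistics

# Enrolment figures copied from the example 1 table (students aged 19-29).
male = [27826, 27103, 28040, 28151, 28612, 30817,
        30099, 31478, 37564, 40345, 42544]
female = [31721, 31827, 33669, 33204, 33392, 37062,
          37122, 39690, 46188, 49131, 51446]

# Difference variable: Female - Male, one value per year.
diff = [f - m for f, m in zip(female, male)]

n = len(diff)
mean_diff = statistics.mean(diff)
sd_diff = statistics.stdev(diff)        # sample standard deviation (n - 1)
std_error = sd_diff / math.sqrt(n)

# t statistic for H0: the mean difference equals 0, with n - 1 degrees of freedom.
t_stat = (mean_diff - 0) / std_error
df = n - 1

print(f"mean difference = {mean_diff:.1f}, t = {t_stat:.2f}, df = {df}")
```

With 10 degrees of freedom, the two-tailed 5% critical value of t is about 2.228, so a t statistic well above that is consistent with a very small significance value. A Paired Samples T Test of Female against Male is the same calculation, because it tests whether the mean of the pairwise differences is zero.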

Paired Samples T Test
- Compare the variables directly: Female vs. Male
- Test: Are their means significantly different?


Independent Samples T Test
- Has the financial crisis in 2007 influenced enrolment in higher education?
- Test: Are the means significantly different before and after 2007?


Independent Samples T Test
- Note: the difference between the means is not only due to the financial crisis. There is also a long-term trend towards higher enrolment rates that we should have controlled for.
- The F is Levene’s Test for Equality of Variances. A small significance value here indicates that the appropriate T Test is the one where equal variances are not assumed.
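
The two versions of the test (equal variances assumed or not assumed) can be sketched as follows. This is a minimal illustration in Python under two assumptions that the slides do not state: that the tested variable is total enrolment (Male + Female from example 1), and that the years 2000-2007 form the "before" group and 2008-2010 the "after" group.

```python
import math
import statistics

years = list(range(2000, 2011))
male = [27826, 27103, 28040, 28151, 28612, 30817,
        30099, 31478, 37564, 40345, 42544]
female = [31721, 31827, 33669, 33204, 33392, 37062,
          37122, 39690, 46188, 49131, 51446]
total = [m + f for m, f in zip(male, female)]

# Assumed split (not given on the slides): up to 2007 vs. after 2007.
before = [y for yr, y in zip(years, total) if yr <= 2007]
after = [y for yr, y in zip(years, total) if yr > 2007]

n1, n2 = len(before), len(after)
m1, m2 = statistics.mean(before), statistics.mean(after)
v1, v2 = statistics.variance(before), statistics.variance(after)

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Equal variances not assumed (Welch): separate variances, adjusted df.
t_welch = (m2 - m1) / math.sqrt(v1 / n1 + v2 / n2)
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch:.2f}")
```

The sketch does not remove the long-term upward trend, so it shares the limitation noted on the slide.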

Compare two variables: example 2
A survey of pigeonpea farmers in Tanzania in 2008. Variables:
- District (there are 4)
- Respondent characteristics (there are 609 respondents)
  - The respondent’s sex, age, number of years in school, number of dependents in the household, distance to the nearest main market (for farm products)
- The respondent’s farm operation
  - The number of plots planted in improved varieties of pigeonpeas divided by the total number of plots planted in pigeonpeas (“share”)

Chi-square Test
- Chi-square Test for Independence of Discrete Data
  - Discrete data: data for which the only meaningful statistics are frequencies and percentages
- We need to compare two nominal or ordinal variables (discrete data)
- We want to check if our sampling procedure has produced a biased sample with respect to gender composition
- We assume the proportion of households with a female household head is independent of district
- Test: Is the sex distribution different between districts?


Chi-square Test
- Select Statistics, then tick Chi-square
- Select Cells, then tick Expected (and untick Observed)

Chi-square Test
- Chi-square Test of Independence
- A small significance value for the chi-square statistic indicates that there is a significant relationship between the two variables: they are NOT independent of each other
- Here, the significance value is large, close to 1. Therefore, we ACCEPT the null hypothesis that the two variables are independent
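
How the chi-square statistic is built from observed and expected counts can be sketched as follows. This is a minimal illustration in Python; the counts and the district labels "District A" to "District D" are invented for the example and are NOT the survey data.

```python
# Hypothetical counts of household heads by sex and district (NOT the survey data).
observed = {
    "District A": {"Male": 80, "Female": 20},
    "District B": {"Male": 85, "Female": 15},
    "District C": {"Male": 78, "Female": 22},
    "District D": {"Male": 82, "Female": 18},
}

row_totals = {d: sum(counts.values()) for d, counts in observed.items()}
col_totals = {}
for counts in observed.values():
    for sex, count in counts.items():
        col_totals[sex] = col_totals.get(sex, 0) + count
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
chi_square = 0.0
for district, counts in observed.items():
    for sex, obs in counts.items():
        expected = row_totals[district] * col_totals[sex] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# 5% critical value of the chi-square distribution with 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815
reject_null = chi_square > CRITICAL_5PCT_DF3

print(f"chi-square = {chi_square:.3f}, df = {df}, reject null: {reject_null}")
```

For these invented counts the statistic is small relative to the critical value, so the null hypothesis of independence is not rejected, which mirrors the conclusion on the slide.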

One-Way ANOVA
- Compare two variables
  - Test 1: Is the share of pigeonpea fields in improved varieties different between districts?
  - Test 2: Does the share of pigeonpea fields in improved varieties vary by the farmer’s school experience?
- Procedure: Analyze/ Compare Means/ One-Way ANOVA/ Select DV (Share improved pigeonpeas)/ Select factor (District)/ OK
- The DV should be of interval or ratio data type
- The populations should be normally distributed and the population variances should be equal
- This procedure becomes cumbersome when the number of factors goes beyond 3-4
- Mainly used in psychological research using experimental data, which is not common in economics research
- Problem: Is Share interval or ratio type data?

One-Way ANOVA
- The variable Share ranges from 0 (no improved pigeonpea) to 1 (only improved pigeonpea), with values clustering around 0 (75,5% of observations), 1 (15,3%), ¼, ¾, ½, 1/3 and 2/3
- The variable School measures the number of years the respondent has attended school. Observations vary from 0 to 16.
- There are 4 districts.
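
The F statistic that One-Way ANOVA reports compares variation between groups with variation within groups. A minimal sketch in Python follows; the Share values and the three district labels are invented for illustration and are NOT the survey data.

```python
# Hypothetical Share values per district (NOT the survey data).
groups = {
    "District A": [0.0, 0.0, 0.5, 1.0, 0.0],
    "District B": [0.0, 0.25, 0.0, 0.0, 0.5],
    "District C": [1.0, 0.5, 0.0, 1.0, 0.75],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Between-groups sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between = sum(
    len(values) * (group_means[name] - grand_mean) ** 2
    for name, values in groups.items()
)

# Within-groups sum of squares: squared distance of each observation
# from its own group mean.
ss_within = sum(
    (x - group_means[name]) ** 2
    for name, values in groups.items()
    for x in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = mean square between / mean square within.
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The sketch only shows the arithmetic; it does not address the slide's concern that Share, clustered at 0 and 1, may violate the normality assumption.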

Correlation analysis
- Examines bivariate relationships between ≥ 2 ORDINAL or INTERVAL/RATIO variables
- They are CORRELATED if they are systematically related
  - Positively: The variables tend to move in the same direction
  - Negatively: The variables tend to move in opposite directions
  - Uncorrelated: No relationship
- Can be run with any kind of data, but is not appropriate for nominal variables with more than 2 categories

Correlation analysis
- Correlation is measured by the correlation coefficient, r
- It helps to think of correlation in visual terms

Value of r       Relationship
-1               Perfect negative
-1 to -0.7       Strong negative
-0.7 to -0.5     Moderate negative
-0.5 to -0.1     Weak negative
-0.1 to 0.1      No relationship
0.1 to 0.5       Weak positive
0.5 to 0.7       Moderate positive
0.7 to 1         Strong positive
1                Perfect positive
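
As an illustration of the coefficient and the scale above, the following minimal Python sketch computes Pearson's r between the Male and Female enrolment series from example 1 and attaches a verbal label. The function names are illustrative, and the handling of values that fall exactly on a boundary (such as 0.5) is an assumption, since the slide does not specify it.

```python
import math

# Enrolment figures copied from the example 1 table.
male = [27826, 27103, 28040, 28151, 28612, 30817,
        30099, 31478, 37564, 40345, 42544]
female = [31721, 31827, 33669, 33204, 33392, 37062,
          37122, 39690, 46188, 49131, 51446]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Verbal label for r, following the scale on the slide."""
    size = abs(r)
    if size < 0.1:
        return "no relationship"
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {'positive' if r > 0 else 'negative'}"


r = pearson_r(male, female)
print(f"r = {r:.3f}: {describe(r)}")
```

Both series rise over the period, so r is strongly positive; as the lecture stresses later, this says nothing about causality.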

Correlation analysis
[Scatter plots: r ≈ -1 and r ≈ 1]

Correlation analysis
[Scatter plots: r ≈ 0? and r ≈ 0]

Correlation analysis
Scatter-plot procedure in SPSS:
- Graphs
- Legacy Dialogs
- Scatter/Dot
- Select Simple Scatter
- Define
- IV for x-axis
- DV for y-axis
- OK

Correlation analysis
Correlation procedure in SPSS:
- Analyze
- Correlate
- Bivariate
- Add variables to the variables list
- Tick Pearson’s for interval/ratio data (Spearman’s for ordinal)
- OK


Correlation analysis
- Correlation shows the pairwise strength of a relationship, but not causality
  - Causality indicates the likely impact of the IV on the DV
  - E.g., what is the long-term trend in enrolment in higher education?
- It is possible to calculate correlations between a large number of variables
  - You get a matrix with the same number of rows and columns as the total number of variables
  - (If you use the partial correlations procedure, you can calculate the correlation between two variables, controlling for the effect of a third variable)

Simple regression analysis
Linear regression procedure in SPSS:
- Analyze
- Regression
- Linear
- Transfer DV and IV(s) to list
- OK
Note: the DV should be continuous. It is not a condition that the DV has a normal distribution.

Simple regression analysis
Enrolment = -6.953.623,136 + 3.503,373 * Year
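
The coefficients in this equation can be reproduced with ordinary least squares. The following minimal Python sketch assumes that Enrolment is total enrolment (Male + Female) from the example 1 table; the slide does not say so explicitly, but the arithmetic matches under that assumption. Note that the slide uses a decimal comma, whereas the code uses a decimal point.

```python
years = list(range(2000, 2011))
male = [27826, 27103, 28040, 28151, 28612, 30817,
        30099, 31478, 37564, 40345, 42544]
female = [31721, 31827, 33669, 33204, 33392, 37062,
          37122, 39690, 46188, 49131, 51446]

# Assumption: the DV on the slide is total enrolment per year.
enrolment = [m + f for m, f in zip(male, female)]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(enrolment) / n

# Least-squares slope = Sxy / Sxx; the fitted line passes through the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, enrolment))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"Enrolment = {intercept:.3f} + {slope:.3f} * Year")
# prints: Enrolment = -6953623.136 + 3503.373 * Year
```

The slope says that total enrolment grew by roughly 3 500 students per year on average over 2000-2010. The large negative intercept is the fitted value at Year = 0 and has no practical interpretation.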

Simple regression analysis
Best fit line procedure in SPSS:
- Analyze
- Regression
- Curve estimation
- Place variables on the right-hand side (RHS)
- Tick Linear
- OK


Summary 2 variables
- One-Sample T Test
  - Test if one variable is different from (for example) zero
- Paired Samples T Test
  - Test if two variables in the same sample are different from each other
- Independent Samples T Test
  - Test if (the same) variable(s) in different samples are different from each other

Summary 2 variables
- Chi-square Test
  - Appropriate for discrete data
- One-Way ANOVA
  - The DV must be a ratio variable and continuous
- Correlation analysis
  - Can be run with any kind of data, but is not appropriate for nominal variables with more than 2 categories

Summary 2 variables
- Simple Regression Analysis
  - Same procedure as multiple linear regression analysis, only with fewer variables (just one IV)
  - Appropriate if the DV is continuous
- Best Fit Line
  - Is a simple linear regression analysis under a different name

Multiple regression analysis
- Let’s return to the African farmers
- What explains whether farmers grow improved or traditional varieties of pigeonpeas?
- Available variables:
  - Farmer characteristics (sex, age, education, household size)
  - Environmental factors (distance to the nearest main market, district)

Multiple regression analysis
Design issues:
- There are several regression methods, based on theoretical considerations
- In the absence of a strong theoretical reason to choose otherwise, use the standard procedure, i.e., “Enter”
- If IVs are correlated, they will generate a variance inflation effect that reduces the statistical significance of results
- To check for correlation (“multicollinearity”), we want to run a test along with the regression


Multiple regression analysis
- Sex is coded 1 = Female
- Using standardized coefficients, interpretations are based on the standard deviations of the variables
- Each coefficient indicates the number of standard deviations that the predicted response changes for a one standard deviation change in a predictor, all other predictors remaining constant
- A value of Tolerance < 0.2 or of VIF > 5 indicates the presence of multicollinearity
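
Tolerance and VIF are two views of the same quantity: Tolerance = 1 - R², where R² comes from regressing one IV on all the other IVs, and VIF = 1 / Tolerance. With only two IVs, that R² is simply their squared Pearson correlation. The following minimal Python sketch uses that two-IV special case; the age and schooling values are invented for illustration and are NOT the survey data.

```python
import math

# Hypothetical values for two IVs (NOT the survey data).
age = [25, 32, 41, 47, 53, 60, 38, 29]
school_years = [10, 8, 7, 7, 4, 2, 9, 12]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# With exactly two IVs, R-squared of one IV on the other equals r squared.
r = pearson_r(age, school_years)
tolerance = 1 - r ** 2
vif = 1 / tolerance

# Rule of thumb from the slide: Tolerance < 0.2 or VIF > 5.
multicollinearity_flag = tolerance < 0.2 or vif > 5

print(f"r = {r:.3f}, Tolerance = {tolerance:.3f}, VIF = {vif:.2f}, "
      f"flag = {multicollinearity_flag}")
```

Because VIF is the reciprocal of Tolerance, the two thresholds on the slide (0.2 and 5) describe the same cut-off.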

Multiple regression analysis
Nominal variables are converted to dummies:
- Sex (binary, 0 = Male, 1 = Female)
- Districts
  - One category is omitted (here: Karatu District)
  - The others are represented by binary variables (0 = No, 1 = Yes)
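
Dummy coding with an omitted reference category can be sketched as follows. This is a minimal Python illustration; Karatu and Arumeru are district names that appear in the lecture, while "District C" and "District D" are placeholders for the two districts the transcript does not name, and the function name is illustrative.

```python
# Karatu is the omitted (reference) category, as on the slide.
# "District C" and "District D" are placeholder names, not from the survey.
DISTRICTS = ["Karatu", "Arumeru", "District C", "District D"]
REFERENCE = "Karatu"


def dummy_code(sex, district):
    """Return the binary variables for one respondent."""
    row = {"female": 1 if sex == "Female" else 0}
    for name in DISTRICTS:
        if name != REFERENCE:
            # 1 = Yes (respondent lives in this district), 0 = No.
            row[f"district_{name}"] = 1 if district == name else 0
    return row


print(dummy_code("Female", "Arumeru"))
print(dummy_code("Male", "Karatu"))
```

A respondent from the reference district has 0 on every district dummy, so each district coefficient is read as a difference from Karatu.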

Logistic regression
- The DV is a discrete variable
  - Binary
  - Nominal with more than 2 categories
- Sensitive to problems like…
  - Multicollinearity
  - Small sample size
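
For the binary case, the model turns a linear combination of the IVs into a probability through the logistic function. The following minimal Python sketch shows that step only; the coefficients and variable names are invented for illustration and are NOT estimates from the survey.

```python
import math

# Hypothetical coefficients (NOT estimated from the survey data).
intercept = -1.0
coefficients = {"female": 0.4, "school_years": 0.15}


def predicted_probability(female, school_years):
    """Probability that the binary DV equals 1 for given IV values."""
    z = (intercept
         + coefficients["female"] * female
         + coefficients["school_years"] * school_years)
    # Logistic function: maps any real number z to a value between 0 and 1.
    return 1 / (1 + math.exp(-z))


# exp(coefficient) is the odds ratio for a one-unit increase in that IV.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

p = predicted_probability(female=1, school_years=8)
print(f"predicted probability = {p:.3f}")
print({name: round(value, 3) for name, value in odds_ratios.items()})
```

An odds ratio above 1 means the IV raises the odds that the DV equals 1; below 1, it lowers them.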

Logistic regression
- Missing observations are farmers with no pigeonpeas at all


Logistic regression
- The reference category is farmers in Karatu who have planted all of their pigeonpea fields in improved varieties (… and District = Karatu)

Summary ≥ 2 variables
- Multiple regression analysis
  - Maps the relationship between one DV and many IVs
  - The DV must be a ratio variable and continuous
  - IVs can be ratio or nominal (dummy)
- Logistic regression
  - Appropriate if the DV is a discrete variable
  - Sensitive to multicollinearity and small sample size