Advanced Quantitative Analysis


Advanced Quantitative Analysis
Shannon Milligan, PhD, Institutional Research & Market Analytics
Jen Sweet, PhD, Teaching, Learning & Assessment
April 27, 2018, 1:00pm-2:30pm, 8009 DePaul Center

Workshop Outcomes
By the end of this workshop, participants will be able to:
Distinguish between parametric and nonparametric statistics.
Adjust interpretation of the results of parametric statistics when assumptions are violated.
Define the General Linear Model (GLM).
Describe how this model is related to many common statistical methods.
Use SPSS for basic statistical analyses.
Determine when to use an ANOVA and, if appropriate, run the analysis using SPSS.

Workshop Agenda
Parametric v. Nonparametric Statistics
General Linear Model
Brief SPSS Overview
ANOVA
Running Descriptive Statistics in SPSS
Running an ANOVA in SPSS

Parametric v Nonparametric Analysis

Parametric Tests
Make assumptions about the parameters (or defining properties) of the population being studied.
Most frequent assumptions:
Distribution of the dependent variable(s)
Nature of the data (at least interval-level data, i.e., the data are measured on a scale with fixed, equal, and measurable intervals)

Non-Parametric Tests
Make few or no assumptions about the shape of the underlying population distribution or the nature of the data being collected.

Is there Something In Between?
Yes! Semi-Parametric Statistics.
Some statistics, such as Bayesian statistics, can begin with a defined underlying population distribution, then update that distribution with known information, such as data about the population or sample data.
Sadly, we won’t have time to get into this or any non-parametric statistics :(

Why is this Important to Know?
The statistics we’ll discuss today are parametric statistics that assume:
The population is normally distributed on the dependent variable.
You are measuring the data using at least an interval scale.
Homogeneity of variance: the groups you compare, and the populations they are drawn from, have equal variance on your dependent variable.

What if I Violate these Assumptions?
Most likely, you will…
Ideally, you should select a statistical test that is more appropriate given your data.
Minimally, you should understand how these violations affect the interpretation of your results.
Refer to: http://acp.depaultla.org/wp-content/uploads/sites/4/2016/10/Quantitative-Analysis-October-27-2017.pdf

General linear model

GLM
The General Linear Model is a basic statistical model upon which many common statistics are based.
It is loosely based on the formula for a straight line, Y = mX + b, where:
Y = outcome (or dependent) variable
m = slope of the line
X = independent variable
b = Y intercept

However, the GLM is expressed a little differently: Y = B0 + B1X + E, where:
Y = outcome (or dependent) variable
X = independent variable
B0 = intercept (the predicted value of Y when X is 0)
B1 = beta weight (or regression coefficient) of the first independent variable; it represents the independent contribution of X (independent variable) to Y (dependent variable)
E = error (the part of Y that X does not explain)

Examples
T-test: Y = B0 + B1*X1 + E, where B1 is the difference between the means of two groups. Example: determine whether ACT scores differ between males and females.
ANOVA: Y = B0 + B1*X1 + B2*X2 + E, where the B weights capture differences between the means of two or more groups. Example: determine whether ACT scores differ based on gender and race.
Multiple Regression: Y = B0 + B1*X1 + B2*X2 + B3*X3 + E, where each B represents the independent contribution of its associated X to Y. Example: account for the effects of gender, race, and class rank in predicting students’ ACT scores.
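The t-test example above can be checked numerically. The following is a minimal sketch, assuming a 0/1 dummy coding for the two groups and made-up scores (neither comes from the workshop data); fit_line is a hypothetical helper name. It shows that the fitted B1 equals the difference between the two group means and B0 equals the mean of the group coded 0.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept B0 and slope B1 for Y = B0 + B1*X + E."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical ACT scores; group membership is dummy-coded as 0 or 1.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [20, 22, 24, 26, 23, 25, 27, 29]

b0, b1 = fit_line(group, score)
mean_0 = mean([s for g, s in zip(group, score) if g == 0])
mean_1 = mean([s for g, s in zip(group, score) if g == 1])

print("B0:", b0, "mean of group 0:", mean_0)
print("B1:", b1, "difference in means:", mean_1 - mean_0)
```

With this coding, testing whether B1 differs from zero is the same question as the two-group t-test.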

Common Statistics that Use GLM
Many common statistics use the General Linear Model as their base:
Student’s T-test
Analysis of Variance (ANOVA, ANCOVA, MANOVA)
Multiple Regression
Multivariate Regression
Structural Equation Modeling (SEM)
Hierarchical Linear Modeling (HLM)

Least Squares Estimation Method
The GLM uses the least squares method to estimate the parameters of the model.
The GLM fits a straight line to your data that minimizes the sum of the squared vertical distances between each data point and the ‘best fit’ line.
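To make the least squares idea concrete, here is a small sketch using made-up data points (not from the workshop). It computes the intercept and slope from the standard closed-form formulas for a single predictor, then compares the sum of squared errors of the fitted line against slightly shifted lines; the function name sse is a hypothetical helper.

```python
from statistics import mean

# Hypothetical data points (X, Y); not from the workshop dataset.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Closed-form least-squares estimates for one predictor.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

def sse(b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(intercept, slope)
print("intercept:", intercept, "slope:", slope, "SSE:", best)

# Any other line gives a larger sum of squared errors.
print("shifted intercept SSE:", sse(intercept + 0.1, slope))
print("shifted slope SSE:", sse(intercept, slope + 0.1))
```

The shifted lines are only spot checks; they illustrate, rather than prove, that the fitted line has the smallest sum of squared errors.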

Brief overview of SPSS

What is SPSS?
Statistical Package for the Social Sciences (SPSS)
Widely used statistical analysis program (across disciplines and industries)
Menu-driven program, though syntax can also be used
*DePaul access?

Pros and Cons of SPSS
Advantages:
Widely used
Easy to import Excel files
User-friendly “plug and chug”
Does all calculations for you
Disadvantages:
Requires some training
A lot of options; need to know how to select appropriate options for the analysis you would like to run
Need to be able to read and appropriately interpret output
Potential problem = too easy to run analyses without understanding them
May be expensive
Limited data visualization capabilities

Running descriptive statistics in SPSS

Sample Dataset
Chicago Public Schools Progress Report Card 2011-2012 (publicly available from Chicago Data Portal)
N = 566 schools
79 variables in dataset

Sample Question (Frequencies)
How many elementary schools were in CPS in 2011-2012?
Use “ElementaryMiddleorHighSchool” variable
Frequency analysis

Selecting Variable(s) for Analysis

Answer: 462 elementary schools
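Outside SPSS, a frequency analysis amounts to counting how often each category appears. The sketch below assumes a handful of made-up category labels for the “ElementaryMiddleorHighSchool” variable; the actual labels and the 566 rows of the dataset are not reproduced here.

```python
from collections import Counter

# Hypothetical values for the school-level variable; the labels "ES", "MS",
# and "HS" are assumptions, not the dataset's actual coding.
school_type = ["ES", "ES", "HS", "ES", "MS", "HS", "ES"]

counts = Counter(school_type)
total = len(school_type)

# Print each category with its count and share of all schools.
for level, n in counts.most_common():
    print(f"{level}: {n} ({n / total:.1%})")
```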

Sample Question (Descriptives)
On average, what percentage of CPS elementary school students exceeded state expectations on the Illinois Standards Achievement Test (ISAT) Math?
Use “ISATExceedingMath” variable

Selecting Descriptive Analysis

Alternate Approaches

Answer: 20%
Across the CPS elementary schools, roughly 20% of students exceeded state standards for the ISAT Math.
The median of 16% is lower than the mean, which tells us that the data are skewed toward larger values: a smaller number of schools with high percentages pull the mean up.
The min and max values tell us that the values are spread over a wide range.
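The relationship between the mean, the median, and skew can be illustrated with a short sketch. The ten values below are made up, chosen only so that the mean and median mirror the 20% and 16% reported on the slide; they are not the CPS data.

```python
from statistics import mean, median

# Hypothetical percentages of students exceeding standards at ten schools.
pct_exceeding = [5, 8, 10, 12, 15, 17, 20, 25, 38, 50]

avg = mean(pct_exceeding)
mid = median(pct_exceeding)
value_range = max(pct_exceeding) - min(pct_exceeding)

print("mean:", avg)
print("median:", mid)
print("range:", value_range)

# A mean above the median suggests a tail of larger values (right skew).
print("mean above median:", avg > mid)
```

Here two schools with high percentages (38 and 50) are enough to lift the mean above the median.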

ANOVA

ANOVA
Stands for Analysis of Variance.
It is used to compare means among different groups. Examples: gender; different age groups; race.
Used to answer questions like: is there a difference in performance on the ACT between students based on their racial identity?

One-Way versus Two-Way ANOVA
One-Way has multiple levels of one independent variable (race).
Two-Way looks at two different independent variables (gender and race).

Formula for ANOVA
Y = B0 + B1X1 + B2X2 + B3X3 + B4X4 + E, where:
Y = ACT score
B1X1 = Race 1
B2X2 = Race 2
B3X3 = Race 3
B4X4 = Race 4

Results of ANOVA
Initial results only tell you if there are differences between the groups.
To determine where, specifically, the differences are, you need to run additional (post hoc) analyses.
This can be done in SPSS.

Running ANOVA in SPSS

Sample Question (ANOVA)
Is there a difference in college enrollment between collaborative networks?
Use “CollaborativeName” variable as Independent Variable (Factor). 5 groups:
Far South Side Collaborative
North-Northwest Side Collaborative
South Side Collaborative
Southwest Side Collaborative
West Side Collaborative
Use “CollegeEnrollmentRate” variable as Dependent Variable.
Note: the data needs to be coded to run an ANOVA (ex. Far South Side Collaborative is coded as “1”)
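The coding step mentioned in the note can be sketched as a simple lookup from network name to number. The slide only states that Far South Side Collaborative is coded as 1; the codes 2 through 5 below follow the order of the slide's list and are an assumption, as are the three example rows.

```python
# Network names come from the slide; only the code 1 is confirmed there.
# Codes 2-5 are assumed to follow the slide's listing order.
networks = [
    "Far South Side Collaborative",
    "North-Northwest Side Collaborative",
    "South Side Collaborative",
    "Southwest Side Collaborative",
    "West Side Collaborative",
]
codes = {name: number for number, name in enumerate(networks, start=1)}

# Hypothetical rows of the "CollaborativeName" variable.
collaborative_name = [
    "West Side Collaborative",
    "Far South Side Collaborative",
    "South Side Collaborative",
]
coded = [codes[name] for name in collaborative_name]
print(coded)
```

The numbers are labels only; their order and spacing carry no meaning in the analysis.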

Selecting ANOVA Select one-way ANOVA because we have 1 IV with 5 levels. If we had more than 1 IV, we’d use “General Linear Model” and then “Univariate”

Selecting Variables for Analysis Remember that “Factor” = “Independent Variable”

Selecting Post-Hoc Analysis This is for follow-up analyses

Descriptives These tell us the average college enrollments for each of the 5 collaborative networks

ANOVA Table
This tells us that there is a statistically significant difference between collaborative networks on college enrollment.
We check the significance value against .05: if it is less than .05, we determine there is a significant difference.
But we don’t know where the difference(s) is.
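For readers who want to see where the F value in an ANOVA table comes from, the following is a minimal sketch of the standard one-way ANOVA arithmetic. The enrollment rates are made up and use three groups rather than the five networks; one_way_anova is a hypothetical helper name. The p-value itself comes from the F distribution and is not computed here.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical college enrollment rates (%) for three networks.
rates = [[50, 55, 60], [60, 65, 70], [70, 75, 80]]
f_stat, df_b, df_w = one_way_anova(rates)
print("F:", f_stat, "df:", df_b, df_w)
```

A large F means the group means differ by more than the spread within groups would lead you to expect.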

Follow-up Analysis
The statistically significant differences are between:
Far South Side and North-Northwest Side
Far South Side and Southwest Side
North-Northwest Side and South Side
North-Northwest Side and West Side
South Side and Southwest Side
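With five networks there are ten possible pairwise comparisons. The sketch below enumerates them and subtracts the five significant pairs listed on the slide to show which pairs did not differ significantly; it only rearranges the slide's results and does not rerun any test.

```python
from itertools import combinations

networks = [
    "Far South Side",
    "North-Northwest Side",
    "South Side",
    "Southwest Side",
    "West Side",
]

# Every unordered pair of networks that a post hoc analysis compares.
pairs = list(combinations(networks, 2))

# Significant pairs as reported on the slide (order within a pair is ignored).
significant = {
    frozenset(("Far South Side", "North-Northwest Side")),
    frozenset(("Far South Side", "Southwest Side")),
    frozenset(("North-Northwest Side", "South Side")),
    frozenset(("North-Northwest Side", "West Side")),
    frozenset(("South Side", "Southwest Side")),
}

not_significant = [p for p in pairs if frozenset(p) not in significant]

print("comparisons:", len(pairs))
for first, second in not_significant:
    print("no significant difference:", first, "and", second)
```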

Follow-up Questions What do the statistically significant differences (and lack thereof) tell us? Why are there so many differences between the North-Northwest Side and other networks?

Any Questions?

Contact Information
Jen Sweet, Associate Director, TLA, jsweet2@depaul.edu
Shannon Milligan, Research Associate, IRMA, Shannon.milligan@depaul.edu