Advanced Quantitative Analysis


1 Advanced Quantitative Analysis
Shannon Milligan, PhD, Institutional Research & Market Analytics
Jen Sweet, PhD, Teaching, Learning & Assessment
April 27, 2018
8009 DePaul Center
1:00pm-2:30pm

2 Workshop Outcomes
By the end of this workshop, participants will be able to:
Distinguish between Parametric and Nonparametric statistics
Adjust interpretation of the results of parametric statistics when assumptions are violated
Define the General Linear Model (GLM)
Describe how this model is related to many common statistical methods
Use SPSS for Basic Statistical Analyses
Determine When to Use an ANOVA Analysis and, if Appropriate, Run the Analysis Using SPSS

3 Workshop Agenda
Parametric v. Nonparametric Statistics
General Linear Model
Brief SPSS Overview
ANOVA
Running Descriptive Statistics in SPSS
Running an ANOVA in SPSS

4 Parametric v Nonparametric Analysis

5 Parametric Tests
Make assumptions about the parameters (or defining properties) of the population that is being studied
Most frequent assumptions:
  Distribution of the dependent variable(s)
  Nature of the data (at least interval-level data, i.e., the data is being measured on a scale with fixed, equal & measurable intervals)
(Note: add something about sample size for parametric tests)

6 Non-Parametric Tests Do not make assumptions about the underlying population distribution or nature of the data being collected

7 Is there Something In Between?
Yes! Semi-Parametric Statistics
Some statistics, such as Bayesian statistics, can begin with a defined underlying population distribution, then update that distribution with known information, such as data about the population or sample data.
Sadly, we won’t have time to get into this or any non-parametric statistics :(

8 Why is this Important to Know?
The statistics we’ll discuss today are parametric statistics that assume:
  The population is normally distributed on the dependent variable
  You are measuring the data using at least an interval scale
  Homogeneity of variance: the groups being compared (and the populations they are drawn from) have equal variance with regard to your dependent variable

9 What if I Violate these Assumptions?
Most likely, you will…
Ideally, you should select a statistical test that is more appropriate given your data
Minimally, you should understand how these violations affect the interpretation of your results
Refer to:

10 General linear model

11 GLM
The General Linear Model is a basic statistical model upon which a lot of common statistics are based.
Loosely based on the formula for a straight line: Y = mX + b
  Y = outcome, or dependent variable
  m = slope of the line
  X = independent variable
  b = Y intercept

12 However: The GLM is expressed a little differently: Y = B0 + B1X + E
  Y = outcome (or dependent) variable
  X = independent variable
  B0 = intercept (the value of Y when X is 0)
  B1 = beta weight (or regression coefficient) of the first independent variable; represents the independent contribution of X (the independent variable) to Y (the dependent variable)
  E = error term (the part of Y the model does not explain)

13 Examples
T-test: Y = B0 + B1*X1 + E
  Where B1 is the difference between the means of two groups
  Example: determine whether ACT scores differ between males and females.
ANOVA: Y = B0 + B1*X1 + B2*X2 + E
  Where the B weights capture the differences between the means of two or more groups
  Example: determine whether ACT scores differ based on gender and race.
Multiple Regression: Y = B0 + B1*X1 + B2*X2 + B3*X3 + E
  Where each B represents the independent contribution of its associated X to Y
  Example: account for the effects of gender, race, and class rank in predicting students’ ACT scores.
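
The t-test example states that B1 is the difference between the means of two groups. A minimal sketch of why, assuming the two groups are dummy coded as X = 0 and X = 1: fitting Y = B0 + B1*X by least squares then gives B0 equal to the mean of the group coded 0 and B1 equal to the difference in means. The scores and the `fit_line` helper below are hypothetical, made up for illustration; they are not from the workshop data or from SPSS.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for Y = B0 + B1*X + E; returns (B0, B1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical ACT scores for two groups (illustrative values only).
group_a = [20, 22, 24, 26]   # dummy coded X = 0
group_b = [23, 25, 27, 29]   # dummy coded X = 1

x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

b0, b1 = fit_line(x, y)
mean_diff = mean(group_b) - mean(group_a)

print("B0 (mean of group coded 0):", b0)
print("B1 (slope):", b1)
print("Difference in group means:", mean_diff)
```

The slope and the difference in means agree, which is the sense in which a t-test is a special case of the GLM.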

14 Common Statistics that Use GLM
Many common statistics use the General Linear Model as their base:
  Student’s T-test
  Analysis of Variance (ANOVA, ANCOVA, MANOVA)
  Multiple Regression
  Multivariate Regression
  Structural Equation Modeling (SEM)
  Hierarchical Linear Modeling (HLM)

15 Least Squares Estimation Method
The GLM uses the least squares method to estimate the parameters of the model. It fits a straight line to your data that minimizes the sum of the squared vertical distances (residuals) between each data point and the ‘best fit’ line.
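
To make the least squares idea concrete, here is a small sketch for the one-predictor case. It computes the best-fit intercept and slope with the standard closed-form formulas, then checks that nudging either coefficient makes the sum of squared errors larger. The data points and the function names (`least_squares`, `sse`) are assumptions chosen for illustration.

```python
def sse(b0, b1, x, y):
    """Sum of squared errors between observed Y and the line B0 + B1*X."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form least squares estimates (B0, B1) for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
best = sse(b0, b1, x, y)

# Any other line has a larger sum of squared errors than the fitted one.
worse_intercept = sse(b0 + 0.1, b1, x, y)
worse_slope = sse(b0, b1 - 0.1, x, y)

print("B0:", b0, "B1:", b1)
print("SSE at best fit:", best)
print("SSE with shifted intercept:", worse_intercept)
print("SSE with shifted slope:", worse_slope)
```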

16 Brief overview of SPSS

17 What is SPSS?
Statistical Package for the Social Sciences (SPSS)
Widely used statistical analysis program (across disciplines and industries)
Menu-driven program, though syntax can also be used
*DePaul access?

18 Pros and Cons of SPSS
Advantages
  Widely-used
  Easy to import Excel files
  User-friendly “plug and chug”
  Does all calculations for you
Disadvantages
  Requires some training
  A lot of options; need to know how to select appropriate options for the analysis you would like to run
  Need to be able to read and appropriately interpret output
  Potential problem = too easy to run analyses without understanding them
  May be expensive
  Limited data visualization capabilities

19 Running descriptive statistics in SPSS

20 Sample Dataset
Chicago Public Schools Progress Report Card (publicly available from Chicago Data Portal)
N = 566 schools
79 variables in dataset

21 Sample Question (Frequencies)
How many elementary schools were in CPS in ? Use “ElementaryMiddleorHighSchool” variable Frequency analysis

22 Selecting Variable(s) for Analysis

23 Answer: 462 elementary schools
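
A frequency analysis simply counts how many rows fall into each category of a variable. The sketch below shows the same idea outside SPSS, using a handful of made-up values; the category labels ("ES", "MS", "HS") and the six rows are assumptions for illustration and do not reproduce the 566-school dataset.

```python
from collections import Counter

# Hypothetical values of the "ElementaryMiddleorHighSchool" variable.
# The labels and rows are illustrative, not the real CPS data.
school_types = ["ES", "ES", "HS", "MS", "ES", "HS"]

# Count how many schools fall into each category.
counts = Counter(school_types)
total = len(school_types)

# Print a simple frequency table: category, count, percent of all rows.
for category, n in counts.most_common():
    print(f"{category}: {n} ({100 * n / total:.1f}%)")
```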

24 Sample Question (Descriptives)
On average, how many CPS elementary school students exceeded state expectations on the Illinois Standards Achievement Test (ISAT) Math? Use “ISATExceedingMath” variable

25 Selecting Descriptive Analysis

26 Alternate Approaches

27 Answer: 20%
Across the CPS elementary schools, roughly 20% of students, on average, exceeded state standards for the ISAT Math.
The median of 16% is lower than the mean of 20%, which tells us the distribution is right-skewed: a smaller number of schools with high percentages pull the mean up.
The min and max values tell us that the values are spread widely across schools.
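
The relationship between the mean and the median can be checked on a small example. The values below are hypothetical, chosen only so that the mean is 20 and the median is 16, as on the slide; they are not the real ISAT data. One large value is enough to pull the mean above the median.

```python
from statistics import mean, median

# Hypothetical percentages of students exceeding standards at nine schools
# (illustrative values, not the real "ISATExceedingMath" data).
pct_exceeding = [5, 8, 10, 12, 16, 18, 20, 31, 60]

avg = mean(pct_exceeding)
mid = median(pct_exceeding)
value_range = max(pct_exceeding) - min(pct_exceeding)

print("Mean:", avg)
print("Median:", mid)
print("Min:", min(pct_exceeding), "Max:", max(pct_exceeding), "Range:", value_range)

# A mean above the median suggests a tail of larger values (right skew).
print("Mean above median:", avg > mid)
```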

28 ANOVA

29 ANOVA
Stands for Analysis of Variance
It is used to compare means among different groups
Examples of grouping variables: gender, age group, race
Used to answer questions like: Is there a difference in ACT performance between students based on their racial identity?

30 One-Way versus Two-Way ANOVA
One-Way has multiple levels of one independent variable (e.g., race)
Two-Way looks at two different independent variables (e.g., gender and race)

31 Formula for ANOVA
Y = B0 + B1X1 + B2X2 + B3X3 + B4X4 + E
Where:
  Y = ACT score
  B1X1 = Race 1
  B2X2 = Race 2
  B3X3 = Race 3
  B4X4 = Race 4

32 Results of ANOVA Initial Results only tell you if there are differences between the groups To determine where, specifically, the differences are, you need to run additional (post hoc) analyses This can be done in SPSS
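
For readers who want to see what the initial ANOVA result is built from, here is a sketch of the standard one-way F statistic: the variance between group means divided by the variance within groups. The three groups of scores and the function name `one_way_anova_f` are hypothetical. The sketch stops at F and its degrees of freedom; SPSS additionally reports the p-value, which is not computed here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each value is from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three groups (illustrative values only).
groups = [
    [50, 55, 60],
    [60, 65, 70],
    [70, 75, 80],
]

f_stat, df_between, df_within = one_way_anova_f(groups)
print("F =", f_stat, "with df =", (df_between, df_within))
```

A large F says the group means differ by more than the within-group spread would suggest, but, as the slide notes, it does not say which groups differ.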

33 Running ANOVA in SPSS

34 Sample Question (ANOVA)
Is there a difference in college enrollment between collaborative networks?
Use “CollaborativeName” variable as Independent Variable (Factor)
5 groups:
  Far South Side Collaborative
  North-Northwest Side Collaborative
  South Side Collaborative
  Southwest Side Collaborative
  West Side Collaborative
Use “CollegeEnrollmentRate” variable as Dependent Variable
Note: the data needs to be coded to run an ANOVA (ex. Far South Side Collaborative is coded as “1”)
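
Coding the factor means replacing each group name with a number. A minimal sketch of that step follows. Only the code 1 for Far South Side Collaborative comes from the slide; the codes 2 through 5 (assigned in the order the groups are listed) and the sample rows are assumptions for illustration.

```python
# The five collaborative networks, in the order listed on the slide.
networks = [
    "Far South Side Collaborative",
    "North-Northwest Side Collaborative",
    "South Side Collaborative",
    "Southwest Side Collaborative",
    "West Side Collaborative",
]

# Assign integer codes starting at 1. Codes 2-5 are an assumed ordering.
codes = {name: i for i, name in enumerate(networks, start=1)}

# Hypothetical rows of the "CollaborativeName" variable.
rows = [
    "West Side Collaborative",
    "Far South Side Collaborative",
    "South Side Collaborative",
]

# Replace each name with its numeric code.
coded_rows = [codes[name] for name in rows]
print(coded_rows)
```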

35 Selecting ANOVA Select one-way ANOVA because we have 1 IV with 5 levels. If we had more than 1 IV, we’d use “General Linear Model” and then “Univariate”

36 Selecting Variables for Analysis
Remember that “Factor” = “Independent Variable”

37 Selecting Post-Hoc Analysis
This is for follow-up analyses

38 Descriptives These tell us the average college enrollments for each of the 5 collaborative networks

39 ANOVA Table This tells us that there is a statistically significant difference between collaborative networks on college enrollment. We check the significance value (the p-value, labeled “Sig.” in SPSS output) against .05: if it is less than .05, we conclude there is a significant difference. But we don’t know where the difference(s) are.

40 Follow-up Analysis
The statistically significant differences are between:
  Far South Side and North-Northwest Side
  Far South Side and Southwest Side
  North-Northwest Side and South Side
  North-Northwest Side and West Side
  South Side and Southwest Side
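
With 5 groups, a post hoc analysis compares every pair of groups, which gives 10 comparisons. The sketch below enumerates those pairs and marks the five that the slide reports as significantly different. It only organizes the reported results; it does not run a post hoc test, and the slides do not say which post hoc procedure was selected in SPSS.

```python
from itertools import combinations

networks = [
    "Far South Side",
    "North-Northwest Side",
    "South Side",
    "Southwest Side",
    "West Side",
]

# Pairs reported on the slide as significantly different.
significant = {
    frozenset(("Far South Side", "North-Northwest Side")),
    frozenset(("Far South Side", "Southwest Side")),
    frozenset(("North-Northwest Side", "South Side")),
    frozenset(("North-Northwest Side", "West Side")),
    frozenset(("South Side", "Southwest Side")),
}

# Every unordered pair of the 5 networks.
all_pairs = list(combinations(networks, 2))
not_significant = [p for p in all_pairs if frozenset(p) not in significant]

for a, b in all_pairs:
    flag = "significant" if frozenset((a, b)) in significant else "not significant"
    print(f"{a} vs {b}: {flag}")
```

Listing the pairs this way makes it easy to see that North-Northwest Side appears in three of the five significant comparisons.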

41 Follow-up Questions What do the statistically significant differences (and lack thereof) tell us? Why are there so many differences between the North-Northwest Side and other networks?

42 Any Questions?

43 Contact Information
Jen Sweet, Associate Director, TLA
Shannon Milligan, Research Associate, IRMA

