1st meeting: Multilevel modeling: introduction. Subjects for today: basic statistics (testing); the difference between regression analysis and multilevel.

Presentation transcript:

1st meeting: Multilevel modeling: introduction
Subjects for today:
- Basic statistics (testing)
- The difference between regression analysis and multilevel analysis
- Multilevel database construction

INFERENTIAL STATISTICS

The distribution of age in the Netherlands in


Statistical testing using a normal distribution


Calculate a standard error: take the standard deviation (s) and divide it by the square root of the sample size (n):
SE = s / √n
Assumption: the units in the sample are drawn at random. This means that all units in the population have an equal chance of being sampled. In our example the standard deviation is 20.6 and we have 1000 units in the sample: SE = 20.6 / 31.6 = 0.65.
Now suppose that when we sample unit 1 we have a 100% chance of also sampling units 2, 3, 4, 5, 6, 7, 8, 9 and 10. Suppose further that all these units have more or less the same score on age. As a consequence the effective sample is no longer 1000 but 1000 / 10 = 100! Hence the standard error is no longer 0.65 but 20.6 / 10 = 2.06! As a consequence results may no longer be significant (higher p-values!).
So, when a sample is clustered (for instance, sampling unit 1 means sampling unit 2 as well), our effective sample size tends to decrease! This is important to note because in multilevel analysis we use clustered samples!
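The arithmetic on this slide can be checked with a short Python sketch. The numbers (s = 20.6, n = 1000, clusters of 10) come from the slide; the function and variable names are assumptions made for illustration only.

```python
import math


def standard_error(sd: float, n: float) -> float:
    """Standard error of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


SD_AGE = 20.6      # standard deviation of age, from the slide
N_SAMPLED = 1000   # number of units in the sample
CLUSTER_SIZE = 10  # units that are always sampled together

# Simple random sample: all 1000 units count as independent observations.
se_random = standard_error(SD_AGE, N_SAMPLED)

# Fully clustered sample: units within a cluster have (nearly) the same age,
# so each cluster contributes roughly one independent observation.
n_effective = N_SAMPLED / CLUSTER_SIZE
se_clustered = standard_error(SD_AGE, n_effective)

print(f"SE, random sample:    {se_random:.2f}")
print(f"SE, clustered sample: {se_clustered:.2f}")
```

The clustered standard error is larger by a factor of √10, which is why an effect that looks significant under the random-sampling assumption may not be significant once clustering is taken into account.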

Clustered samples (see also the file “the bowl of rice problem.pdf”). One may think that we still use a large number of cases, but due to clustering we use an effective sample of only 6. The standard error would be close to 20.6 / 2.45 = 8.41 years!
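A small simulation can make the loss of effective sample size visible. This is only a sketch under stated assumptions: ages are drawn from a normal distribution, the population mean of 40 is a hypothetical value, and clustering is taken to the extreme (all 10 units in a cluster share exactly one age). Only the standard deviation of 20.6 and the sample size of 1000 come from the slides.

```python
import random
import statistics

random.seed(1)

MEAN_AGE = 40.0    # hypothetical population mean, chosen for illustration
SD_AGE = 20.6      # standard deviation of age, from the slides
N = 1000           # units per sample
CLUSTER_SIZE = 10  # units sharing one age value (extreme clustering)
REPS = 300         # number of simulated samples per design


def mean_simple_sample() -> float:
    """Mean of N independently drawn ages."""
    ages = [random.gauss(MEAN_AGE, SD_AGE) for _ in range(N)]
    return statistics.fmean(ages)


def mean_clustered_sample() -> float:
    """Mean of N ages where every cluster of CLUSTER_SIZE shares one age."""
    cluster_ages = [random.gauss(MEAN_AGE, SD_AGE) for _ in range(N // CLUSTER_SIZE)]
    # Each cluster age counts CLUSTER_SIZE times, so the mean over all N units
    # equals the mean of the cluster ages.
    return statistics.fmean(cluster_ages)


# The spread of the sample means across repetitions estimates the standard error.
se_simple = statistics.stdev([mean_simple_sample() for _ in range(REPS)])
se_clustered = statistics.stdev([mean_clustered_sample() for _ in range(REPS)])

print(f"simulated SE, simple random sample: {se_simple:.2f}")
print(f"simulated SE, clustered sample:     {se_clustered:.2f}")
```

Both designs use 1000 units, yet the sample mean of the clustered design varies far more from sample to sample, as if far fewer units had been drawn.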

Before we turn to multilevel analysis, let us first take a look at the data structure and how to construct a multilevel data set. We take as an example Indonesia with about 100 districts, like Bandung, Majalengka, and Serang. Suppose we have a lot of respondents randomly sampled from these districts. Then we may have a dataset from Bandung, a dataset from Majalengka, a dataset from Serang, and maybe many more. First we would like to construct a dataset with all respondents from all districts.

Bandung.sav, Majalengka.sav, Serang.sav, District x.sav  →  Bandung + Majalengka + Serang + District x.sav
* SPSS SYNTAX:.
ADD FILES
  /FILE="c:/multilevelmodeling/bandung.sav"
  /FILE="c:/multilevelmodeling/majalengka.sav"
  /FILE="c:/multilevelmodeling/serang.sav"
  /FILE="c:/multilevelmodeling/district x.sav".
EXECUTE.
SAVE OUTFILE="c:/multilevelmodeling/all_individuals.sav".
NOTE: all files should have the same data definitions!!

Ok, now we have something like this:
Individual  District    Body height
1           Bandung     150
2           Bandung     145
3           Bandung     156
4           Majalengka  118
5           Majalengka  174
6           Majalengka  156
7           Serang      167
8           Serang      153
9           Serang
            District X
            District X
            District X  188
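For readers without SPSS, the stacking step can be sketched in plain Python. This is an illustration, not the SPSS implementation: the function name `add_files` and the dictionary layout are assumptions, and only the rows whose body height is known from the table are used.

```python
# One list of rows per district "file"; heights are taken from the table above.
bandung = [{"district": "Bandung", "height": h} for h in (150, 145, 156)]
majalengka = [{"district": "Majalengka", "height": h} for h in (118, 174, 156)]
serang = [{"district": "Serang", "height": h} for h in (167, 153)]


def add_files(*files):
    """Stack several row lists into one, requiring identical variable names."""
    columns = set(files[0][0])
    stacked = []
    for rows in files:
        for row in rows:
            # Mirrors the note that all files need the same data definitions.
            if set(row) != columns:
                raise ValueError("all files should have the same data definitions")
            stacked.append(dict(row))
    return stacked


all_individuals = add_files(bandung, majalengka, serang)

# Number the individuals consecutively across districts.
for number, row in enumerate(all_individuals, start=1):
    row["individual"] = number

for row in all_individuals:
    print(row["individual"], row["district"], row["height"])
```

The check on variable names is the reason a file with a differently named or missing column fails loudly instead of silently producing a ragged dataset.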

Now this may serve your purposes. For example, you would like to estimate the effect of body height on, say, body weight with a linear regression model using ordinary least squares (OLS):
Body weight = a + b1 * body height + e
This may be problematic, as the errors may be correlated because we have people from different districts (and districts may vary in average body height).
(Figure labels: Bandung, Majalengka, Serang)

SOLUTION:
Body weight = a + b1 * body height + b2 * Bandung + b3 * Majalengka + e
The districts are so-called ‘dummy variables’, coded 0 and 1. Serang is left out of the equation: it is the reference category. So b2 is the estimated difference in average body weight between Bandung and Serang while taking body height differences into account. The body height effect is now more accurate because we rule out the effects of districts. When we restrict our analyses to these districts, statistically speaking this is ok.
BUT, suppose we would like to know WHY people in, say, Bandung are on average heavier than people in Serang. We need more information about these districts, like average welfare.
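The dummy coding described here can be sketched in Python as follows. The function name `dummy_code` is hypothetical; the districts and the choice of Serang as reference category follow the slide.

```python
DISTRICTS = ["Bandung", "Majalengka", "Serang"]
REFERENCE = "Serang"  # reference category: gets no dummy of its own


def dummy_code(district: str) -> dict:
    """Return the 0/1 dummy variables for one respondent's district."""
    if district not in DISTRICTS:
        raise ValueError(f"unknown district: {district}")
    return {name: int(district == name) for name in DISTRICTS if name != REFERENCE}


for district in DISTRICTS:
    print(district, dummy_code(district))
```

A respondent from Serang scores 0 on both dummies, so the intercept a describes Serang, and b2 and b3 describe how Bandung and Majalengka differ from it.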

SOLUTION 1:
Body weight = a + b1 * body height + b2 * Average Welfare + e
- Problem 1: we have only three data points for Average Welfare: Bandung, Majalengka and Serang.
- Problem 2: the effect of welfare would only be valid for these three districts.
- Problem 3: we must assume that Average Welfare explains all level 2 variance, otherwise part of the error is still correlated.
- Problem 4: we mix up individual variance (not all people in Serang have the same body weight) with district variance (not all districts have the same average body weight): it is all in the variance of e.

SOLUTION 2: We first randomly select a large number of districts, say 30 or more, and then randomly select individuals from these districts. We set up a multilevel equation:
Y = a + b1 * body height + e1 (individual level)
      + b2 * average welfare + e2 (district level)
e1 is the within-district variance (individual variance), e2 is the between-district variance. I will return to this in detail during the next meeting.
First: how to get the data right for this?
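Written with explicit indices, the same model can be sketched as below. The subscripts i (individual) and j (district) and the variance symbols are notation added here for clarity; they do not appear on the slide.

```latex
Y_{ij} = a + b_1\,\mathrm{height}_{ij} + b_2\,\mathrm{welfare}_{j} + e_{2j} + e_{1ij},
\qquad
\operatorname{Var}(e_{1ij}) = \sigma^2_{\mathrm{within}},
\quad
\operatorname{Var}(e_{2j}) = \sigma^2_{\mathrm{between}}
```

Body height varies between individuals (index ij), while average welfare and the district error e2 take one value per district (index j only).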

Problem: we have individual data from many districts added into one big file, BUT we need to add the welfare figures for each district:
Individual  District    Body height  Welfare (in € per capita)
1           Bandung     ...          ...
            Bandung     ...          ...
            Bandung     ...          ...
            Majalengka  ...          ...
            Majalengka  ...          ...
            Majalengka  ...          ...
            Serang      ...          ...
            Serang      ...          ...
            Serang      ...          ...
            District X  ...          ...
            District X  ...          ...
            District X  ...          ...

SPSS syntax to construct multilevel data files (the key variable is called country in this syntax; in the district example, use the district identifier instead):
GET FILE="c:/multilevelmodeling/welfare.sav".
* Watch it: data MUST be sorted by country first!!.
SORT CASES BY country.
SAVE OUTFILE="c:/multilevelmodeling/welfare.sav".
GET FILE="c:/multilevelmodeling/all_individuals.sav".
* Watch it: data MUST be sorted by country first!!.
SORT CASES BY country.
SAVE OUTFILE="c:/multilevelmodeling/all_individuals.sav".
MATCH FILES
  /TABLE="c:/multilevelmodeling/welfare.sav"
  /FILE="c:/multilevelmodeling/all_individuals.sav"
  /BY country.
EXECUTE.
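The table lookup performed by MATCH FILES can be sketched in plain Python. This is an illustration only: the welfare figures below are hypothetical placeholders (the slides do not give them), the function name `match_files` is an assumption, and the key is called district here to match the example table.

```python
# Level-2 lookup table: one welfare figure per district.
# These numbers are HYPOTHETICAL and serve only to make the sketch runnable.
welfare = {"Bandung": 4200, "Majalengka": 2900, "Serang": 3500}

# Level-1 file: individuals with their district and body height (from the slides).
all_individuals = [
    {"individual": 1, "district": "Bandung", "height": 150},
    {"individual": 4, "district": "Majalengka", "height": 118},
    {"individual": 7, "district": "Serang", "height": 167},
    {"individual": 12, "district": "District X", "height": 188},
]


def match_files(table: dict, rows: list, key: str) -> list:
    """Attach the district-level value to every individual row.

    Rows whose key is absent from the table get None, comparable to a
    missing value after a table match.
    """
    return [{**row, "welfare": table.get(row[key])} for row in rows]


multilevel_data = match_files(welfare, all_individuals, key="district")

for row in multilevel_data:
    print(row)
```

A dictionary lookup does not need sorted input, whereas the SPSS command does, which is why the syntax sorts both files by the key before matching.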