ICCS 2009 IDB Workshop, 18th February 2010, Madrid
Training Workshop on the ICCS 2009 database
Weighting and Variance Estimation

Content of this presentation
Analyzing weighted data
Standard errors
– What are they?
– Why do we need them?
– How do we estimate them?

What are sampling weights?
Values assigned to all sampling units
– Weighted results from the sample can be generalized for the whole population
– Weights allow unbiased estimates of population parameters
Based on the sample selection probabilities
– Applied at each sampling stage
Adjusted to correct for non-response
– Applied at each sampling stage
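
The construction described on this slide can be sketched in a few lines. This is a simplified illustration, not the actual ICCS weighting procedure: it assumes that each stage contributes the inverse of its selection probability multiplied by the inverse of its response rate. The function name `final_weight` and all numbers are hypothetical.

```python
def final_weight(selection_probs, response_rates):
    """Combine per-stage selection probabilities and response rates into one weight.

    Assumption (simplified): each stage contributes a base weight of
    1 / selection probability and a non-response adjustment of
    1 / response rate; the final weight is the product over all stages.
    """
    weight = 1.0
    for prob, rate in zip(selection_probs, response_rates):
        if not (0 < prob <= 1 and 0 < rate <= 1):
            raise ValueError("probabilities and rates must lie in (0, 1]")
        weight *= (1.0 / prob) * (1.0 / rate)
    return weight


# Hypothetical two-stage design: a school is selected with probability 0.05,
# a student within that school with probability 0.5; 90% of the sampled
# schools and 80% of the sampled students took part.
student_weight = final_weight([0.05, 0.5], [0.9, 0.8])
print(round(student_weight, 2))
```

In this made-up case one participating student stands for roughly 56 students in the population.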

Weights in ICCS
The ICCS data contain several weight variables
– Total Student Weight: TOTWGTS
– Total Teacher Weight: TOTWGTT
– Total School Weight: TOTWGTC
The IDB Analyzer automatically selects the correct weight

Analyzing weighted data – a simple example
[Figure: two groups sampled at rates of 1:10 and 1:1]

Unweighted mean

Weighted mean
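
The difference between the two means can be illustrated with a small sketch. The scores and weights below are invented for illustration: two units are assumed to come from a group sampled at 1:10 (weight 10 each) and two from a group sampled at 1:1 (weight 1 each).

```python
def unweighted_mean(values):
    """Plain average: every sampled unit counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted average: each unit counts in proportion to its weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: the first two units represent 10 population members each,
# the last two represent only themselves.
scores = [40, 60, 80, 100]
weights = [10, 10, 1, 1]

plain = unweighted_mean(scores)
weighted = weighted_mean(scores, weights)
print(plain, round(weighted, 2))
```

The unweighted mean treats the over-sampled group as if it made up half of the population, which pulls the result towards that group's scores.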

Example using ICCS data
Civic knowledge score in an ICCS country
Unweighted: average of

Example using ICCS data
Difference: 10.1 score points
Reason for the difference: over-sampling of students in private schools
– 13.7% of the tested students
– 5.9% of the sum of weights

What are standard errors?
The standard error of an estimate is the standard deviation of the sampling distribution associated with it
The sampling distribution is the distribution of the statistic for all possible samples of the same size and method
Since we do not select all possible samples, we can only estimate the standard error

What are standard errors good for?
The ICCS results are based on samples
All ICCS results are therefore estimates of unknown population values
Standard errors indicate how close these estimates are likely to be to the true population values

Confidence Intervals
Let ε stand for any statistic of interest
A 95% confidence interval is defined as $[\varepsilon - 1.96 \cdot SE(\varepsilon),\ \varepsilon + 1.96 \cdot SE(\varepsilon)]$
– This is the black bar in Table 3.4
With a confidence of 95%, the true mean lies between the lower and upper bounds of this interval
Take rounding into account!
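
As a minimal sketch of the interval computation, assuming the usual normal approximation with a factor of 1.96: the estimate and standard error below are hypothetical values, not ICCS results.

```python
def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) bounds of an approximate 95% confidence interval."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical mean score of 500.0 with a standard error of 3.0.
lower, upper = confidence_interval_95(500.0, 3.0)
print(round(lower, 2), round(upper, 2))
```

Regarding rounding: bounds computed from already rounded published values can differ slightly from bounds computed with unrounded values.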

Estimating standard errors
In a simple random sample, estimating the standard error of a mean $\hat{x}$ is easy
– Just divide the standard deviation of the sample (s) by the square root of the sample size (n): $SE(\hat{x}) = s / \sqrt{n}$
In a complex sample design like that of ICCS, estimating the standard error is not as easy
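
For comparison with the complex-design case, the simple random sample formula can be written out as follows. The data are invented, and the function name `srs_standard_error` is only illustrative.

```python
import math
import statistics


def srs_standard_error(values):
    """Standard error of the mean under simple random sampling: s / sqrt(n)."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(values))


# Hypothetical sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = srs_standard_error(sample)
print(round(se, 4))
```

This formula is only appropriate when every unit is selected independently and with equal probability, which is not the case in ICCS.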

Complex sample design
Clustered sample
– students within a school are more similar to each other than students from different schools
Stratification
– usually increases sampling precision
Weights
– complicate the calculations

Why not just use SPSS?
Standard software packages like SPSS will not give correct estimates of standard errors for these data
The software assumes that the data come from a simple random sample and therefore applies the wrong formula
Generally, the resulting standard error will be too small

Jackknife Repeated Replication
Solution: Jackknife Repeated Replication (JRR)
Used for estimating standard errors in complex designs
Basic idea: systematically re-compute a statistic on a set of replicated samples
– setting the weights to zero for one school at a time
– while doubling the weights of another school
Estimate the variability of the statistic from the differences between its full-sample value and its values in the replicates

The JRR in ICCS
Jackknife variance estimation in ICCS
– Participating schools are paired according to the order in which they were sampled
– These school pairs are called jackknife zones: JKZONES (JKZONET, JKZONEC)
– One school in each zone is randomly assigned an indicator of 1 (0 for the other school): JKREPS (JKREPT, JKREPC)
– This indicator decides whether a school's replicate weight is doubled or zeroed
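
The procedure can be sketched for a weighted mean as follows. This is an illustration under stated assumptions, not the IDB Analyzer's actual code: it assumes one replicate per zone, that an indicator of 1 means "doubled" and 0 means "zeroed", and that the sampling variance is the sum over zones of the squared differences between the replicate estimate and the full-sample estimate. The function names and the four-record data set are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(values, weights, zones, reps):
    """Jackknife repeated replication standard error of a weighted mean.

    zones: jackknife zone of each record (plays the role of JKZONES)
    reps:  replicate indicator of each record, 0 or 1 (plays the role of JKREPS)
    """
    full_estimate = weighted_mean(values, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        replicate_weights = []
        for w, z, r in zip(weights, zones, reps):
            if z != zone:
                replicate_weights.append(w)      # other zones stay unchanged
            elif r == 1:
                replicate_weights.append(2 * w)  # assumed: indicator 1 is doubled
            else:
                replicate_weights.append(0.0)    # assumed: indicator 0 is zeroed
        replicate_estimate = weighted_mean(values, replicate_weights)
        variance += (replicate_estimate - full_estimate) ** 2
    return math.sqrt(variance)


# Hypothetical data: four records from four schools paired into two zones.
values = [10, 20, 30, 40]
weights = [1, 1, 1, 1]
zones = [1, 1, 2, 2]
reps = [1, 0, 1, 0]

se_jrr = jrr_standard_error(values, weights, zones, reps)
print(round(se_jrr, 4))
```

The same loop works for any statistic that can be recomputed with a different set of weights; only `weighted_mean` would need to be swapped out.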

A look inside the IDB Analyzer

Standard error: [formula lost in extraction]

Example using ICCS data
Standard error of the teacher age
SPSS just can't do that

SE and plausible values
For ICCS achievement data, the standard error consists of two components
Sampling error
– this is what we just discussed
Additionally: measurement error
– resulting from the use of plausible values
This is the topic of the next presentation

Conclusion
Use the sampling weights!
Compute standard errors using the JRR!

Thank you for your attention!