Working with the ECLS-B Datasets Weights and other issues.

Slides:



Advertisements
Similar presentations
Calculation of Sampling Errors MICS3 Regional Workshop on Data Archiving and Dissemination Alexandria, Egypt 3-7 March, 2007.
Advertisements

Calculation of Sampling Errors MICS3 Data Analysis and Report Writing Workshop.
Working with the ECLS-K Datasets Weights and other issues.
9. Weighting and Weighted Standard Errors. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
Estimates and sampling errors for Establishment Surveys International Workshop on Industrial Statistics Beijing, China, 8-10 July 2013.
QUANTITATIVE DATA ANALYSIS
Clustered or Multilevel Data
5-3 Inference on the Means of Two Populations, Variances Unknown
Analysis of National Health Interview Survey Data
Understanding Research Results
NHANES Analytic Strategies Deanna Kruszon-Moran, MS Centers for Disease Control and Prevention National Center for Health Statistics.
Complexities of Complex Survey Design Analysis. Why worry about this? Many government studies use these designs – CDC National Health Interview Survey.
Multiple Regression. In the previous section, we examined simple regression, which has just one independent variable on the right side of the equation.
Birth Cohort Jennifer Park National Center for Education Statistics Institute of Education Sciences IES Research Conference June 2006.
18b. PROC SURVEY Procedures in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
Design Effects: What are they and how do they affect your analysis? David R. Johnson Population Research Institute & Department of Sociology The Pennsylvania.
Secondary Data Analysis Linda K. Owens, PhD Assistant Director for Sampling and Analysis Survey Research Laboratory University of Illinois.
1 Ratio estimation under SRS Assume Absence of nonsampling error SRS of size n from a pop of size N Ratio estimation is alternative to under SRS, uses.
Tobacco Use Supplement to the Current Population Survey User’s Workshop June 9, 2009 Tips and Tricks Analyzing TUS-CPS Data Lloyd Hicks Westat.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
LECTURE 3 SAMPLING THEORY EPSY 640 Texas A&M University.
JENNIFER SAYLOR, PHD, RN, ANCS-BC UNIVERSITY OF DELAWARE SEPTEMBER 14, 2012 Essentials of Complex Data Analysis Utilizing National Survey.
Hypothesis testing Intermediate Food Security Analysis Training Rome, July 2010.
Using Weighted Data Donald Miller Population Research Institute 812 Oswald Tower, December 2008.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
MEPS WORKSHOP Household Component Survey Estimation Issues Household Component Survey Estimation Issues Steve Machlin, Agency for Healthcare Research and.
5-4-1 Unit 4: Sampling approaches After completing this unit you should be able to: Outline the purpose of sampling Understand key theoretical.
ICCS 2009 IDB Workshop, 18 th February 2010, Madrid 1 Training Workshop on the ICCS 2009 database Weighting and Variance Estimation picture.
Introduction to Secondary Data Analysis Young Ik Cho, PhD Research Associate Professor Survey Research Laboratory University of Illinois at Chicago Fall,
Analytical Example Using NHIS Data Files John R. Pleis.
Statistics Canada Citizenship and Immigration Canada Methodological issues.
ICCS 2009 IDB Seminar – Nov 24-26, 2010 – IEA DPC, Hamburg, Germany Training Workshop on the ICCS 2009 database Weights and Variance Estimation picture.
Statistical Weights and Methods for Analyzing HINTS Data HINTS Data Users Conference January 21, 2005 William W. Davis, Ph.D. Richard P. Moser, Ph.D. National.
Using Data from the National Survey of Children with Special Health Care Needs Centers for Disease Control and Prevention National Center for Health Statistics.
NHANES Analytic Strategies Deanna Kruszon-Moran, MS Centers for Disease Control and Prevention National Center for Health Statistics.
Sample Design of the National Health Interview Survey (NHIS) Linda Tompkins Data Users Conference July 12, 2006 Centers for Disease Control and Prevention.
Appropriate use of Design Effects and Sample Weights in Complex Health Survey Data: A Review of Articles Published using Data from Add Health, MTF, and.
1 ANALYZING DATA FROM THE NATIONAL IMMUNIZATION SURVEY __________________________________________ Michael P. Battaglia Abt Associates Inc. Meena Khare.
Weighting and imputation PHC 6716 July 13, 2011 Chris McCarty.
Methods of Presenting and Interpreting Information Class 9.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 1-1 Statistics for Managers Using Microsoft ® Excel 4 th Edition Chapter.
AERA Data Training Seminar
How Can High School Counseling Shape Students’ Postsecondary Attendance? Exploring the Relationship between High School Counseling and Students’ Subsequent.
Learning Objectives : After completing this lesson, you should be able to: Describe key data collection methods Know key definitions: Population vs. Sample.
Chapter 7. Classification and Prediction
Peter Linde, Interviewservice Statistics Denmark
Using Stata to Analyze Complex Survey Data
Inference and Tests of Hypotheses
Medical Expenditure Panel Survey
Trena M. Ezzati-Rice, Frederick Rohde, Robert Baskin
Distribution of the Sample Means
Analysis of Complex Sample Data
Introduction to secondary analysis of complex survey data
Introduction to Survey Data Analysis
Using Weights in the Analysis of Survey Data
Presented: 2009 Canadian Users Stata Group Meeting
Complex Surveys
Deanna Kruszon-Moran, MS
Information from Samples
The t distribution and the independent sample t-test
Melanie Dove, MPH, ScD UC Davis
Chapter 1 The Where, Why, and How of Data Collection
Chapter 1 The Where, Why, and How of Data Collection
Using Weights in the Analysis of Survey Data
Chapter 5: Producing Data
15.1 The Role of Statistics in the Research Process
New Techniques and Technologies for Statistics 2017  Estimation of Response Propensities and Indicators of Representative Response Using Population-Level.
The Where, Why, and How of Data Collection
MGS 3100 Business Analysis Regression Feb 18, 2016
Chapter 1 The Where, Why, and How of Data Collection
Presentation transcript:

Working with the ECLS-B Datasets Weights and other issues. Information is courtesy of the Institute of Educational Sciences, National Center for Education Statistics and is used in their training seminars.

Sampling Weights What are sampling weights and why are they important? How are weights used? What weights are on the ECLS-B data files and when should they be used?

What is a “Weight” ? A weight is used to indicate the relative strength of an observation. In the simplest case, each observation is counted equally. For example, if we have five observations, and wish to calculate the mean, we just add up the values and divide by 5.

How are Weights Used? Dataset with 5 cases. Value 4 2 1 5 2 Sample mean (4+2+1+5+2) = 2.8 Weighted mean (4*1) + (2*2) + (1*4) + (5*1) + (2*2)/sum of weights = (4 + 4 + 4 + 5 + 4)/10 = 2.1

What is the Difference Between Weighted and Unweighted Data? With unweighted data, each case is counted equally. Unweighted data represent only those in the sample who provide data. With weighted data, each case is counted relative to its representation in the population. Weights allow analyses that represent the target population.

ECLS-B and Weights The ECLS-B is a sample, i.e. the entire population was not surveyed. The ECLS-B is not a simple random sample (SRS). That is, not all children had an equal probability of selection. Not all sampled children, mothers, and fathers participated.

Why Use Weights in the ECLS-B? Weights compensate for collecting data from a sample rather than the entire population and for using a complex sample design. Weights adjust for differential selection probabilities and reduce bias associated with non-response by adjusting for differential nonresponse. Weights are used when estimating characteristics of the population.

Examples of Weighted vs. Unweighted Data Race/Ethnicity Unweighted % Weighted % White, non-Hispanic 42 54 Black, non-Hispanic 16 14 Hispanic 21 26 Asian/Pacific Islander 12 3 Native American < 1 Multiple race, non-Hispanic 7 4

Examples of Weighted vs. Unweighted Data Birth Weight Characteristics Unweighted % Weighted % Average, in pounds 6.46 7.31 Normal birth weight 74 93 Moderately low birth weight 15 6 Very low birth weight 11 1

2 Year Weights (No father data) Use W2R0 if using 2-year parent interview data (with or without 9-month parent interview data. Use W2C0 if using 2 year child assessment data (with or without 9-month child assessment data) with or without any parent interview data.

2 Year Weights (with father data) Use W2F0 if using 2-year father data AND 9-month father data (resident and/or non-resident) with or without any parent interview data. Use W2FC0 if using 2 year father data, resident and/or non-resident (with or without 9-month father data), with at least one round of child assessment data, with or without any parent interview data.

2 Year Weights (with father data) Use W22F0 if using 2-year father data (without 9-month father data) with or without any parent interview data. Use W2C1F0 if using 9-month father data in combination with any parent interview data and any child assessment data.

2 Year Weights (with CCP data) Use W2C2J0 if using 2-year Child Care Provider (CCP) data, with at least one round of child assessment data, with or without any parent interview data. Use W22J0 if using 2-year Child Care Provider (CCP) data alone, or with any parent interview data.

2 Year Weights (with CCO data) Use W2C2P0 if using 2-year Child Care Observation (CCO) data, with at least one round of child assessment data, with or without any parent interview data or CCP data. Use W22P0 if using 2-year Child Care Observation (CCO) data alone, or with any parent interview data or CCP data.

How to Use Weights In SAS, use the “WEIGHT” statement. In SPSS, use the “WEIGHT BY” statement. Key Fact: All ECLS-B weights sum to population totals.

Weights in SAS SAS uses the WEIGHT statement in various PROCedures. PROC FREQ data = test; Tables Age Gender Score; Weight weightvar; Run;

Weights in SPSS LIST VARIABLES = age to weightvar. Frequencies variables = age, score /sta=default. weight by weightvar. frequencies variables = age, score /sta=default.

Weights in STATA clear use “c:\temp\test1.dta" tabulate score age gender [pweight=weightvar]

Weights for HLM Users HLM users can use the base weight, W1BASEWT, for longitudinal analyses where occasions are nested within children. Twins can be nested within families in a two-level model. Each twin has their own weight to be used at level one.

Summary about Weights Weights should be used when analyzing data from the ECLS-B. Parent and father weights sum to the number of children, not the number of parents or fathers. The appropriate weight should be selected based on the research question (which determines the data source).

Variance, Calculating Standard Errors Why are standard errors important? Why not use standard errors that assume a simple random sample (SRS)? How to use “exact” methods for estimating standard errors. How to use approximation methods for estimating standard errors.

Why are Standard Errors Important? Standard errors are produced for estimates from sample surveys. They are a measure of the variance in the estimates associated with the selected sample being one of many possible samples. Standard errors are used to test hypotheses and to study group differences. Using inaccurate standard errors can lead to identification of statistically significant results where none are present and vice versa.

Important Considerations The ECLS-B sample is only one of many samples of infants born in 2001 that could have been selected. The ECLS-B has a complex sample design and is not a simple random sample. All weights on the ECLS-B data files sum to population total and not sample totals.

The ECLS-B Sample Design The ECLS-B has a complex sample design. Children are clustered within primary sample units (PSUs). Low birth weight and very low birth weight children are oversampled. Chinese American, Asian American, and American Indian/Native Alaskan children are oversampled. Twins are oversampled.

The ECLS-B Sample Design: Clustering Sample children were clustered within primary sampling units (PSUs) to reduce field costs. Children were in closer geographical proximity than would occur in a simple random sample. Children in a clustered sample tend to be more alike than those in a simple random sample.

Complex Samples and Standard Errors The usual standard error formula assumes a simple random sample. Standard errors for estimates from a complex sample must account for the within cluster/across cluster variation. Special software can make the adjustment, or this adjustment can be approximated using the design effect.

Options Exact Methods such as the TAYLOR series and REPLICATION techniques. Approximation Method

Exact Methods Taylor series Extract PSU and strata Ids from data file. Software available: SUDAAN, STATA (using SVY commands), and SAS (using PROC SURVEY commands).

Exact Methods Replication Techniques Extract replication weights (90 of them). ECLS-B replication weights use jackknife 2 (JK2) methods. Software: WESVAR replication series (JK2), AM (JK2), and SAS callable SUDAAN.

Approximation Method Two stages: First, normalize weights so standard error is based on actual sample size rather than population size. Then, use design effect (DEFF) to account for complex sampling design.

1) Normalizing Weights * Weights on the ECLS-B sum to the population totals. Calculate a new weight that sums to the sample size. Normalized weights = (ECLS-B weight) * (sample n/population N). *SAS users do not need this step since estimates are produced based on the actual sample size.

Example – Normalizing Weights Weight to be normalized: W1R0 Sum of weights: 3,997,169 Total number of cases with a positive weight: 10,688 Normalized weight = W1R0 * (10,688 / 3,997,169)

2) Adjusting for Complex Design The ECLS-B has a complex sample design; it is not a simple random sample. Software packages designed for simple random samples tend to underestimate the standard errors for complex sample designs. Special methods are required for complex designs.

Using Design Effects (DEFF) What is a design effect (DEFF)? It’s the ratio of the variance found in actual (complex) sample design to the variance expected in a simple random sample of the same sample size.

Using Design Effects (DEFF) DEFT = the square root of DEFF = (Design standard error/ simple random sample error). Example for X1NCATS = Total Child Score SE (SRS) = 0.029 SE (Design) = 0.048 DEFF = 0.0482/ 0.0292 = 2.707 DEFT = 0.048/ 0.029 = square root of 2.707 = 1.645

3 Ways of Using the DEFF Multiply the SRS (simple random sample) standard error produced by statistical software (when using normalized weights) by the square root of the DEFF (DEFT). Or Adjust the t-statistic by dividing it by the square root of the design effect (DEFT) or adjust the F-statistic by dividing it by the DEFF. Adjust the weight such that an adjusted standard error is produced.

Using a DEFF- Adjusted Weight First step, create a weight that sums to the sample size (normalized weight. Second, divide this normalized weight by the DEFF. Third, use this weight for analyses. The standard errors produced will approximate the standard errors obtained using “exact” methods.

Where to find ECLS-B DEFF’s Training material: “ECLS-B Specifications for Computing Standard Errors” 9-month User’s Manual: Exhibit 4-5 and table 4-6. 2 year User’s Manual: Exhibit 4-10 and tables 4-10 and 4-26.

For SAS Users SAS base procedures such as PROC REG, PROC FREQ, and PROC MEANS do account for the actual sample size but not for complex sampling. SAS procedures such as PROC SURVEYMEAN and PROC SURVEYREG (and other procedures that begin with “Survey”) use the Taylor series method to account for complex sampling and provide exact estimates of the standard errors.

PROC SURVEYREG Example Example using ECLS-K data, spring kindergarten and spring first grade variables. proc surveyreg data = fscores; model c4r3mscl = c2r3mscl lowkread t4learn; cluster c24cstr; strata c24cpsu; weight c24cw0; where lowkmath = 0; run;

PROC SURVEYLOGISTIC Example Example using ECLS-K data, spring kindergarten and spring first grade variables. proc surveylogistic data = fscores; model lowkread (desc) = c2r3mscl t4learn; cluster c24cstr; strata c24cpsu; weight c24cw0; where lowkmath = 0; run;

PROC SURVEYFREQ Example Example using ECLS-K data, spring kindergarten and spring first grade variables. proc surveyfreq data = fscores; tables lowkread c2r3mscl t4learn; cluster c24cstr; strata c24cpsu; weight c24cw0; run;

STATA Code for Complex Design Logistic Regression Example, 3rd Grade Data Svyset [pweight=C5CW0], strata (C5TCWSTR) psu (C5CWPSU) Svy, subpop (male) : logit highbmi white

STATA Code for Complex Design Regression Example, 3rd Grade Data Svyset [pweight=C5CW0], strata (C5TCWSTR) psu (C5CWPSU) Svy, subpop (male) : reg highbmi white

STATA Code for Complex Design Means Example, 3rd Grade Data Svyset [pweight=C5CW0], strata (C5TCWSTR) psu (C5CWPSU) Svy, subpop (male) : mean highbmi female

SPSS for Complex Sample Design Use add-on to SPSS called, SPSS Complex Samples™ Complex Samples Logistic Regression (CSLOGISTIC)—Performs binary logistic regression analysis, as well as multiple logistic regression (MLR) analysis, for samples drawn by complex sampling methods. The procedure estimates variances by taking into account the sample design used to select the sample, including equal probability and PPS methods, and WR and WOR sampling procedures. Optionally, CSLOGISTIC performs analyses for subpopulations. Courtesy of SPSS

Regression Analysis Use appropriate software such as AM, WESVAR, SUDAAN or SAS (SURVEYREG procedure). For SAS (PROC REG procedure), use DEFF-adjusted weights. For SPSS, use normalized, DEFF-adjusted weights.

Summary All statistical tests should be based on standard errors that are calculated to account for the complex sample design of the ECLS-B. Preferred: Use software that incorporates JK2 replication methods, or Use software that incorporates Taylor series method, or Last resort: Make approximate adjustments based on design effects.

ECLS-B Data Availability 4 Year (Pre-School) now available. Kindergarten wave will be split into 2 groups, as the children of the ECLS-B enter kindergarten over a 2 year period.