The obsession with weight in the modelling world and its ancillary effects on analysis



The basic idea of sampling
The reason behind complicating a good idea
The implication when modelling data

How Sampling Works.
Imagine a well known picture. Since a picture is made up of points of colour (pixels), we will sample the points of colour at different rates: 1% random (systematic), 3% random, 5% random and 10% random.
Now let’s assume that we had some idea about the picture we wanted to see, and we decide to stratify the sample. In this case we sample different areas of the picture at different rates: the background, the dress, the face, the hands, etc. (2.5% stratified).

How Sampling Works.
[Figure: the same picture sampled at 1%, 3%, 5% and 10% random, and at 2.5% stratified.]

How does this affect modelling or analysis?
The sample is no longer simply random.
We purposefully biased the sample to gain efficiencies and to meet other goals.
This bias is corrected when we apply the design weights.
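To make the slide above concrete, here is a minimal Python sketch. It assumes a hypothetical population of two strata (the names, sizes, values and sampling rates are all invented for illustration) and compares the unweighted sample mean with the mean computed using design weights.

```python
# Hypothetical population: every unit in a stratum has the same value,
# so any sample of n_h units from that stratum gives the same arithmetic.
population = {
    "small": [10.0] * 1_000,   # e.g. a small province
    "large": [20.0] * 9_000,   # e.g. a large province
}

# Deliberately oversample the small stratum (assumed rates).
rates = {"small": 0.20, "large": 0.02}

sample = []  # list of (value, design_weight) pairs
for stratum, values in population.items():
    n_h = round(len(values) * rates[stratum])   # stratum sample size
    w_h = len(values) / n_h                     # design weight N_h / n_h
    sample.extend((v, w_h) for v in values[:n_h])

pop_size = sum(len(v) for v in population.values())
true_mean = sum(sum(v) for v in population.values()) / pop_size

# Ignoring the design over-represents the small stratum.
unweighted_mean = sum(v for v, _ in sample) / len(sample)

# Applying the design weights restores the right proportions.
weighted_mean = sum(v * w for v, w in sample) / sum(w for _, w in sample)

print(true_mean, round(unweighted_mean, 2), weighted_mean)
```

In this toy setup the unweighted mean is pulled toward the oversampled stratum, while the weighted mean matches the population mean.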

If you were to analyse each stratum separately
Each part can be treated as a survey in its own right, each with a simpler design.
The sampling frame or design is the framework that lets you keep all these parts together in a cohesive way for analysis.
There would still be some difficulty associated with the correction for non-response and the final calibration (post-stratification).

How to interpret sampling
The way we sample is reflected and corrected by how we weight the data in the end.
If you looked only at the parts we sampled:
– You wouldn’t get an accurate picture.
– All the parts would be there, but not in the right proportions.
The design weights compensate for the known distortions. The final weights also compensate for estimated distortions.

On what would you base the fundamental multivariate relationships in your model or analysis?

Steps to calculate the weights – Basic overview
At the survey design stage, several factors are used to determine the required sample size.
The probability of selection is calculated.
A first series of adjustments is made for non-response.
Post-stratification.

Factors to determine the sample size
Characteristics to be estimated (small proportions)
Required precision of the estimates (targeted CV)
Variability of the data
Expected non-response rate
Size of the population

Original design weight
Once the sample is selected in each stratum, calculate the original weight:
– $N_h / n_h$, where $h$ is the stratum, $N_h$ is the population size in the stratum and $n_h$ is the sample size in the stratum.
Since the sample is selected from the LFS, the original weight is obtained from the LFS.
– Adjustments for the number of available children.
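As a small illustration of the ratio above, the following Python sketch computes a design weight per stratum. The stratum labels and counts are hypothetical and are not NLSCY or LFS figures.

```python
def design_weights(pop_counts, sample_counts):
    """Return the design weight N_h / n_h for each stratum h."""
    return {h: pop_counts[h] / sample_counts[h] for h in pop_counts}


# Hypothetical stratum population sizes (N_h) and sample sizes (n_h).
pop_counts = {"A": 5_000, "B": 40_000}
sample_counts = {"A": 100, "B": 200}

weights = design_weights(pop_counts, sample_counts)
print(weights)  # each sampled unit in A stands for 50 units, in B for 200

# Sanity check: weights times sample sizes reproduce the population sizes.
recovered = {h: weights[h] * sample_counts[h] for h in weights}
```

A useful property to check is that the weights within a stratum sum back to the stratum population size.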

Non-response adjustment
Adjustments must be made to take into account the total non-response.
Characteristics of respondents vs non-respondents are analyzed:
– Province, income, level of education of parents, depression scale of PMK, urban/rural, etc.
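The slides do not say which adjustment method is used. One common approach, assumed here purely for illustration, is a weighting-class adjustment: within each class of similar units, respondents’ weights are inflated so they also account for the non-respondents. The class labels and data below are hypothetical.

```python
def nonresponse_factors(units):
    """Weighting-class adjustment (an assumed method, not taken from the slides).

    units: list of dicts with keys 'cls', 'weight' and 'responded'.
    Returns, per class, (sum of weights of all sampled units) /
    (sum of weights of respondents). A class with no respondents
    raises KeyError and would need to be collapsed with another class.
    """
    total, resp = {}, {}
    for u in units:
        total[u["cls"]] = total.get(u["cls"], 0.0) + u["weight"]
        if u["responded"]:
            resp[u["cls"]] = resp.get(u["cls"], 0.0) + u["weight"]
    return {c: total[c] / resp[c] for c in total}


# Hypothetical sample: 4 urban units (3 respond), 2 rural units (1 responds).
units = (
    [{"cls": "urban", "weight": 10.0, "responded": True}] * 3
    + [{"cls": "urban", "weight": 10.0, "responded": False}]
    + [{"cls": "rural", "weight": 20.0, "responded": True}]
    + [{"cls": "rural", "weight": 20.0, "responded": False}]
)

factors = nonresponse_factors(units)
print(factors)
```

After the adjustment, the respondents in each class carry the full weight of the class, so the weighted total is preserved.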

Post-stratification
An adjustment factor is calculated in order to post-stratify the sample to known population counts, by:
– Province, age, gender
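A minimal sketch of a post-stratification factor, assuming the usual definition: the known population count for a cell divided by the sum of the current weights in that cell. The cells, weights and counts are invented for illustration.

```python
def poststrat_factors(weights_by_cell, known_counts):
    """Return known population count / current weighted total, per cell."""
    return {
        cell: known_counts[cell] / sum(ws)
        for cell, ws in weights_by_cell.items()
    }


# Hypothetical cells keyed by (province, age group, gender).
weights_by_cell = {
    ("PEI", "0-5", "F"): [12.0, 13.0],          # weighted total 25
    ("ON", "0-5", "F"): [200.0, 210.0, 190.0],  # weighted total 600
}
known_counts = {
    ("PEI", "0-5", "F"): 30.0,
    ("ON", "0-5", "F"): 540.0,
}

adj = poststrat_factors(weights_by_cell, known_counts)
print(adj)
```

Multiplying each weight in a cell by its factor makes the weighted total in that cell equal the known population count.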

Final weight
$W_f = W_i \times \mathrm{Adj}_1 \times \mathrm{Adj}_2$
Where
– $W_f$: final weight
– $W_i$: initial weight
– $\mathrm{Adj}_1$: non-response adjustment
– $\mathrm{Adj}_2$: post-stratification
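The product above can be sketched in a few lines of Python. The function name and the numeric values are hypothetical.

```python
def final_weight(initial_weight, nonresponse_adj, poststrat_adj):
    """W_f = W_i x Adj_1 x Adj_2, following the formula on the slide."""
    return initial_weight * nonresponse_adj * poststrat_adj


# Hypothetical respondent: design weight 50, non-response factor 1.25,
# post-stratification factor 0.96.
w_f = final_weight(50.0, 1.25, 0.96)
print(round(w_f, 2))
```

Keeping the three components separate, rather than storing only the final weight, makes it easier to see how much each adjustment moved the weight.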

Link between analysis and the sample design (weight)
[Diagram relating Child’s Ability to: Intelligence, Social environment, School, Teachers, Materials, Curriculum, Grade level, Subject, Province.]
Province is a stratum.
The proportion of kids in the sample being taught the PEI curriculum is much larger than that found in the population.

Link between analysis and the sample design
There are very few things in a child’s life that are not related to where they live:
– in the city versus in a small village
– in a small province versus a large one
– what social/educational programs are offered
– what social support and services are offered
– regional cultural differences
to name a few…

Weights for Cycle 4
– Cross-sectional weights
– Longitudinal weights, including the converted respondents.
– Longitudinal weights for children introduced in Cycle 1 who responded in all cycles. NEW
– Not to mention the bootstrap weights, which are used for an entirely different purpose.

Cross-sectional Weights
Available for all cycles, up to Cycle 4.
When are they used?
Cycle 4 cross-sectional weights:
– to represent the population aged 0-17 at the time of Cycle 4
…
Cycle 1 weights:
– to represent the population aged 0-11 at the time of Cycle 1

Cross-sectional Weights - Cycle 4 - Warning
In Cycle 4, children with a cross-sectional weight come from 4 different cohorts (introduced in 1994, 1996, 1998 and 2000).
By 2000, the 1994 cohort has been around for 6 years:
– cross-sectional representativeness decreases over time because of sample erosion and population change (immigration).

Cross-sectional Weights - Cycle 5
For Cycle 5, there are no children aged 6 and 7.
In addition, the 1994 cohort’s cross-sectional representativeness has declined even further (erosion and immigration).
As a result, cross-sectional weights will be calculated only for children aged 0-5.

Cross-sectional weights in a nutshell
Cross-sectional weights must be used when the analysis concerns a specific year, that is, when you want a snapshot of the situation at a specific point in time.

Longitudinal Weights
Longitudinal weights represent the population of children at the time they were brought into the survey.
– Children introduced in Cycle 1: longitudinal weights represent the population of children aged 0-11 at the time of Cycle 1.

Longitudinal Weights (continued)
– Children introduced in Cycle 2: longitudinal weights represent the population of children aged 0-1 at the time of Cycle 2.
– Children introduced in Cycle 3: longitudinal weights represent the population of children aged 0-1 at the time of Cycle 3.
– Children introduced in Cycle 4: longitudinal weights represent the population of children aged 0-1 at the time of Cycle 4.

When are longitudinal weights used?
When you want to track a cohort of children introduced in a particular cycle and see how they’ve developed over time.

Longitudinal Weights - Cycle 4
Something new in Cycle 4: 2 sets of longitudinal weights:
– Set 1: Weights for children who responded in their first cycle and in Cycle 4 (possible non-response in Cycle 2 or 3).
– Set 2: Weights for those introduced in Cycle 1 who responded in every cycle. NEW

Longitudinal Weights - Cycle 4
Difference between the 2 sets of longitudinal weights
– If you need to avoid total non-response in Cycle 2 or 3, use the set of weights for children who responded in every cycle.
– If you are only interested in the changes directly between Cycle 1 and Cycle 4, the longitudinal weights that include converted respondents can be used.

Examples
The following are real examples taken from the NLSCY data.

Weighting - Examples
Average weights in Cycle 4, Prince Edward Island.
[Figure: average weights for a 5-year-old and a 1-year-old; surviving label: 7 1-year-olds.]

Weighting - Examples
Average weights in Cycle 4 (continued), Ontario.
[Figure: average weight for a 15-year-old.]

Example: Proportion of children aged 0-17, by province, Cycle 4, UNWEIGHTED
24% of Canada’s children live in the Maritime provinces… whereas in reality…

Example: Proportion of children aged 0-17, by province, Cycle 4, WEIGHTED
Whereas in reality… 7.3% of children live in the Maritime provinces.
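The contrast between the unweighted and weighted proportions can be sketched as follows. The records and weights are invented (they are not NLSCY data) and were chosen only so that the two shares roughly mirror the 24% and 7.3% quoted on the slides.

```python
from collections import defaultdict


def shares(records, weighted):
    """Share of each region, counting either sampled units or their weights."""
    totals = defaultdict(float)
    for region, weight in records:
        totals[region] += weight if weighted else 1.0
    grand_total = sum(totals.values())
    return {region: t / grand_total for region, t in totals.items()}


# Hypothetical sample of 100 children: the Maritimes are oversampled,
# so each Maritime child carries a smaller weight (assumed values).
records = [("Maritimes", 10.0)] * 24 + [("Rest of Canada", 40.0)] * 76

unweighted = shares(records, weighted=False)
weighted = shares(records, weighted=True)

print(f"unweighted Maritimes share: {unweighted['Maritimes']:.1%}")
print(f"weighted Maritimes share:   {weighted['Maritimes']:.1%}")
```

The unweighted share describes the sample; only the weighted share describes the population.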

Number of children aged 0-15 by year of age, Quebec, Cycle 3, UNWEIGHTED
The conclusion is obvious… Huge increase in births in 1993 and 1997!
[Table columns: Age, Birth year, Sample size, Percentage. Total sample size: 6,295.]

Number of children aged 0-15 by year of age, Quebec, Cycle 3, WEIGHTED
So much for the pseudo baby boom...

Age   Birth year   Population   Percentage
…     …            …,254        5.2%
…     …            …,769        5.5%
…     …            …,713        6.0%
…     …            …,662        6.2%
…     …            …,895        6.2%
…     …            …,466        6.4%
…     …            …,101        6.7%
…     …            …,882        5.6%
…     …            …,752        8.2%
…     …            …,451        5.2%
…     …            …,819        7.6%
…     …            …,130        5.3%
…     …            …,202        6.9%
…     …            …,400        5.6%
…     …            …,385        7.1%
…     …            …,205        6.4%
Total              1,421,086

Conclusion
To be obsessed with weights is a good thing… where statistical analysis is concerned.