
Sampling designs using the National Pupil Database Some issues for discussion by Harvey Goldstein (University of Bristol) & Tony Fielding (University of Birmingham)

Size of data set
The data set already contains some 3000k longitudinal records and increases by 600k a year. Reasonably complex analyses, e.g. value added multilevel models, are already time consuming to carry out. It is worth investigating the efficiency of sampling the database, either as a whole or for specific subpopulations such as LEAs. Traditional sampling theory can be used for simple statistics such as means or regression coefficients, and there is a literature on power calculations for multilevel models (see the ESRC research project by Browne at Nottingham).
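For a simple statistic such as a mean, the traditional theory gives a closed-form sample size. The following is a minimal sketch, assuming simple random sampling without replacement and a known outcome standard deviation; the function names, the standardised outcome (sd = 1) and the target half-width of 0.02 are illustrative assumptions, not values from the slides.

```python
import math

def srs_standard_error(pop_size, n, sd):
    """Standard error of a sample mean under simple random sampling
    without replacement, including the finite population correction."""
    return sd / math.sqrt(n) * math.sqrt(1 - n / pop_size)

def srs_sample_size(pop_size, sd, half_width, z=1.96):
    """Smallest n whose confidence interval half-width is at most half_width."""
    n0 = (z * sd / half_width) ** 2       # size needed for an infinite population
    return math.ceil(n0 / (1 + n0 / pop_size))

# Illustrative values only: 3000k records, outcome standardised to sd = 1.
n = srs_sample_size(pop_size=3_000_000, sd=1.0, half_width=0.02)
print(n, 1.96 * srs_standard_error(3_000_000, n, 1.0))
```

With a population this large the finite population correction barely changes the answer, which suggests that a sample of a few thousand records may be enough for simple statistics.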

Special features of the NPD
The population characteristics are known and can be used for drawing efficient samples. The possibility of an adaptive design exists, e.g.:
– Select a random subsample to determine relationships of interest (the equivalent of a pilot study)
– Fit a suitable model to estimate parameter values
– Choose parameters of interest together with their confidence intervals (CIs)
– Increase the sample size to establish the relationship between CI and sample size, and extrapolate to the sample size needed to achieve the required interval size
– Any statistic of interest (in addition to the CI) can be chosen.
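The adaptive steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: the population is synthetic, the model is a simple regression of an outcome on a prior score, the CI half-width is assumed to shrink as c / sqrt(n), and all names (`slope_half_width`, `required_sample_size`) are hypothetical.

```python
import math
import random

def slope_half_width(xs, ys, z=1.96):
    """95% CI half-width for the slope of a simple least-squares regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return z * math.sqrt(rss / (n - 2) / sxx)

def required_sample_size(population, target, pilot_sizes, rng):
    """Grow a pilot sample, record the half-width at each size, fit
    half_width = c / sqrt(n), and extrapolate to the target half-width."""
    pool = rng.sample(population, max(pilot_sizes))   # one random pilot pool
    points = []
    for n in pilot_sizes:
        sample = pool[:n]                             # nested, increasing samples
        xs = [p[0] for p in sample]
        ys = [p[1] for p in sample]
        points.append((1 / math.sqrt(n), slope_half_width(xs, ys)))
    # Least squares through the origin for the constant c.
    c = sum(u * h for u, h in points) / sum(u * u for u, _ in points)
    return math.ceil((c / target) ** 2)

# Synthetic stand-in for the database: (prior score, outcome) pairs.
rng = random.Random(0)
population = []
for _ in range(200_000):
    prior = rng.gauss(0, 1)
    population.append((prior, 0.7 * prior + rng.gauss(0, 0.7)))

n_needed = required_sample_size(population, 0.01, [500, 1000, 2000, 4000], rng)
print(n_needed)
```

The same loop would work for any other statistic of interest by swapping `slope_half_width` for a function that returns that statistic; the 1 / sqrt(n) form ignores the finite population correction, which is reasonable only while the sample is a small fraction of the database.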

Complex designs and replication
For multilevel models and designs where interest focuses on special groups (e.g. low achievers) we need good choices of the numbers of higher level units (schools) and the numbers in the groups. A similar adaptive approach can be used, evaluating CIs or significance levels as design parameters are altered. We also have the opportunity of replicating an analysis by selecting an independent sample from the database.
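As a rough illustration of how the CI changes as design parameters are altered, the sketch below compares two-stage designs with the same total number of pupils but different numbers of schools. It assumes equal numbers of pupils per school and uses the standard design effect 1 + (m - 1) × ICC for a mean; the intraclass correlation of 0.15 and the candidate designs are made-up values, and a full multilevel model would need simulation or the power-calculation literature rather than this shortcut.

```python
import math

def design_effect(pupils_per_school, icc):
    """Variance inflation from sampling whole clusters of equal size."""
    return 1 + (pupils_per_school - 1) * icc

def clustered_se(n_schools, pupils_per_school, sd, icc):
    """Approximate standard error of a mean under two-stage sampling."""
    n = n_schools * pupils_per_school
    return sd * math.sqrt(design_effect(pupils_per_school, icc) / n)

# Hypothetical designs, each with 2000 pupils in total.
designs = [(20, 100), (50, 40), (100, 20), (200, 10)]
for n_schools, per_school in designs:
    se = clustered_se(n_schools, per_school, sd=1.0, icc=0.15)
    print(f"{n_schools:4d} schools x {per_school:3d} pupils: "
          f"95% CI half-width = {1.96 * se:.4f}")
```

For a fixed total, spreading the sample over more schools narrows the interval, which is the kind of trade-off the adaptive approach would evaluate empirically.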

Using all the data
When analysing a given sample we will also generally have available data related to the sample members, e.g.:
– School level averages for each pupil in a study
– School level data for previous schools attended
– School level data for previous years
– LEA data for previous years
– School data for neighbouring schools
All such data can be incorporated into a model, increasing the number of variables but not the sample size.
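A minimal sketch of the first item in the list: school averages are computed from every record in the database and then attached to the sampled pupils as an extra variable. The record layout and the field names (`school`, `prior_score`, `school_mean_prior`) are assumptions made for the example, not the NPD's actual schema.

```python
from collections import defaultdict

def school_means(all_pupils, key="prior_score"):
    """Average of `key` per school, computed from the full database."""
    totals = defaultdict(lambda: [0.0, 0])
    for pupil in all_pupils:
        entry = totals[pupil["school"]]
        entry[0] += pupil[key]
        entry[1] += 1
    return {school: total / count for school, (total, count) in totals.items()}

def attach_school_mean(sample, means, name="school_mean_prior"):
    """Return copies of the sampled records with the school average added."""
    return [{**pupil, name: means[pupil["school"]]} for pupil in sample]

# Tiny made-up database: two schools, five pupils.
all_pupils = [
    {"id": 1, "school": "A", "prior_score": 10.0},
    {"id": 2, "school": "A", "prior_score": 14.0},
    {"id": 3, "school": "B", "prior_score": 20.0},
    {"id": 4, "school": "B", "prior_score": 22.0},
    {"id": 5, "school": "B", "prior_score": 24.0},
]
sample = [all_pupils[0], all_pupils[4]]            # the analysed sample
enriched = attach_school_mean(sample, school_means(all_pupils))
print(enriched)
```

The sample still has two records; only the number of variables per record has grown.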

Other possibilities
– Poststratification: using population distributions to re-weight statistics or to incorporate weights in model estimation
– Setting up an archive of results that may be useful for designing samples
– Using PLASC to select a research sample, subject to appropriate permissions.
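Poststratification of a simple statistic can be sketched as below: each stratum's sample mean is weighted by that stratum's known share of the population. The strata labels, scores and population shares are invented for the example, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def poststratified_mean(sample, population_shares):
    """Weight each stratum's sample mean by its known population share.

    sample: iterable of (stratum, value) pairs
    population_shares: dict mapping stratum -> proportion of the population
    """
    values = defaultdict(list)
    for stratum, value in sample:
        values[stratum].append(value)
    return sum(share * sum(values[stratum]) / len(values[stratum])
               for stratum, share in population_shares.items())

# Made-up sample that over-represents group_a relative to the population.
sample = [("group_a", 40.0), ("group_a", 44.0), ("group_a", 42.0),
          ("group_b", 50.0)]
shares = {"group_a": 0.2, "group_b": 0.8}

unweighted = sum(value for _, value in sample) / len(sample)
print(unweighted, poststratified_mean(sample, shares))
```

Every stratum in `population_shares` needs at least one sampled record; otherwise its mean is undefined and the function raises a `ZeroDivisionError`.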