Teaching Survey Sampling Theory using R
Michael D. Larsen, George Washington University
UseR 2010 poster session, 7/21/10



Uses of R in the course
1. Data analysis; exploring data
2. Programming complex formulas
3. Simulation of properties of estimators
4. Making estimation easier so one can think about concepts

Exploring data
- Examine means and sizes of clusters: more variability between clusters increases the variance of estimates.
- Examine means and sizes across strata: more variability between strata decreases the variance of stratified estimates.
- Examine skewness of variables: extreme skewness in the population can lead to unrealistic sample-based estimates.

Exploring Data: Tools
- Side-by-side boxplots:
  boxplot(split(senic$nurses, senic[,c("region","medical")]), xlab="four regions in U.S.; two hospital types", ylab="# nurses", main="113 hospitals in U.S.")
- Histograms
- Numerical summaries
- Correlations; regression
- sapply() applied to lists created with split()
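The split/sapply pattern in the last item can be sketched as follows. The senic data are not reproduced in this transcript, so the block uses a hypothetical stand-in data frame `hosp` with simulated values; only the column names (region, medical, nurses) follow the slide.

```r
# Hypothetical stand-in for the senic data (values are simulated, not real).
set.seed(1)
hosp <- data.frame(
  region  = factor(sample(1:4, 113, replace = TRUE)),
  medical = factor(sample(c("yes", "no"), 113, replace = TRUE)),
  nurses  = rpois(113, lambda = 170)
)

# split() returns a list with one numeric vector per region-by-type cell.
groups <- split(hosp$nurses, hosp[, c("region", "medical")])

# sapply() applies a summary to every cell and simplifies the result to a vector.
cell_n    <- sapply(groups, length)
cell_mean <- sapply(groups, mean)
cell_sd   <- sapply(groups, sd)

# One row per cell: sample size, mean and standard deviation.
round(cbind(n = cell_n, mean = cell_mean, sd = cell_sd), 1)
```

The same `groups` list can be passed directly to boxplot(), which is what the boxplot call on the slide does.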

Comparing two factors for stratification potential
[figure not captured in transcript]
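One numerical way to make this comparison, alongside a plot, is the share of total variance that lies between the levels of each candidate factor (the R-squared of a one-way ANOVA); a larger share suggests more to gain from stratifying on that factor. The sketch below is an illustration under assumptions, not the analysis shown on the slide: the data frame `hosp` is simulated, and the helper `between_share` is a hypothetical name.

```r
# Simulated stand-in data: hospital type is built to matter more than region.
set.seed(3)
hosp <- data.frame(
  region  = factor(sample(c("NE", "NC", "S", "W"), 113, replace = TRUE)),
  medical = factor(sample(c("yes", "no"), 113, replace = TRUE,
                          prob = c(0.15, 0.85)))
)
hosp$nurses <- round(200 + 250 * (hosp$medical == "yes") +
                     10 * as.integer(hosp$region) + rnorm(113, sd = 40))

# Hypothetical helper: proportion of variance in y explained by factor f.
between_share <- function(y, f) summary(lm(y ~ f))$r.squared

shares <- c(region  = between_share(hosp$nurses, hosp$region),
            medical = between_share(hosp$nurses, hosp$medical))
round(shares, 2)
```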

Programming complex formulas
- Checks understanding of formulas
- Helps memorization of formulas
- Next page: two-stage cluster sample estimator for total and variance of total

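[Slide image not captured in transcript: two-stage cluster sample estimator for the total and the variance of the total]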
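The code from the original slide is not available in this transcript. As a stand-in, here is a minimal sketch of the standard unbiased estimator of a population total under a two-stage cluster design, assuming simple random sampling without replacement at both stages. The function name `two_stage_total` and its arguments are hypothetical choices made for this illustration.

```r
# y:   observed values for all sampled secondary units
# psu: cluster label of each observation
# Mi:  named vector of population sizes of the sampled clusters
# N:   number of clusters in the population
two_stage_total <- function(y, psu, Mi, N) {
  ids    <- names(Mi)
  by_psu <- split(y, as.character(psu))[ids]  # one vector per sampled cluster
  n      <- length(ids)                       # number of sampled clusters
  mi     <- sapply(by_psu, length)            # within-cluster sample sizes
  ybar   <- sapply(by_psu, mean)              # within-cluster sample means
  s2_i   <- sapply(by_psu, var)               # within-cluster sample variances

  t_i   <- Mi * ybar                          # estimated cluster totals
  t_hat <- (N / n) * sum(t_i)                 # estimated population total

  # Between-cluster and within-cluster components of the variance estimate.
  v_between <- N^2 * (1 - n / N) * var(t_i) / n
  v_within  <- (N / n) * sum((1 - mi / Mi) * Mi^2 * s2_i / mi)
  v_hat     <- v_between + v_within

  c(total = t_hat, variance = v_hat, se = sqrt(v_hat))
}

# Toy example: 2 of 10 clusters sampled, with 2 of 4 and 3 of 6 units observed.
y   <- c(1, 3, 2, 4, 6)
psu <- c("a", "a", "b", "b", "b")
res <- two_stage_total(y, psu, Mi = c(a = 4, b = 6), N = 10)
res
```

Working the toy example by hand gives an estimated total of 160 and an estimated variance of 5280, which is the kind of check a student can use to confirm that the formula was programmed correctly.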

Simulation
- Using functions, one can contrast complex estimation methods in terms of bias, variance and MSE.
- Simulating 1,000 samples and plotting the results gives a different impression than the mathematical result alone.
- The impact of skewness and outliers is more transparent.
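A simulation of this kind might look like the sketch below. It is an illustration under assumptions rather than the course's own code: the population is simulated with a skewed auxiliary variable, and the two estimators being contrasted (the expansion estimator and the ratio estimator of a total under simple random sampling) were chosen for this example.

```r
set.seed(2010)

# Simulated population: skewed auxiliary variable x, response y roughly proportional to x.
N   <- 2000
x   <- rlnorm(N, meanlog = 3, sdlog = 0.8)
y   <- 2 * x + rnorm(N, sd = 5)
t_y <- sum(y)   # true total, the estimation target
t_x <- sum(x)   # known auxiliary total used by the ratio estimator

# Draw one simple random sample and return both estimates of the total.
one_sample <- function(n) {
  s <- sample(N, n)
  c(expansion = N * mean(y[s]),
    ratio     = t_x * mean(y[s]) / mean(x[s]))
}

# 1,000 replicate samples: a 2 x 1000 matrix with one row per estimator.
est <- replicate(1000, one_sample(n = 50))

bias     <- rowMeans(est) - t_y
variance <- apply(est, 1, var)
mse      <- rowMeans((est - t_y)^2)
round(rbind(bias, variance, mse))
```

Plotting the replicates, for example with `boxplot(t(est))` or a histogram of each row, shows how the skewed population spreads out the expansion estimates far more than the ratio estimates.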

Ease of use
- Make estimation easier so one can think about concepts.
- It becomes possible to focus on contrasts and to consider more variables.
- Students can do more ambitious projects and handle 'real' data.

Ease of use example
For a given budget and population, what is the advantage of more clusters with smaller sample sizes versus fewer clusters with bigger sample sizes?
- Compute variances under three scenarios.
- Take 10,000 samples under three different scenarios and compute variance of estimates.
- Apply to three different variables. Write a summary.
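The exercise could be set up along the following lines for one variable. This is a sketch under stated assumptions: the population is simulated with equal-sized clusters, the budget is treated as a fixed total of 200 sampled units (ignoring per-cluster costs), both stages use simple random sampling without replacement, and 2,000 replicates are used instead of 10,000 to keep the run short. All object names are hypothetical.

```r
set.seed(42)

# Simulated population: N clusters of M units each, stored one cluster per column.
N <- 200
M <- 50
cluster_effect <- rnorm(N, sd = 2)
pop <- matrix(10 + rep(cluster_effect, each = M) + rnorm(N * M, sd = 4), nrow = M)

S2_b <- var(colMeans(pop))          # variance of cluster means
S2_w <- mean(apply(pop, 2, var))    # average within-cluster variance

# Three ways to spend the same budget of n * m = 200 sampled units.
scenarios <- data.frame(n = c(40, 20, 10), m = c(5, 10, 20))

# Variance of the sample mean for two-stage sampling with equal cluster sizes.
theory_var <- function(n, m) {
  (1 - n / N) * S2_b / n + (1 - m / M) * S2_w / (n * m)
}

# One replicate: sample n clusters, then m units in each, and return the sample mean.
draw_mean <- function(n, m) {
  cl <- sample(N, n)
  mean(sapply(cl, function(j) mean(pop[sample(M, m), j])))
}

scenarios$theory    <- mapply(theory_var, scenarios$n, scenarios$m)
scenarios$simulated <- mapply(function(n, m) var(replicate(2000, draw_mean(n, m))),
                              scenarios$n, scenarios$m)
scenarios
```

With the between-cluster variation built into this simulated population, spreading the budget over more clusters gives the smaller variance; repeating the run on variables with different cluster structure is what makes the written summary interesting.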

Suggestions, part 1
- Consistent syntax across survey designs
- Ease of use: be clear about which types of variables are needed (factor, numeric, etc.)
- More examples with more numbers that can be replicated

Suggestions, part 2
- Recover the formula: when a command is run, report not only the R syntax but also the estimation formula used.
- Give more details in context when errors occur, for example:
  - Your sample sizes for clusters (ni) exceed your population sizes for clusters (Ni).
  - Only one primary sampling unit (defined by psu) is available for some clusters.
- Include writing projects based on data analysis
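The kind of contextual error message suggested above can be illustrated with a small validation function. This is a hypothetical helper written for this example, not part of any existing package; the function name `check_cluster_sizes` and the message wording are assumptions modeled on the examples in the slide.

```r
# psu: cluster label of each sampled unit
# Ni:  named vector of population sizes, one entry per sampled cluster
check_cluster_sizes <- function(psu, Ni) {
  ni  <- sapply(split(psu, as.character(psu)), length)  # sample size per cluster
  ids <- names(ni)

  unknown <- setdiff(ids, names(Ni))
  if (length(unknown) > 0) {
    stop("No population size (Ni) was supplied for cluster(s): ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }

  too_big <- ids[ni > Ni[ids]]
  if (length(too_big) > 0) {
    stop("Your sample sizes for clusters (ni) exceed your population sizes (Ni) ",
         "in cluster(s): ", paste(too_big, collapse = ", "), call. = FALSE)
  }

  singles <- ids[ni < 2]
  if (length(singles) > 0) {
    stop("Only one sampled unit is available in cluster(s): ",
         paste(singles, collapse = ", "),
         "; the within-cluster variance cannot be estimated.", call. = FALSE)
  }

  invisible(TRUE)
}

# A valid design passes silently; an invalid one reports which cluster is at fault.
check_cluster_sizes(c("a", "a", "b", "b"), Ni = c(a = 5, b = 10))
tryCatch(
  check_cluster_sizes(c("a", "a", "a", "b", "b"), Ni = c(a = 2, b = 10)),
  error = function(e) message(conditionMessage(e))
)
```

Naming the offending clusters in the message is the design choice that matters here: it points the student back to the data rather than to the syntax.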