1 A workshop on using R to select a sample for EHES Susie Cooper & Johan Heldal Statistics Norway.

Presentation transcript:

1 A workshop on using R to select a sample for EHES Susie Cooper & Johan Heldal Statistics Norway

2 Overview What is R and why use it? Practical Exercises: 1. Installing and loading R and packages 2. Reading external files 3. Calculating sample sizes 4. Stage 1 - Selecting Primary Sampling Units (PSU) 5. Stage 2 - Selecting Secondary Sampling Units (SSU) Where to get more information

3 Why use R for EHES? It has been agreed with the EU because: It is free, and therefore available to all countries involved. It is very flexible. It is a very powerful and fast tool for sampling and analysis. However… There can be a steep learning curve when starting to use the program. There is no user-friendly interface.

4 What is EHESsampling? A tool for planning the sampling design Can be used to find good stratifications Can calculate cost-variance optimal sample sizes within PSUs. Can calculate costs and variances of alternatives. A tool for taking a probability sample from a sampling frame.

5 Using EHESsampling The EHESsampling manual. Before using EHESsampling you have to prepare some input datasets from the main sampling frame. For sampling at stage 1 you need: a dataset describing the PSUs, and a dataset describing the strata. For stage 2 you need: the main sampling frame describing the individual units.

6 1. Loading Packages Load the EHESsampling package and other necessary packages each time you re-open R: library(EHESsampling)
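A minimal sketch of a defensive way to load the package, assuming only base R. It checks whether EHESsampling is installed before attaching it, so a script fails with a readable message instead of an error. The variable names pkg and ehes_available are illustrative and not part of the workshop material.

```r
# Name of the package used throughout the workshop.
pkg <- "EHESsampling"

# requireNamespace() returns TRUE or FALSE instead of stopping with an error.
ehes_available <- requireNamespace(pkg, quietly = TRUE)

if (ehes_available) {
  # character.only = TRUE lets library() take the name from a variable.
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; see the EHESsampling manual.")
}
```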

7 2. Reading External Files Open a new script by selecting File and New script

8 2. Reading External Files Set the working directory where data files are stored by typing into the new script: setwd("X:/120/EHES/R/Data") Then press Ctrl + R to send the line to the console. The path is the location on your computer where the data files are stored.

9 2. Reading External Files Read in the chosen file and save it in the working environment. PSUs.df<-read.table("post1000.csv", sep=";", dec=",", header=T) The file is now stored as PSUs.df for this session.
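To try the read.table call without the workshop files, the following sketch writes a tiny semicolon-separated file with decimal commas to a temporary location and reads it back. The file contents and the names toy.df and same.df are made up for illustration; read.csv2 is the base R shortcut whose defaults are sep=";" and dec=",".

```r
# Write a small file in the same layout as the workshop data:
# fields separated by ";" and decimals written with ",".
tmp <- tempfile(fileext = ".csv")
writeLines(c("PSU;name;size;meanX",
             "1;North;1200;5,4",
             "2;South;800;6,1"), tmp)

# Same arguments as on the slide.
toy.df <- read.table(tmp, sep = ";", dec = ",", header = TRUE)
str(toy.df)

# read.csv2() uses sep = ";" and dec = "," by default.
same.df <- read.csv2(tmp)
```

If dec="," is left out, a column such as meanX is read as text rather than as numbers, which is a common source of trouble later on.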

10 2. Reading External Files To see the start of the data set, type: head(PSUs.df) This prints the first 6 lines of the data set.

11 2. Reading External Files Rename PSUs.df variables to standard names: names(PSUs.df)[c(1,2,3,4,13,14)]<-c("PSU", "name","strata","size","meanX","varX") The first vector, c(1,2,3,4,13,14), gives the positions of the columns to rename. The second vector gives the new names for those columns, in the same order. Check the result with: head(PSUs.df)
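Renaming by position can be tried on a small made-up data frame first. In this sketch the data frame toy.df and its original column names are invented for illustration; only the columns at the listed positions change, and the others keep their names.

```r
# A toy data frame with non-standard column names.
toy.df <- data.frame(postcode = c(1000, 2000),
                     place    = c("Oslo", "Bergen"),
                     region   = c(1, 2),
                     pop      = c(500, 300))

# Rename columns 1, 2 and 4; column 3 is left untouched.
names(toy.df)[c(1, 2, 4)] <- c("PSU", "name", "size")

names(toy.df)
```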

12 2. Reading External Files Read in the details for each stratum strataDetails.df<-read.table("Norwaystrata.csv", sep=";", dec=",",header=T) Rename the variables to standard ones names(strataDetails.df)[c(1,2)]<-c("strata","size")

13 2. Reading External Files Take a look at the dataset strataDetails.df

14 3. Calculating Sample Sizes Calculate the sample sizes for each PSU: stage1<-sample.sizes(PSUs.df,strataDetails.df, n="n2",columns=5:12) PSUs.df is the data frame with one line for each PSU. strataDetails.df is the data frame with one line for each stratum. n is the name of the column of the strata data set containing the strata sample sizes. columns gives the columns containing the age/gender size information.

15 3. Calculating Sample Sizes The sample.sizes function produces 2 datasets: stage1$pop and stage1$strat, which can be saved separately. stage1.strata.df<-stage1$strat stage1.pop.df<-stage1$pop
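The $ operator used here is ordinary list extraction in R. The sketch below builds a made-up list with elements named pop and strat, only to show the mechanics; it does not reproduce what sample.sizes actually computes, and the contents of toy.result are invented.

```r
# A toy list that mimics the shape of the result: two named data frames.
toy.result <- list(
  pop   = data.frame(PSU = 1:3, n = c(10, 20, 30)),
  strat = data.frame(strata = c("A", "B"), n = c(30, 30))
)

# See which elements are available.
names(toy.result)

# Save each element under its own name, as on the slide.
toy.strata.df <- toy.result$strat
toy.pop.df    <- toy.result$pop
```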

16 3. Calculating Sample Sizes Look at stage1.strata.df by typing the name into the console. stage1.strata.df

17 3. Calculating Sample Sizes Look at the top of stage1.pop.df by typing head and the name in brackets into the console. head(stage1.pop.df)

18 4. Stage 1 – Selecting PSUs Choose a sample with the correct number of PSUs (mk) from each stratum: stage1.select.df<-stage1.sample(stage1.pop.df) stage1.select.df is the new data frame containing the selected PSUs. stage1.sample is the function we have created to select the PSUs. stage1.pop.df is the previously saved data frame containing the information for each PSU and age/gender domain.
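To illustrate the idea of drawing a fixed number of PSUs from each stratum, here is a base R sketch using simple random sampling within strata. This is an assumption made for illustration: it is not the stage1.sample function, whose selection method is described in the EHESsampling manual. The data frame psus and the vector mk are invented.

```r
set.seed(1)  # fixed seed so the draw can be repeated

# Toy frame: 10 PSUs in two strata.
psus <- data.frame(PSU = 1:10, strata = rep(c("A", "B"), each = 5))

# Number of PSUs to select in each stratum (hypothetical values).
mk <- c(A = 2, B = 3)

# Draw mk[s] PSUs without replacement inside each stratum s.
picked <- unlist(lapply(names(mk), function(s) {
  ids <- psus$PSU[psus$strata == s]
  ids[sample.int(length(ids), mk[[s]])]
}))

selected <- psus[psus$PSU %in% picked, ]
selected
```

Indexing with sample.int avoids the surprise that sample(x, n) treats a single number x as the range 1:x.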

19 4. Stage 1 – Selecting PSUs Look at the chosen PSUs PSU.list(stage1.select.df)

20 4. Stage 1 – Selecting PSUs Export the file of chosen PSUs: write.table(stage1.select.df,file="select.csv", sep=";", dec=",", row.names=FALSE)
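A quick way to check an export is to read the file back and compare it with the original. The sketch below does this with a made-up data frame and a temporary file, using the same sep and dec settings as above; toy.df, tmp and back.df are illustrative names.

```r
# Toy data with an integer and a decimal column.
toy.df <- data.frame(PSU = 1:3, meanX = c(1.5, 2.25, 3))

# Export with ";" as separator and "," as decimal mark, no row names.
tmp <- tempfile(fileext = ".csv")
write.table(toy.df, file = tmp, sep = ";", dec = ",", row.names = FALSE)

# Read it back with matching settings and compare.
back.df <- read.table(tmp, sep = ";", dec = ",", header = TRUE)
all.equal(toy.df, back.df)
```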

21 5. Stage 2 – Selecting SSUs Combine the file of selected PSUs with the file containing individual unit data. This should result in a file of all individual units in all the selected PSUs.
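If the combining is done in R, one possible approach is the base function merge, which keeps only rows whose key appears in both data frames. This is a sketch with invented data; the workshop does not say which tool was used to produce the merged file, and the names selected.psus, frame.df and merged.df are hypothetical.

```r
# Selected PSUs from stage 1 (toy values).
selected.psus <- data.frame(PSU = c(2, 5), strata = c("A", "B"))

# Main sampling frame: one line per individual, with the PSU they belong to.
frame.df <- data.frame(id = 1:6, PSU = c(1, 2, 2, 3, 5, 5))

# Keep only individuals living in a selected PSU.
merged.df <- merge(frame.df, selected.psus, by = "PSU")
merged.df
```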

22 5. Stage 2 – Selecting SSUs Read in the merged file: PSU.individuals.df<-read.table("NorwaySelected.csv", sep=";", dec=",", header=T) head(PSU.individuals.df)

23 5. Stage 2 – Selecting SSUs Take a sample of appropriate size in each stratum and PSU: selected.individuals<-stage2.sample(PSU.individuals.df) Look at the top of the selected individual units: head(selected.individuals)
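The idea of sampling individuals within each selected PSU can be sketched in base R as a simple random sample of fixed size per PSU. This is an illustration under that assumption only: it is not the stage2.sample function, which takes the appropriate size for each stratum and PSU from the input data. The data frame individuals and the value n.per.psu are invented.

```r
set.seed(42)  # fixed seed so the draw can be repeated

# Toy merged file: 12 individuals in two selected PSUs.
individuals <- data.frame(id = 1:12, PSU = rep(c(2, 5), each = 6))

# Hypothetical number of individuals to draw in every PSU.
n.per.psu <- 3

# Split by PSU, sample rows inside each part, then stack the parts again.
parts <- lapply(split(individuals, individuals$PSU), function(d) {
  d[sample.int(nrow(d), min(n.per.psu, nrow(d))), ]
})
sampled <- do.call(rbind, parts)

head(sampled)
```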

24 Further Sampling Steps 1. Read in the strata dataset. 2. Calculate the PSU sample sizes. 3. Take a sample of PSUs (stage 1). 4. Merge the selected PSUs with the main sampling frame containing individual units. 5. Sample individual units (stage 2).

25 Selected Individuals

26 Help! EHESsampling manual available at: EHES participant manual – Part 1: Chapter 05 R websites: R official site: Quick R: Us: