SADC Course in Statistics
Common complications when analysing survey data
Module I3, Sessions 14 to 16

Objectives of these three sessions
You should be able to:
- Explain why weights are sometimes needed in analysing survey data
- Produce weighted tables of counts and other statistics
- Suggest ways of adjusting analyses when there are missing values
- Analyse multiple response data
- Cope with data containing zero values

Contents
- Review: why these sessions?
- There can be zero values, which may have to be analysed separately
- Multiple responses are common, and are an example of data at multiple levels
- Weights are often needed, because observations represent different fractions of the population
- Missing values can distort an analysis; simple options are explored

Review
- To describe data well you can use Excel or a statistics package; we repeat this briefly here, with a statistics package
- Real data sets introduce surprises in analysis that are not present in artificial training exercises. They need practice during training courses, or they will be a problem to analyse later
- But some complications are predictable, and very common, like multiple response questions or the need for weights. These are the complications we cover here

How to describe data well (repeat slide)
- Look for oddities in the data and be prepared to adapt the summaries that you calculate
- Study the data as tables and graphs
- Use frequencies and percentages to summarise categorical variables
- Use averages and measures of variability to summarise numeric variables
- Identify any structure in the data and use it in producing your summaries

Look at the data (repeat slide)
The two types of variable, categorical and numeric, are summarised in different ways.

Analysis to meet objectives (repeat slide)
- Simple objectives
- Not so simple objectives

Meeting simple objectives (repeat slide)
These summaries were made with Instat: see Practical 1.

Answering more complicated objectives AND explaining some of the variability
These were also made with Instat.

Practicals 1 and 2
- Practical 1 reviews the construction of tables using a statistics package, particularly to look at percentages, because percentages have to be understood clearly to analyse multiple response data
- Practical 2 looks at the analysis of data containing zeros, and shows that calculating averages needs to be done carefully when there is structure in the data
- Both practicals give more practice in the use of a statistics package

Zero values
- Zeros may be a simple part of the data. For example: list the assets (radio, bicycle, etc.); some households may have zero assets
- Often, however, zero is a special value and should be analysed in a special way. Examples: How many livestock do you have? What was your yield of maize? How much rain fell yesterday?
- What is different here?

Example: one possible analysis
Total = 30, n = 10, mean = 3, median = 2, etc.
This does nothing special: the zeros are analysed with all the other values.

Example continued: an alternative analysis
Total = 30, n = 10
- Number of zeros = 4, so the proportion of zeros = 0.4 (40%)
- n = 6 are non-zero: the mean of the non-zero values = 5, the median of the non-zero values = 5.5, etc.
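The two analyses above can be sketched in Python as follows. This is only an illustration, not how the course software does it: the function names are hypothetical, and the ten values are invented, chosen only so that the summaries match those on the slides (the slide's own data are not reproduced here).

```python
from statistics import mean, median

def one_step_summary(values):
    """Analyse all values together: the zeros get no special treatment."""
    return {"total": sum(values), "n": len(values),
            "mean": mean(values), "median": median(values)}

def two_step_summary(values):
    """Step 1: how many values are zero? Step 2: summarise the non-zero values."""
    nonzero = [v for v in values if v != 0]
    n_zero = len(values) - len(nonzero)
    return {"n": len(values), "zeros": n_zero,
            "prop_zeros": n_zero / len(values),
            "n_nonzero": len(nonzero),
            # None if every value is zero (there is nothing to average)
            "mean_nonzero": mean(nonzero) if nonzero else None,
            "median_nonzero": median(nonzero) if nonzero else None}

# Hypothetical values, chosen only so that the summaries match the slides
obs = [0, 0, 0, 0, 1, 3, 5, 6, 7, 8]
print(one_step_summary(obs))
print(two_step_summary(obs))
```

Both functions use the same data; what differs is the question each summary answers.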

Which is better?
- As usual, both are valid. It depends on the precise objective, and on the type of data
- Often the 2-step analysis is appropriate: the data are split into 2. For example: Do you have cattle? Then (if you do) how many do you have?
- Analysis: 60% of farmers owned cattle; among the cattle owners, the mean was 5 per household

Multiple response questions?
From the Tanzania agricultural survey:
- These are NOT multiple responses, because the question asks for the main source
- Ask for ALL sources used to make it a multiple response question

Multiple responses?
- Not multiple response
- Multiple response: you may own more than one asset

Livestock survey examples

Analysis of multiple responses
- For individual species it is easy: what % keep cattle? What % keep sheep? Nothing special is needed
- Looking at all species together needs thought: what % keep livestock? Does livestock keeping depend on the type of household?
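A minimal Python sketch of the difference between analysing one species and analysing all species together. The households, household types and function names are all hypothetical; each household records ALL the species it keeps, which is what makes the data multiple response.

```python
# Hypothetical multiple-response data (not from any survey): each household
# records ALL the species it keeps, so one household can give several responses.
households = [
    {"type": "type A", "species": set()},
    {"type": "type A", "species": {"cattle"}},
    {"type": "type B", "species": {"cattle", "goats"}},
    {"type": "type B", "species": {"goats", "sheep"}},
    {"type": "type B", "species": {"cattle", "goats", "sheep"}},
]

def percent_keeping(rows, species):
    """% of households keeping one species: an ordinary single-variable summary."""
    return 100 * sum(species in r["species"] for r in rows) / len(rows)

def percent_keeping_any(rows):
    """% of households keeping at least one species (all species together)."""
    return 100 * sum(len(r["species"]) > 0 for r in rows) / len(rows)

def percent_any_by_type(rows):
    """% keeping any livestock, separately for each type of household."""
    types = sorted({r["type"] for r in rows})
    return {t: percent_keeping_any([r for r in rows if r["type"] == t])
            for t in types}

for sp in ("cattle", "goats", "sheep"):
    print(sp, percent_keeping(households, sp))
print("any livestock", percent_keeping_any(households))
print(percent_any_by_type(households))
```

Here the species percentages (60, 60 and 40) add up to more than 100%, because a household can keep several species; the % keeping any livestock (80%) therefore cannot be found by adding them, and has to be calculated from the households directly.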

Practicals 3 and 4
- Multiple response analysis using a simple example, with three different layouts of the data
- Then some real examples, using data from the Tanzania agriculture survey

Introducing weights
Suppose a sample of 2 farmers:

Farmer   Yield
A        1 t/ha
B        2 t/ha

What is the mean? Obviously it is (1 + 2)/2 = 1.5 t/ha! But...

Introducing weights (continued)
Suppose a sample of 2 farmers:

Farmer   Area     Yield    Production
A        5 ha     1 t/ha   5 tons
B        0.5 ha   2 t/ha   1 ton

Now what is the mean?
- It could still be (1 + 2)/2 = 1.5 t/ha
- Or it could be (5 + 1)/5.5 ≈ 1.1 t/ha, i.e. total production divided by total area

But which is right?
- They are both right, but they answer different questions
- Take food security: are you interested in the farmer, or the production, or both?
- If the farmer is the unit of interest, then there are 2 farmers and the mean is 1.5 t/ha
- If the area is the unit of interest, then there are 5.5 ha, and farmer A (5 ha) is 10 times as important as farmer B (0.5 ha), so a weighted mean is produced

The weighted mean
So if the area is of interest, then with

Farmer   Area     Yield
A        5 ha     1 t/ha
B        0.5 ha   2 t/ha

weight each yield by the area it represents:
mean = (1*5 + 2*0.5)/5.5 ≈ 1.1 t/ha

- Here the areas are the weights
- They are used when different observations represent different proportions of the population
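The weighted mean can be sketched in a few lines of Python. The function name is illustrative; the yields and areas are the two farmers from the slide. Giving both farmers a weight of 1 reproduces the ordinary mean.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value*weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

yields = [1.0, 2.0]   # t/ha for farmers A and B
areas = [5.0, 0.5]    # ha, used as the weights

per_farmer = weighted_mean(yields, [1, 1])   # farmer is the unit: equal weights
per_hectare = weighted_mean(yields, areas)   # area is the unit: weight by area
print(round(per_farmer, 2), round(per_hectare, 2))  # 1.5 1.09
```

Weighting by area gives the same answer as total production divided by total area (6 tons / 5.5 ha).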

Weights in the Tanzania agriculture survey
- Each weight is the number of people in the population represented by that observation
- It was roughly a 1% sample, so the weights are about 100
- The technical guide explains the calculations
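The arithmetic behind "a 1% sample, so the weights are about 100" can be sketched as below, together with how weights turn sample counts into estimated population counts. Both helpers and the five households are hypothetical; the real weights are calculated as described in the survey's technical guide, and they differ between observations.

```python
def design_weight(sampling_fraction):
    """Hypothetical helper: with a sampling fraction f, each sampled unit
    represents about 1/f units of the population."""
    return 1 / sampling_fraction

def estimated_count(flags, weights):
    """Estimated population count with a characteristic: add up the weights
    of the sampled units that have it."""
    return sum(w for flag, w in zip(flags, weights) if flag)

w = design_weight(0.01)  # roughly a 1% sample, so a weight of about 100
# Hypothetical sample of 5 households (not survey data): owns a radio?
owns_radio = [True, False, True, True, False]
weights = [w] * len(owns_radio)  # equal weights only to keep the sketch simple
print(w, estimated_count(owns_radio, weights))
```

With equal weights the estimated count is just the sample count scaled up; with unequal weights, as in the real survey, the estimated percentages can also differ from the sample percentages.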

Practical 5: Weights using a statistics package
- First the rice survey, weighting by the size of field
- Then the Tanzania agriculture survey: investigate ownership of radios, by sex of household head, and then by type of farming household

Possession of radio by type of farming: unweighted analysis
- The observed numbers and percentages in the sample
- Look at livestock, but the numbers are small

Possession of radio by type of farming: weighted analysis
- The estimated numbers and percentages in the region of Tanzania
- Look at livestock now: what do you conclude?

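Why such a large change with weighting?
Examine the weights for these 2 groups:
- Average weight = 60 (the 10 sampled households without a radio)
- Average weight = 20 (the 42 sampled households with a radio)
So the estimated % with a radio = 100*(42*20)/(10*60 + 42*20) = 100*840/1440 ≈ 58%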
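The calculation above can be checked with a short Python sketch. The counts (42 and 10) and average weights (20 and 60) are those in the slide's formula; using one average weight per group rather than each household's own weight is the same simplification the slide makes, and the function name is illustrative.

```python
def weighted_percent(n_yes, w_yes, n_no, w_no):
    """Estimated % with a feature, when each group's sampled units carry a
    (here: average) weight. Weights of 1 give the unweighted sample %."""
    est_yes = n_yes * w_yes
    est_no = n_no * w_no
    return 100 * est_yes / (est_yes + est_no)

unweighted = weighted_percent(42, 1, 10, 1)   # every household counts once
weighted = weighted_percent(42, 20, 10, 60)   # average weights from the slide
print(f"unweighted: {unweighted:.1f}%  weighted: {weighted:.1f}%")
# unweighted: 80.8%  weighted: 58.3%
```

The households without a radio carry three times the weight of those with one, which is why the weighted percentage is so much lower than the sample percentage.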

And always take care with small numbers
- The sample is large overall
- But there is still only a small sample of livestock-only farmers

Missing values
Survey of countries on the principles of official statistics:
- Non-response is one form of missing value
- Here 82 of the 194 countries did not respond

More missing values
Within the 112 countries that did respond overall, some questions were not answered: these are missing values within the responses.

Practical 6: Non-response and missing values
- The data on the principles of official statistics are re-analysed in a new way, which adjusts for the missing values, i.e. the countries that did not respond
- Then the missing values within the responses that were available are considered

Coping with missing values
- They should be stated in the reporting, which they were in the report on the principles
- Can they be ignored? Often the missing values are simply ignored; the analysis of the principles ignores them
- If their absence is uninformative, then ignoring them is usually OK
- Otherwise you could look to compensate. We show one way here, by using a weighted analysis
- The main message is to think carefully: don't be quick to let the computer impute values

Non-response in the Principles survey
- The adjustment may present a fairer picture of the 194 countries
- But it adds a worrying component: the 15 countries from the Least Developed group have a large weight, to compensate for those that are missing
- Would it be better to present the results separately for each type of country?
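One common way to compensate for non-response with weights is a weighting-class adjustment: within each type of country, every respondent stands in for all the countries of that type. The sketch below assumes this method (the practical may do it differently), and the groups, counts and function names are hypothetical, not the survey's.

```python
def nonresponse_weights(population, respondents):
    """Weighting-class adjustment: in each group, every respondent stands in
    for population / respondents units of that group."""
    return {g: population[g] / respondents[g] for g in population}

def adjusted_percent(yes, respondents, weights):
    """Estimated % 'yes' for the whole population, using the group weights."""
    est_yes = sum(yes[g] * weights[g] for g in weights)
    est_all = sum(respondents[g] * weights[g] for g in weights)
    return 100 * est_yes / est_all

# Hypothetical counts (not the survey's): group A responds poorly
population = {"group A": 40, "group B": 60}
respondents = {"group A": 10, "group B": 50}
yes = {"group A": 2, "group B": 40}

w = nonresponse_weights(population, respondents)   # A: 4.0, B: 1.2
unadjusted = 100 * sum(yes.values()) / sum(respondents.values())
adjusted = adjusted_percent(yes, respondents, w)
print(f"unadjusted {unadjusted:.0f}%, adjusted {adjusted:.0f}%")  # 70% vs 56%
```

The worry on the slide shows up here too: the 10 respondents in group A each count 4 times, so a few unusual answers in that group move the overall figure a lot.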

Missing values within the data
- There are also a few missing values within the responses. For example, Principle 4 has only 11 responses
- Here there is much more information, from the other responses from the same country
Possible actions are:
1. Do nothing. That was how the results were reported. There are so few missing values that any adjustment will make very little difference
2. Change the weights for the questions with missing values
3. Impute the missing values, either simply or using special software
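Options 1 and 3 can be compared with a small Python sketch. The 0/1 answers (1 = principle implemented, None = missing) and the three countries are hypothetical, and filling a gap with the mean of the same country's other answers is just one simple imputation rule, assumed here to show how information from the country's other responses can be used.

```python
from statistics import mean

# Hypothetical answers (not survey data): 1 = yes, 0 = no, None = missing
answers = {
    "country 1": [1, 1, None, 1],
    "country 2": [0, 1, 0, 0],
    "country 3": [1, None, 1, 1],
}

def available_case_percent(table, q):
    """Option 1 (do nothing): % 'yes' among the countries that answered q."""
    vals = [row[q] for row in table.values() if row[q] is not None]
    return 100 * sum(vals) / len(vals)

def row_mean_impute(table):
    """Option 3 (simple imputation): fill each missing answer with the mean
    of the same country's other answers."""
    filled = {}
    for country, row in table.items():
        fill = mean(v for v in row if v is not None)
        filled[country] = [fill if v is None else v for v in row]
    return filled

print(available_case_percent(answers, 2))                   # option 1
print(available_case_percent(row_mean_impute(answers), 2))  # option 3
```

With only three hypothetical countries the two options differ noticeably (50% against about 67% for the third question); when only a few answers are missing among many, as on the slide, the difference would usually be small.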

Can you now?
- Cope with data containing zero values
- Explain why weights are sometimes needed in analysing survey data
- Produce weighted tables of counts and other statistics
- Suggest ways of adjusting analyses when there are missing values
- Analyse multiple response data

The next sessions are for practising, in groups, everything you have covered so far.