Published by Iyana Price. Modified over 10 years ago.
1
ESDS meeting, 9th September 2005
P|E|A|S: Practical Exemplars on the Analysis of Surveys
– Web site to help people analyse surveys
– Supported by the ESRC research methods programme
– Authors:
  Gillian Raab, Napier University
  Susan Purdon, National Centre for Social Research
  Kathy Buckner, Napier University
  Iona Waterston, Web designer
http://www.napier.ac.uk/depts/fhls/peas
2
Summary of this presentation
Background to the project
– Our starting point and basic principles
– Important concepts in survey design and analysis
– Software for survey analysis
– Approaches to missing data
What we have learned from the project
– Survey methods
– Survey software
– Missing data challenges
Questions
3
Starting points (1)
Survey data has special features that need to be considered in the analysis.
There is an enormous academic literature on survey analysis.
Universities in the UK have less expertise in survey analysis than those in North America or Europe.
Most of the expertise lies in survey organisations.
4
Starting points (2)
The ESRC makes lots of data available via its survey archive, much of it from Scotland:
– Scottish Health Survey
– Scottish Household Survey
This investment is intended to encourage use by, e.g.:
– University researchers
– Government departments
– Local authorities and
– Voluntary organisations
But there is limited expertise on how best to analyse survey data.
5
Starting points (3)
Basic statistical theory for analysing sample surveys was developed from the 1950s to the 1970s:
– Cochran, Kish, Rao
The methods calculate standard errors and confidence intervals that take account of the survey design.
But none of this methodology found its way into commonly used statistical packages until very recently:
– STATA version 8?
– SAS version 8 onwards
– SPSS version 12 onwards
– Splus/R survey packages in the last two years
More recent methods are also available (especially in STATA and R).
6
Basic principles of what we present on P|E|A|S
To illustrate how to use these new survey procedures effectively
To help you to use them on your own data
To use them to see how effective the survey design has been in obtaining accurate and precise estimates
It is like driving a car:
– We don't expect you to understand all the details of how it works
– But you do need to know the general principles
– How to use the controls effectively
– What regular checks you should be doing
– What roads you should not be driving down
7
Survey features
Based on current UK practice by ONS and survey organisations:
– Weighting
– Clustering
– Stratification
Each of these has an impact on the results you get from analysing a survey.
Only weighting will affect the estimates,
but all three will affect the standard errors.
8
Weighting can make a large difference to answers
Smoking rates from the 1998 Scottish Health Survey (Exemplar 3)
9
Weighting
Why do we need to do it?
– To make the sample match the population
Because of selection as part of the design:
– Different sampling fractions in different areas
– Selection of one adult per household
To adjust for non-response
How does it affect the precision of estimates?
– It depends on both the weights and the data being analysed
– It can help or hurt
– If the weights are not related to the data being analysed, unequal weights will hurt precision
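A rough way to quantify the "hurt" case is Kish's approximate design effect from unequal weighting, n·Σw²/(Σw)², which assumes the weights are unrelated to the variable being analysed. The Python sketch below is illustrative only; the function names are hypothetical and the weights are made up.

```python
def kish_deff_weights(weights):
    """Kish's approximate design effect due to unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    if n == 0:
        raise ValueError("need at least one weight")
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Unweighted sample size giving the same precision: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Equal weights: no loss of precision.
print(kish_deff_weights([1.0] * 100))   # 1.0
# Hypothetical sample: half the cases weighted 1, half weighted 3.
w = [1.0] * 50 + [3.0] * 50
print(kish_deff_weights(w))             # 1.25
print(effective_sample_size(w))         # 80.0
```

Here 100 interviews carry only as much information as about 80 unweighted ones, which is the sense in which unequal weights "hurt" when they are unrelated to the data.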
10
Effect of weighting on standard errors (Exemplar 4)
WERS 98: a survey of workplaces run by the DWP
Stratified by workplace size
Sampling fractions much larger in strata of large workplaces
This often helps if we want to estimate something like the total number of employees with disabilities, since large workplaces account for most employees
But for the proportion of workplaces with an equal opportunities policy it hurts
11
Stratification
Divide the sampling frame into strata (e.g. region, type of area)
Take a sample of a fixed number of units from each stratum
Stratification can be either proportionate or disproportionate
Proportionate stratification means that the sample will match the population on the stratifying variables BETTER than would be expected by chance
So proportionate stratification improves precision
If it is disproportionate, weights will be needed to estimate population totals
Disproportionate stratification may help or hurt precision
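For a stratified simple random sample, the standard estimator of a population mean weights each stratum mean by its population share W_h = N_h/N, and its variance is a sum of within-stratum variances only. A minimal Python sketch, assuming simple random sampling without replacement within each stratum and at least two sampled units per stratum; the function name, stratum sizes and values are invented for illustration.

```python
from statistics import mean, variance


def stratified_mean(strata):
    """Estimate a population mean and its standard error from a stratified random sample.

    strata maps a stratum label to (N_h, values): the stratum's population size
    and the list of values observed for the units sampled from it.
    """
    N = sum(N_h for N_h, _ in strata.values())
    estimate = 0.0
    var = 0.0
    for N_h, values in strata.values():
        n_h = len(values)
        if n_h < 2:
            raise ValueError("each stratum needs at least two sampled units")
        W_h = N_h / N                      # stratum's share of the population
        fpc = 1 - n_h / N_h                # finite population correction
        estimate += W_h * mean(values)
        var += W_h ** 2 * fpc * variance(values) / n_h
    return estimate, var ** 0.5


# Invented example: a large urban stratum and a small rural one.
strata = {"urban": (800, [2, 4, 6, 8]), "rural": (200, [10, 12])}
est, se = stratified_mean(strata)
print(round(est, 2), round(se, 3))
```

Because between-stratum differences drop out of the variance, stratifying on a variable related to the outcome is what makes proportionate stratification gain precision.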
12
Clustering
Multi-stage designs are very common in government surveys
First, a sample of clusters (e.g. postcode sectors): stage 1
Then a sample of households within each cluster: stage 2
If clusters are selected with probability proportional to size and a fixed sample size is taken within each cluster, then no weighting is required
Clustering almost always makes survey estimates less precise
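The loss of precision from clustering is often approximated by Kish's formula deff ≈ 1 + (m - 1)ρ, where m is the average number of interviews per cluster and ρ is the intra-cluster correlation. The sketch below assumes equal-sized clusters and uses a one-way ANOVA estimator of ρ; the function names and numbers are hypothetical.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the intra-cluster correlation.

    clusters is a list of clusters, each a list of responses; all clusters
    are assumed to have the same size m >= 2.
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least two equal-sized clusters of size >= 2")
    grand = mean(y for c in clusters for y in c)
    ssb = m * sum((mean(c) - grand) ** 2 for c in clusters)     # between clusters
    ssw = sum((y - mean(c)) ** 2 for c in clusters for y in c)  # within clusters
    msb = ssb / (k - 1)
    msw = ssw / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def cluster_deff(avg_cluster_size, icc):
    """Kish's approximation: deff = 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


# Even a small correlation matters when many interviews come from each cluster,
# e.g. a hypothetical 20 interviews per postcode sector with rho = 0.05:
print(round(cluster_deff(20, 0.05), 2))  # 1.95
```

With one interview per cluster (m = 1) the formula gives no loss, which is why taking fewer interviews in more clusters is the usual way to reduce the clustering penalty.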
13
Design effects (1)
A design effect is a ratio that compares the precision of a survey estimate with what would have been achieved from a simple (unclustered, unweighted, unstratified) random sample of the same size from the same population.
A large design effect is bad
A design effect of 2 means that your effective sample size is only half the number of responses you have achieved
Means, proportions, differences between groups, regression coefficients and hazard ratios all have design effects, and chi-squared tests need adjustment by design effects
Design effects are often quite different for subgroups of a sample, and often not so bad
And design effects for differences between groups are often very different from those for the overall mean, and also often much better
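Two common uses of a design effect can be written down directly: the effective sample size n/deff, and a first-order (Rao-Scott style) correction that divides a Pearson chi-squared statistic by the average design effect of the cell proportions. This is a simplified sketch with hypothetical names and numbers; survey packages typically apply more refined corrections. The p-value helper covers only the 1-degree-of-freedom case (e.g. a 2×2 table), using the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def effective_n(n, deff):
    """Effective sample size: the simple random sample size giving the same precision."""
    return n / deff


def rao_scott_first_order(chisq, mean_deff):
    """First-order correction: divide the Pearson statistic by the mean design effect."""
    return chisq / mean_deff


def p_value_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


print(effective_n(4000, 2.0))                # 2000.0
adjusted = rao_scott_first_order(6.0, 1.5)   # hypothetical 2x2 table statistic
print(adjusted)                              # 4.0
print(round(p_value_1df(6.0), 4), round(p_value_1df(adjusted), 4))
```

Ignoring the design here would overstate the evidence against independence, since the unadjusted statistic looks more extreme than it should.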
14
Design effects (2)
Many surveys publish tables of design effects or design factors for key variables, but rarely more than a page of them, and almost never for quantities such as differences.
The design factor is just the square root of the design effect.
The idea is that you can do an ordinary analysis and multiply your standard error by the design factor.
This was for the pre-survey-software days.
On balance it probably gave standard errors that were too large for many analyses, since people would play safe by taking the biggest design effect in the table.
We don't need to use design effects like that if we use the correct software.
But they are a measure of how well the design has worked to get good answers.
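The pre-software shortcut described above amounts to inflating a simple-random-sampling standard error by the design factor deft = √deff. A sketch for a proportion, with hypothetical function names and made-up inputs:

```python
import math


def srs_se_proportion(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def design_adjusted_se(se_srs, deff):
    """Inflate an SRS standard error by the design factor deft = sqrt(deff)."""
    return se_srs * math.sqrt(deff)


se = srs_se_proportion(0.3, 1000)               # hypothetical estimate and sample size
print(round(se, 4))                             # 0.0145
print(round(design_adjusted_se(se, 1.44), 4))   # 0.0174 (deft = 1.2)
```

Picking the largest deff in a published table for every analysis inflates all standard errors by the same worst-case factor, which is why the shortcut tended to be conservative.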
15
To summarise
To get unbiased estimates we need to use survey weights.
To get correct standard errors we need to take account of the survey design, in particular weighting, clustering and stratification.
We can now do this with standard software using survey methods.
Survey analysis software can also compare groups, carry out regression analyses, etc.
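To show roughly what survey software does for a weighted mean, here is a sketch of the Taylor-series (linearization) variance under a stratified, clustered, weighted design. It uses the common "with-replacement" approximation at the first stage and ignores finite population corrections. The function name and argument layout are assumptions for illustration, not any particular package's API, and it requires at least two primary sampling units (PSUs) per stratum.

```python
import math
from collections import defaultdict


def weighted_mean_linearized(y, w, strata, psus):
    """Weighted mean and its Taylor-linearization standard error.

    y, w, strata and psus are parallel lists: value, survey weight, stratum
    label and PSU (cluster) label for each respondent. PSU labels only need
    to be unique within a stratum. Uses the with-replacement approximation
    at the first stage, so no finite population correction is applied.
    """
    total_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized contribution of each respondent, summed to PSU totals.
    psu_totals = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psus):
        psu_totals[(h, c)] += wi * (yi - ybar) / total_w

    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)

    # Between-PSU variance within each stratum, summed over strata.
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_mean = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - z_mean) ** 2 for z in zs)
    return ybar, math.sqrt(var)


# Hypothetical design: 2 strata, 2 PSUs per stratum, 2 respondents per PSU.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [1.0, 1.0, 2.0, 2.0, 1.5, 1.5, 3.0, 3.0]
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
est, se = weighted_mean_linearized(y, w, strata, psus)
print(round(est, 3), round(se, 3))
```

The weights enter the estimate, while clustering and stratification enter only through how contributions are grouped into PSUs and strata, matching the point that only weighting changes the estimate but all three change the standard error. With one stratum, equal weights and each respondent as its own PSU, the result reduces to the familiar s/√n.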
16
Software for survey analysis
You need a package that will allow for the survey design
Specialist packages (SUDAAN, WESVAR) have been in use for many years
STATA was the first general package with survey methods
SAS, SPSS (add-on) and Splus/R now all have them
The packages differ in how you describe the survey design
And in their capabilities:
– Variety of methods
– What feedback they give you about what you have done
– Warning you when things are not going right
The latest versions of all four packages will cover almost everything you would expect
17
Non-response
An increasing problem for survey researchers
From Alasdair Crockett: Weighting the Social Surveys (ESDS web site)
18
Two ways of dealing with it
Post-stratification
– Re-weighting the sample so as to match population totals
– Gives a new set of weights
Imputation
– Fills in the missing values
– Different procedures available
– Used in censuses (e.g. the One Number Census)
– And most often in longitudinal surveys
19
Post-stratification
Only as good as the population totals you are using
Will only correct non-response bias if the difference between responders and non-responders is explained by the post-stratification factors
The Census survey-link scheme informs us about this
It has the potential to improve precision (see slide 12 if time)
Survey firms and ONS are reluctant to use it because it may interrupt time series
But post-stratification of old survey data is also a possibility
Some survey packages will do it for you (R/Splus, a STATA add-on package for version 8, the SAS Calmar macro)
Analysing a survey to take account of post-stratification needs extra tricks (Splus/R and STATA provide them)
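In its simplest, cell-based form, post-stratification rescales the weights within each post-stratum so that they add up to the known population total. A minimal sketch with hypothetical names and invented totals; calibration methods such as raking generalise this to several margins at once.

```python
from collections import defaultdict


def poststratify(weights, groups, pop_totals):
    """Rescale weights so they sum to the known population total in each post-stratum.

    weights and groups are parallel lists; pop_totals maps each post-stratum
    label to its population count (e.g. from census figures).
    """
    sample_totals = defaultdict(float)
    for w, g in zip(weights, groups):
        sample_totals[g] += w
    unknown = set(sample_totals) - set(pop_totals)
    if unknown:
        raise ValueError(f"no population total for post-strata: {sorted(unknown)}")
    empty = set(pop_totals) - set(sample_totals)
    if empty:
        raise ValueError(f"no respondents in post-strata: {sorted(empty)}")
    return [w * pop_totals[g] / sample_totals[g] for w, g in zip(weights, groups)]


# Invented example: men are over-represented among respondents.
weights = [1.0, 1.0, 1.0, 1.0]
groups = ["male", "male", "male", "female"]
new_w = poststratify(weights, groups, {"male": 500, "female": 500})
print([round(x, 1) for x in new_w])  # [166.7, 166.7, 166.7, 500.0]
```

The adjustment removes non-response bias only to the extent that responders and non-responders differ through the post-stratification factor itself, which is exactly the caveat on the slide.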
20
Imputation
Most often carried out by census takers using detailed information from the census forms
Usually picking up data from other similar individuals, households or household members
More recently, model-based methods have become popular (the books by Little and Rubin and by Schafer are the bibles)
– Very large literature on this now
– And many sets of recommendations
– e.g. make your imputation model large
– Carry out multiple imputations and combine the estimates; software to do this is available in Splus/R, STATA and SAS
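When multiple imputed datasets are analysed separately, the estimates are usually combined with Rubin's rules: average the point estimates, then add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. A sketch with hypothetical names and made-up numbers:

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine results from m imputed datasets using Rubin's rules.

    Returns the pooled estimate, its standard error and the usual Rubin
    degrees of freedom for a t reference distribution.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)               # pooled point estimate
    u_bar = mean(variances)               # average within-imputation variance
    b = variance(estimates)               # between-imputation variance
    t = u_bar + (1 + 1 / m) * b           # total variance
    if b > 0:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        df = math.inf
    return q_bar, math.sqrt(t), df


# Made-up results from five imputed datasets.
est, se, df = rubin_combine([10.0, 10.4, 9.8, 10.2, 10.1], [0.25, 0.24, 0.26, 0.25, 0.25])
print(round(est, 2), round(se, 3), round(df, 1))
```

The between-imputation term is what carries the extra uncertainty due to the missing data; a single imputation would leave it out and understate the standard error.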
21
Our experience
Working with data from the Edinburgh Study of Youth Transitions and Crime (Exemplar 6):
– It is tricky to get imputation models right for real data
– Things can go horribly wrong, especially if models are too big for the data
– It's important to check things out
– Choice of variables is more important than choice of models
– We still have a lot to learn about this
– We need to try these methods out on real data, not just simulated data
22
What we (I) have learned
There is a lot more to know about survey design and analysis, and there are new methods that need to be made available
The literature still does not provide definitive answers to some questions
But a lot of ground rules are well known
Survey software is developing and improving fast
It will do so even more if more people use it and give feedback to the providers
Non-response remains an important problem
The jury is out as to whether and when post-stratification weighting, imputation or neither is the best approach to dealing with non-response