
As a data user, it is imperative that you understand how the data has been generated and processed…


1 As a data user, it is imperative that you understand how the data has been generated and processed…

2 This will help you understand the limitations of the data and the uses to which it can be put (and the confidence with which you can put it to those uses).

3 Ipsos MORI’s technical note: available on the survey website. Contents: sampling methodology; data collection; data processing; data weighting; statistical reliability…

4 An intro to sampling. It is not feasible to question all of the people in the population we are interested in, i.e. all residents in a local authority area (this would be a ‘census’). Sampling is making an inference about a population from a sample.

5 We use sampling in our everyday lives! Do you need to eat the whole pot to see if it is correctly seasoned, or will just tasting it be enough? TASTING IT, PROVIDED IT’S WELL STIRRED!

6 NHTS uses probability sampling: every sampling unit (an address) has a known and non-zero probability of selection. We know the size of the population we are drawing from (the total number of addresses), and we know the chance of selection that applies to every individual unit (an address) within the population (“1 in n”). This does not apply to quota sample surveys, which are the most commonly used approach in market research. Random probability sampling is theoretically purer: it is less prone to non-response bias, and all statistical reliability tests assume random sampling.
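As a rough illustration of “known and non-zero probability of selection”, here is a minimal sketch using simple random sampling from a list of addresses. The frame, its size and the function name are hypothetical; the actual NHTS design is described in the technical note and may well involve stratification or other steps not shown here.

```python
import random

def draw_address_sample(addresses, sample_size, seed=None):
    """Simple random sample without replacement from a sampling frame.

    Every address has the same, known chance of selection:
    sample_size / len(addresses), i.e. "1 in n".
    """
    rng = random.Random(seed)
    sample = rng.sample(addresses, sample_size)
    selection_probability = sample_size / len(addresses)
    return sample, selection_probability

# Hypothetical sampling frame of 20,000 addresses (illustrative only).
frame = [f"address-{i}" for i in range(20000)]
sample, prob = draw_address_sample(frame, 1000, seed=1)
print(f"Selected {len(sample)} addresses; chance of selection is 1 in {round(1 / prob)}")
```

Because the frame size and the sample size are both known, the chance of selection can be stated for every address, which is what a quota sample cannot offer.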

7 Statistical reliability. A sample survey produces figures which are estimates of the ‘truth’: we are drawing inferences about the entire population based on a sample (the ‘true’ figure would be based on a survey of the entire population, i.e. a census). Statistical reliability is a statement about those estimates and the confidence we have in them in relation to the ‘truth’. Reliability is sometimes referred to as “confidence intervals”, “margins of error” or “sampling tolerances”. It is determined by: the percentage; the sample size on which the percentage is based; the level of confidence we want to apply (it is usual to test at the 95% confidence level); and the effect of any weights applied.
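The effect of weights on reliability can be illustrated with Kish's approximation for the effective sample size, (sum of weights)² / (sum of squared weights). This is a common rule of thumb, not necessarily the calculation used in Ipsos MORI's technical note, and the weights below are made up for illustration.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Hypothetical weights: 400 respondents, equally weighted vs unevenly weighted.
equal = [1.0] * 400
uneven = [0.5] * 200 + [1.5] * 200

print(effective_sample_size(equal))   # 400.0: equal weights lose nothing
print(effective_sample_size(uneven))  # 320.0: uneven weights reduce precision
```

The more the weights vary, the smaller the effective sample size, and so the wider the confidence interval around any percentage.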

8 Confidence intervals and sample size. Larger samples provide more accurate data, but to achieve double the reliability we need to quadruple the sample size. For example, the following figures are for a 50% finding:
Sample size          50    100   200   400
Confidence interval  ±14%  ±10%  ±7%   ±5%
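The figures for a 50% finding can be reproduced with the standard formula for the margin of error of a proportion, z × √(p(1 − p)/n), with z = 1.96 at the 95% level. This sketch assumes simple random sampling and ignores any weighting effects; the function name is illustrative.

```python
import math

Z_95 = 1.96  # normal critical value for the 95% confidence level

def margin_of_error(p, n, z=Z_95):
    """Half-width of the confidence interval for a proportion p from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 200, 400):
    print(n, f"±{margin_of_error(0.5, n) * 100:.0f}%")
```

Because n sits under a square root, quadrupling the sample size halves the margin of error, which is the “quadruple to double” rule.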

9 Some examples. For an NHTS survey based on 400 responses, and in relation to a finding that 50% are fairly satisfied with a service: “We are 95% confident that our sample percentage is reliable to plus or minus 5%.” Put another way: “Out of every 100 surveys we conduct where we see these figures, in 95 of them we would be right and the true figure would lie within that range; in 5 of them we would be wrong and the true figure would not lie in that range.” Or: “The chances are 95 in 100 that this result would not vary by more than 5 percentage points from the ‘true’ result, that which would be found had the entire population responded, i.e. the ‘true’ result would be between 45% and 55%.” Another example, based on 100 responses: “We are 95% confident that our sample percentage is reliable to plus or minus 10%. The ‘true’ result would be between 40% and 60%.”
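The “95 out of every 100 surveys” reading can be checked with a small simulation: draw many hypothetical surveys from a population whose true figure is known, and count how often the interval around each sample percentage contains it. The true figure of 50%, the number of simulated surveys and the seed are assumptions chosen for illustration.

```python
import math
import random

def coverage_rate(true_p, n, surveys, seed=0, z=1.96):
    """Share of simulated surveys whose 95% interval contains the true figure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # Simulate n independent respondents, each "satisfied" with probability true_p.
        yes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hat = yes / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / surveys

# 2,000 simulated surveys of 400 responses each, true figure 50%.
print(coverage_rate(0.5, 400, 2000))
```

The printed share should come out close to 0.95, though it will not be exactly 0.95 in any single run.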

10 Statistical tests should not be used for sample sizes less than 50 – in fact, any estimates derived from small samples should be considered, at best, indicative.

11 A full list of confidence intervals for a range of sample sizes and percentages is provided in Ipsos MORI’s Technical note.

12 Statistical significance between %s. When we extend this to look at two figures (survey estimates), we are usually interested in the likelihood of the difference between the two figures being ‘real’. In other words, how confident are we that the difference reflects what we would have found if we had surveyed the entire population? The issue may apply either to two figures from one sample or to a comparison of figures in two different samples, for example: comparing the views of men and women; comparing the views of one authority’s residents with another’s; comparing attitudes among one authority’s residents and how they have changed between 2009 and 2010. The answer is obtained using a test of statistical significance.

13 Statistical significance: some examples. For two samples each based on 1,000 responses and with a difference of 5 percentage points (50% vs 55%): the confidence interval is ±4, so the difference is statistically significant at the 95% confidence level. With the same %s based on 500 vs 500 responses: the confidence interval is ±6, so the difference is not statistically significant at the 95% confidence level. With the same %s based on 100 vs 100 responses: the confidence interval is ±14, so again the difference is not statistically significant at the 95% confidence level.
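These three cases can be reproduced with a simple test for the difference between two independent proportions: the difference is treated as significant if it exceeds 1.96 times the standard error of the difference. This sketch uses the unpooled standard error and ignores weighting; it is one common approach and not necessarily the exact test behind the figures in the technical note.

```python
import math

def difference_margin(p1, n1, p2, n2, z=1.96):
    """95% margin of error for the difference between two independent proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * se

def is_significant(p1, n1, p2, n2, z=1.96):
    """True if the observed difference is larger than its margin of error."""
    return abs(p1 - p2) > difference_margin(p1, n1, p2, n2, z)

for n in (1000, 500, 100):
    margin = difference_margin(0.50, n, 0.55, n)
    print(n, f"±{margin * 100:.0f}", is_significant(0.50, n, 0.55, n))
```

With 1,000 responses in each sample the margin (about ±4) is smaller than the 5-point difference, so the difference is significant; with 500 or 100 in each sample the margin is larger than the difference, so it is not.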

14 This means that there will be different confidence intervals involved in comparisons of survey %s for local authority A vs B, and comparisons of C vs D (assuming that A-D received different numbers of responses).

15 There can still be merit in reporting on, and using, findings which are not statistically significant, but the smaller the sample size, the more caution should be exercised. We cannot use these findings with the same degree of confidence.

16 A full list of confidence intervals for different sample size comparisons and percentages is provided in Ipsos MORI’s Technical note.

