University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/2015 11:23 PM 1 Some basic statistical concepts, statistics.


1 Some basic statistical concepts, statistics and distributions
Parameters and statistics
Parametric versus non-parametric statistics
Properties of statistics
Some useful statistics
The normal distribution
The Student’s t distribution
Confidence intervals for sample statistics
Statistical power and experimental design

2 Concepts map

3 Parameters, statistics and estimators
Parameters characterize populations (which in general cannot be completely enumerated).
Statistics (estimators) are estimates of population parameters obtained from a finite sample (e.g., the sample mean is an estimate of the population mean).
The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure.
[Figure: a sample drawn from a population]

4 Parametric statistical analysis
Estimating model parameters based on a finite sample and inferring from these estimates the values of the corresponding population parameters.
Parametric analysis therefore requires relatively restrictive assumptions about the relationship between the sample and the population, i.e. about the distributions from which samples are drawn and the nature of the drawing (e.g., normal distributions and random sampling).
[Figure: inference from a sample to the population, axes X and Y]

5 Non-parametric statistical analysis
Calculation of model parameters based on a finite sample, but with no inference to the corresponding population parameters.
Non-parametric analysis therefore requires relatively minimal assumptions about the relationship between the sample and the population (e.g., normal distributions of the sampled variables are not required).

6 Properties of statistics
Accuracy: an accurate statistic is one whose value, averaged over samples from the same population, is “close” to the true population parameter.
[Figure: sample and population distributions of X for a less accurate and a more accurate statistic]

7 Properties of statistics
Precision: a precise statistic varies little among samples drawn from the same population.
[Figure: sample and population distributions of X for a less precise and a more precise statistic]

8 Properties of statistics
Consistency: the more consistent a statistic is, the faster it approaches the true population value as sample size increases.
[Figure: a less consistent and a more consistent statistic plotted against sample size (N)]
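The three properties above can be explored by simulation. The following is a minimal sketch, not the authors' code: it assumes a standard normal population (true centre 0) and compares the sample mean and the sample median as estimators of that centre. The function names `sampling_distribution` and `summarize` are hypothetical.

```python
import random
import statistics

def sampling_distribution(estimator, n, trials=2000, seed=1):
    """Apply `estimator` to `trials` samples of size n from a standard normal population."""
    rng = random.Random(seed)
    return [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(trials)]

def summarize(estimator, n):
    """Return (bias, spread) of the estimator at sample size n."""
    values = sampling_distribution(estimator, n)
    bias = statistics.fmean(values) - 0.0   # accuracy: average offset from the true value
    spread = statistics.stdev(values)       # precision: variation among samples
    return bias, spread

for n in (10, 100):
    for name, estimator in (("mean", statistics.fmean), ("median", statistics.median)):
        bias, spread = summarize(estimator, n)
        print(f"N={n:3d} {name:6s} bias={bias:+.3f} spread={spread:.3f}")
```

Under these assumptions both statistics should show a bias near zero (both are accurate), the mean should show the smaller spread (it is more precise), and the spread of each should shrink as N grows.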

9 A comparison of some well-known statistics
[Figure labels: Frequency, Range]

10 Statistics: measures of central tendency
Mean: easy to calculate and has a predictable distribution, but can be strongly influenced by outliers.
Median (M): the value of a measured variable that has an equal number of observations both smaller and larger; it is less sensitive to outliers than the mean.
[Figure: frequency distribution of X with the median M marked]
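A small illustration of the outlier sensitivity described above, using Python's standard `statistics` module. The data values are invented for the example.

```python
import statistics

data = [4.1, 4.8, 5.0, 5.2, 5.9]      # made-up measurements
with_outlier = data + [50.0]          # the same data plus one extreme value

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(label,
          "mean =", round(statistics.mean(values), 2),
          "median =", round(statistics.median(values), 2))
```

With these numbers the single outlier moves the mean from 5.0 to 12.5, while the median only moves from 5.0 to 5.1.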

11 Statistics of dispersion: the range
Range: defined by the largest and smallest values in the sample.
It is a simple statistic, but it is biased because it consistently underestimates the population (parametric) range.
[Figure: frequency distribution comparing the population range with the sample range]
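The bias of the sample range can be seen in a quick simulation. This sketch assumes a uniform population on [0, 1], whose true range is exactly 1; that choice of population and the function name `mean_sample_range` are illustrative assumptions, not part of the slides.

```python
import random

def mean_sample_range(n, trials=5000, seed=2):
    """Average sample range over many samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials

for n in (5, 10, 100):
    # For this population the expected sample range is (n - 1) / (n + 1).
    print(n, round(mean_sample_range(n), 3), "expected", round((n - 1) / (n + 1), 3))
```

The average sample range stays below the population range of 1 at every sample size, and the gap closes only slowly as N increases.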

12 Dispersion
Three frequency distributions with identical means and sample sizes but different dispersion patterns

13 Dispersion statistics: variance, standard deviation and the coefficient of variation
Variance: average squared deviation from the mean.
Standard deviation: square root of the variance.
Coefficient of variation: standard deviation divided by the sample mean, × 100.
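The three definitions can be written out directly. This is a sketch under one stated assumption: the variance is computed as a sample variance with divisor n - 1, which is the usual convention but is not spelled out in the transcript. The function name `dispersion` and the data are hypothetical.

```python
import math

def dispersion(sample):
    """Return (variance, standard deviation, coefficient of variation in %)."""
    n = len(sample)
    mean = sum(sample) / n
    # Assumption: sample variance with divisor n - 1.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    sd = math.sqrt(variance)
    cv = 100.0 * sd / mean
    return variance, sd, cv

variance, sd, cv = dispersion([2, 4, 4, 4, 5, 5, 7, 9])
print(f"variance={variance:.3f} sd={sd:.3f} cv={cv:.1f}%")
```

Because the coefficient of variation is scaled by the mean, it is only meaningful for variables measured on a scale with a true zero and a non-zero mean.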

14 The normal distribution
Symmetric, bell-shaped distribution.
Characterized by 2 parameters: (1) the mean μ and (2) the variance σ².
[Figure: probability plotted against X]

15 The standard normal distribution
Obtained by scaling the distribution, converting observed values to standard normal deviates (Z-scores).
The resulting distribution has μ = 0 and σ² = 1.
[Figure: probability against Z from -3 to 3, comparing the unscaled and the scaled (Z-transformed) distributions]
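The Z-transformation is z = (x - μ)/σ. A minimal sketch, assuming the listed values are treated as the whole population (so the population standard deviation `pstdev` is used); the data and the function name `z_scores` are invented for illustration.

```python
import statistics

def z_scores(values):
    """Convert values to standard deviates: subtract the mean, divide by the SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)   # population SD of the supplied values
    return [(x - mu) / sigma for x in values]

z = z_scores([10, 12, 14, 16, 18])
print([round(v, 3) for v in z])
print("mean:", round(statistics.fmean(z), 6), "SD:", round(statistics.pstdev(z), 6))
```

Whatever the original units, the transformed values have mean 0 and standard deviation 1.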

16 The standard normal distribution
About 68% of the population lies within 1σ of the mean.
About 95% lies within 2σ of the mean.
[Figure: probability against Z from -3 to 3, with μ ± 1σ and μ ± 2σ marked]
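These proportions can be checked against the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

def coverage(k):
    """Proportion of a normal population lying within k standard deviations of the mean."""
    z = NormalDist()            # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * coverage(k):.2f}%")
```

The values come out at roughly 68.3% for 1σ and 95.4% for 2σ.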

17 Confidence intervals for observations
The range of values in which X% of the observations from a population are expected to fall.
Generally centred on the mean: for a normal population, μ ± Zσ.
The interval μ ± 2σ is the 95.4% CI.
But μ and σ are seldom known...
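When μ and σ are known, the multiplier Z for any chosen level follows from the inverse of the standard normal CDF. A short sketch; the function name `observation_interval` and the values μ = 100, σ = 25 are illustrative assumptions.

```python
from statistics import NormalDist

def observation_interval(mu, sigma, level):
    """Interval mu ± Z*sigma expected to contain `level` of a normal population."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    return mu - z * sigma, mu + z * sigma

low, high = observation_interval(100, 25, 0.95)
print(f"95% of observations expected between {low:.1f} and {high:.1f}")
```

For a 95% level the multiplier is about 1.96 rather than 2, which is why μ ± 2σ corresponds to slightly more than 95%.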

18 Confidence intervals for observations: estimation problems
Replacing μ and σ by their sample estimates can lead to serious biases.
Simulation: sample a standard normal population and, for each sample, calculate the sample mean and variance. Then calculate the CI based on the sample mean and variance, and see what proportion of the true population falls within the CI.
[Figure: histogram of the proportion (%) of the population outside the 95% CI for N = 1000; y-axis: number of trials; mean = 5%]

19 Confidence intervals for observations: estimation problems
When sample size is large, estimated CIs are very close to true CIs.
However, when sample size is small, estimated CIs are far too small.

20 Confidence intervals for observations: estimation problems
Estimated CIs based on Z-scores approach true CIs as sample size increases, but, for small N, are highly biased (i.e. are smaller than they should be).
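The simulation described on these slides can be reproduced approximately as follows. This is a sketch under assumptions of my own (2000 trials per sample size, a fixed random seed, and the hypothetical function name `mean_fraction_outside`), not the authors' original code. For each sample from a standard normal population it builds the interval mean ± Z·s from the sample estimates and computes exactly what fraction of the true population falls outside it.

```python
import random
import statistics
from statistics import NormalDist

POPULATION = NormalDist(0.0, 1.0)
Z95 = POPULATION.inv_cdf(0.975)   # about 1.96

def mean_fraction_outside(n, trials=2000, seed=3):
    """Average fraction of the true population outside the estimated 95% interval."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)
        low, high = m - Z95 * s, m + Z95 * s
        total += 1.0 - (POPULATION.cdf(high) - POPULATION.cdf(low))
    return total / trials

for n in (5, 20, 100):
    print(f"N={n:3d}: {100 * mean_fraction_outside(n):.1f}% of the population outside")
```

With a nominal 95% interval one would want 5% outside. Under these assumptions the fraction should be close to 5% for large N and clearly higher for small N, which is the bias the slides describe.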

21 The Student’s t-distribution
Distribution of the difference between the sample mean and the population mean, divided by the standard error of the mean.
Converges towards the standard normal distribution when N is large.
More peaked and with longer tails at small N.

22 Confidence intervals based on t-scores
When sample size is small, calculate CIs by replacing Z with the critical value of the t distribution.
This helps, but CIs are still too small when sample sizes are very small.

23 Confidence intervals for means
An interval that has a certain probability of including the value of the true mean of the population.
Smaller than the CI for observations.
[Figure: probability distributions of sample means and of observations]
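In symbols, and assuming the standard t-based formulas (the transcript does not preserve the slide's own equations), the two intervals can be written with sample mean \bar{x}, sample standard deviation s, sample size n and two-sided critical value t_{\alpha/2,\,n-1}:

```latex
% CI for observations, replacing Z with the t critical value
\bar{x} \pm t_{\alpha/2,\,n-1}\, s

% CI for the mean: the standard error s/\sqrt{n} replaces s
\bar{x} \pm t_{\alpha/2,\,n-1}\, \frac{s}{\sqrt{n}}
```

The interval for the mean is narrower by a factor of \sqrt{n}. For example, with n = 10, s = 25 and the tabulated value t_{0.025,\,9} \approx 2.262, the half-width is about 56.6 for observations but only about 17.9 for the mean.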

24 Confidence intervals for the median
If the distribution is highly skewed, or the sample size is very small, confidence intervals for the mean based on the t-distribution are very biased (they underestimate the true CI).
As an alternative, calculate the CI for the median instead of the mean.

25 Confidence intervals for the median
Based on the binomial distribution b(x) with p = 0.5.
Out of a sample of n = 10, what is the probability of obtaining only x = 1, 2, …, n observations below the median?
Because b(x) is discrete, confidence intervals won’t be exactly at the 1 - α level.
1 - α CI: within what range of values would we expect the true population median to lie 100(1 - α) percent of the time?
The 97.86% CI for the median is given by values 1 and 9; the 89.08% CI for the median is given by values 2 and 8.
[Figure: binomial probabilities for x = 0 to 10]
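The binomial reasoning can be made concrete. This sketch assumes one common convention that the transcript does not state explicitly: the interval runs from the r-th smallest to the r-th largest observation, and it misses the population median only when fewer than r observations fall on one side of it. The function name `median_ci_coverage` is hypothetical.

```python
from math import comb

def median_ci_coverage(n, r):
    """Coverage of the interval from the r-th smallest to the r-th largest of n observations.

    B, the number of observations below the median, is Binomial(n, 0.5).
    The interval misses the median when B <= r - 1 or B >= n - r + 1.
    """
    lower_tail = sum(comb(n, k) for k in range(r)) / 2 ** n   # P(B <= r - 1)
    return 1.0 - 2.0 * lower_tail

for r in (1, 2, 3):
    print(f"n=10, r={r}: coverage = {100 * median_ci_coverage(10, r):.2f}%")
```

Under this convention, r = 2 and r = 3 give coverages of about 97.85% and 89.06% for n = 10, very close to the 97.86% and 89.08% quoted on the slide. Because the binomial is discrete, only these stepped coverage levels are attainable, which is why an exact 95% interval is not available.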

26 Confidence intervals for the variance
Sample estimates of the population variance are distributed like chi-square with n - 1 degrees of freedom: (n - 1)s²/σ² is distributed like χ² with n - 1 df.
[Figure: χ² distribution (df = 5), probability on the y-axis, with the tail area p = α = 0.05 marked]
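Written out, and assuming the standard result for samples from a normal population (the slide's own equation is not preserved in the transcript), the relationship and the resulting interval are:

```latex
\frac{(n-1)\,s^{2}}{\sigma^{2}} \sim \chi^{2}_{n-1}

% 100(1 - \alpha)% confidence interval for the population variance,
% where \chi^{2}_{p,\,n-1} denotes the p-th quantile of the chi-square
% distribution with n - 1 degrees of freedom
\frac{(n-1)\,s^{2}}{\chi^{2}_{1-\alpha/2,\,n-1}}
\;\le\; \sigma^{2} \;\le\;
\frac{(n-1)\,s^{2}}{\chi^{2}_{\alpha/2,\,n-1}}
```

Because the chi-square distribution is asymmetric, the interval is not centred on s²: the upper limit lies further from s² than the lower limit.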

27 Design of experiments
What you want:
How do I achieve a desired precision?
How many times should I repeat the experiment to get “good” results?
How many samples should I take if I want a precision (CV of the mean) of 5%?
How to get a 99% confidence interval that is only n units wide?
What you need:
A goal (desired precision).
An estimate of dispersion (s²) from a preliminary experiment, previous experience or a “guesstimate”.

28 Required sample size: an example
A preliminary sample of N = 10 yields mean = 100 and S.D. = 25.
You want a CI of width 2, so that there is a 95% chance that the true parametric mean is within 1 of the sample mean.
Answer: n = 2404 by iterative solution.
On average, your precision will be about what you want, but about 50% of the time the calculated CI will be narrower than the true CI because you used s² instead of σ².
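The iterative solution can be sketched as follows. I am assuming the usual approach, which the transcript does not spell out: solve n = (t·s/d)² where d is the desired half-width and t is the two-sided critical value with n - 1 degrees of freedom, updating n until it stops changing. To stay within the Python standard library, the t quantile is obtained here by numerical integration and bisection; all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=400):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5 by bisection; assumes the answer lies below 50."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def required_sample_size(sd, half_width, n_start, conf=0.95, max_iter=50):
    """Iterate n = ceil((t * sd / half_width)^2) until n no longer changes."""
    n = n_start
    for _ in range(max_iter):
        t = t_quantile(1 - (1 - conf) / 2, n - 1)
        n_new = math.ceil((t * sd / half_width) ** 2)
        if n_new == n:
            return n
        n = n_new
    raise RuntimeError("sample size iteration did not converge")

# Values from the slide: S.D. = 25, half-width = 1, preliminary N = 10.
print(required_sample_size(sd=25, half_width=1, n_start=10))
```

The iteration is needed because the critical value t depends on n, which is itself the unknown. Starting from the preliminary N = 10 the estimate first overshoots (t is large with only 9 degrees of freedom) and then settles within a few steps.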

