From Distributions to Confidence


Practice & Communication of Science

The Normal Distribution
- Represents the idealised distribution of a large number of things we measure in biology
  - many parameters approximate to the Normal Distribution (ND)
- Is defined by just two things…
  - population mean, µ (mu): the centre of the distribution (mean = median = mode)
  - population standard deviation (SD), σ (sigma): the distribution ‘width’ (the distance from the mean to the point of inflexion)
    - one σ either side of the mean encompasses 68% of the area under the curve
    - 95% of the area is found within 1.96 σ either side of the mean

The Normal Distribution
- Is symmetrical
  - mean = median = mode

The Normal Distribution
- One SD either side of the mean includes 68% of the represented population
  - the SD boundary is the inflexion point, where the curvature changes direction (the ‘s’ bit)
- 2 SDs cover 95% (actually 1.96 SDs)
- 3 SDs cover 99.7%
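The 68 / 95 / 99.7% figures can be checked numerically. The sketch below is only an illustration; it uses `NormalDist` from Python's standard library, and the helper name `coverage` is our own, not something from the slides.

```python
from statistics import NormalDist

# The standard Normal Distribution: mean 0, SD 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Fraction of a Normal Distribution lying within k SDs either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.4f}")
```

Running it shows why 1.96 rather than 2 is quoted for 95%: two full SDs cover slightly more than 95% of the area.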

The Normal Distribution
- All Normal Distributions are similar
- They differ only in terms of…
  - mean
  - SD (governs how ‘spikey’ the curve is)
[Figure: Normal curves with 4 different SDs and 2 different means]

Standardising Normal Distributions
- Regardless of what they measure, all Normal Distributions can be made identical by…
  - subtracting the mean from every reading: the mean then becomes zero
  - dividing each reading by the SD: a reading one SD bigger than the mean becomes +1
- The results are called Standard Scores or z-scores
  - amazing! Different measurements → same ‘view’

Standard (z) Scores
- A ‘pure’ way to represent a data distribution
  - the actual measurement units (mg, m, sec) disappear!
  - they are replaced by the number of SDs from the mean (zero)
- For any reading, z = (x - µ) / σ
- A survey of daily travel time had these results (in minutes):
  26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
  - the mean is 38.8 min, and the SD is 11.4 min
- To convert the values to z-scores, eg to convert 26…
  - first subtract the mean: 26 - 38.8 = -12.8
  - then divide by the SD: -12.8 / 11.4 = -1.12
  - so 26 is -1.12 SDs from the mean
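The travel-time conversion can be reproduced in a few lines. This is a minimal sketch using Python's standard library; it assumes the quoted SD of 11.4 min is the population SD (`pstdev`), which is what matches the figure given for these 20 readings.

```python
from statistics import mean, pstdev

# Daily travel times in minutes, as listed in the survey
travel_times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
                26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = mean(travel_times)       # mean of the readings
sigma = pstdev(travel_times)  # population SD (divides by n)

# z = (x - mu) / sigma for every reading
z_scores = [(x - mu) / sigma for x in travel_times]

print(f"mean = {mu:.1f} min, SD = {sigma:.1f} min")
print(f"z-score of 26 min = {z_scores[0]:.2f}")
```

After standardising, the z-scores themselves have a mean of zero and an SD of one, whatever units the original readings were in.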

Familiarity with the Normal Distribution
- 95% of the class are between 1.1 and 1.7m tall
  - what are the mean and SD?
- Assuming a normal distribution…
  - the distribution is symmetrical, so the mean height is the midpoint: (1.1 + 1.7) / 2 = 1.4m
  - the range 1.1 to 1.7m covers 95% of the class, which equals ± 2 SDs (4 SDs in total)
  - one SD = (1.7 - 1.1) / 4 = 0.6 / 4 = 0.15m

Familiarity with the Normal Distribution
- One of that class is 1.85m tall
  - what is the z-score of that measurement?
- Assuming a normal distribution…
  - z-score = (x - µ) / σ
  - z = (1.85m - 1.4m) / 0.15m = 0.45m / 0.15m = 3
  - note there are no units
- 3 SDs cover 99.7% of the population, leaving 0.3% split between the two tails
  - so only about 1.5 in 1000 of the class will be as tall or taller
  - a big class, with fractional students!
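The class-height reasoning can be followed step by step in code. The sketch below assumes, as the slides do, that the 95% range spans ± 2 SDs (rather than the exact ± 1.96); the variable names are our own.

```python
from statistics import NormalDist

low, high = 1.1, 1.7        # range covering 95% of the class (metres)

mu = (low + high) / 2       # symmetrical, so the mean is the midpoint
sigma = (high - low) / 4    # 95% range taken as 4 SDs wide (± 2 SDs)

z = (1.85 - mu) / sigma     # z-score of the 1.85 m student

# Fraction of the class expected to be at least this tall
upper_tail = 1 - NormalDist().cdf(z)

print(f"mean = {mu:.2f} m, SD = {sigma:.2f} m, z = {z:.1f}")
print(f"fraction as tall or taller = {upper_tail:.5f}")
```

The exact upper tail comes out slightly below the rule-of-thumb figure of 1.5 in 1000, because "99.7%" is itself a rounded value.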

Familiarity with the Normal Distribution
- 36 students took a test; you were 0.5 SD above the average; how many students did better?
  - from the curve, 50% sit above zero
  - from the curve, 19.1% sit between 0 and 0.5 SD
  - so 50% - 19.1% = 30.9% sit above you
  - 30.9% of 36 is about 11 students
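Instead of reading the areas off the curve, the same answer can be computed directly. A minimal sketch, assuming the test scores are normally distributed:

```python
from statistics import NormalDist

n_students = 36
z = 0.5  # your score, in SDs above the class average

# Area under the standard normal curve to the right of z
fraction_above = 1 - NormalDist().cdf(z)
students_above = fraction_above * n_students

print(f"{fraction_above:.1%} of the class did better, about {round(students_above)} students")
```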

Familiarity with the Normal Distribution Need to have a ‘feel’ for this…

Populations and Samples – a Diversion?
- A couple of seemingly pedantic but important points about distributions…
- population
  - the potentially infinite group on which measurements might be made
  - we don’t often measure the whole population
- sample
  - a sub-set of the population on which measurements are actually made
  - most studies will sample the population
  - n is the number studied
  - n-1 is called the ‘degrees of freedom’
  - we often extrapolate sample results to the population

Populations and Samples – so what?
- They are described/calculated differently…
  - μ is the population mean, x̄ is the sample mean
  - σ (or σn) is the population SD, s (or σn-1) is the sample SD
- Calculating the SD is different for each
  - the population SD divides by n, the sample SD divides by n-1
  - most calculators do it for you…
  - as long as you choose the right type (pop vs samp)
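The two SD calculations can be compared side by side. This sketch reuses the travel-time readings from the z-score slide; `pstdev` and `stdev` are the standard-library counterparts of a calculator's σn and σn-1 keys.

```python
from math import sqrt
from statistics import pstdev, stdev

readings = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
            26, 37, 43, 62, 35, 38, 45, 32, 28, 34]
n = len(readings)

population_sd = pstdev(readings)  # sigma-n: divides the squared deviations by n
sample_sd = stdev(readings)       # sigma-(n-1), or s: divides by n - 1

# The two always differ by the factor sqrt(n / (n - 1))
ratio = sample_sd / population_sd

print(f"population SD = {population_sd:.2f}, sample SD = {sample_sd:.2f}")
print(f"ratio = {ratio:.4f}, sqrt(n/(n-1)) = {sqrt(n / (n - 1)):.4f}")
```

Because the ratio is sqrt(n / (n - 1)), the gap between the two values shrinks as n grows.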

Populations and Samples – choosing
- Analysing the results of a class test…
  - population, since you don’t intend extrapolating the results to all students everywhere
- Analysing the results of a drug trial…
  - sample, since you expect the conclusions to apply to the larger population
- A national census collects information about age…
  - population, since by definition the census is about the population taking part in the survey
- If in doubt, use the sample SD
  - and as n increases, the difference between the two decreases

Populations and Samples – implications
- If the sample observed is the population, then the mean and SD of that sample are the population mean and the population SD
  - ie you calculate σ (or σn)
- If the sample is part of a bigger population, then the sample mean and SD are estimates of the population mean and the population SD
  - ie you calculate σn-1 (or s)
  - (the presumption here is that you chose your sample to reflect the population!)

Implications of Estimating Pop Mean
- For a sample, the ‘quality’ of its estimate of the population mean and SD depends on the number of observations made
  - the mean from a sample of 1 member of the population is unlikely to be close to the pop mean
  - if you sampled the whole population, your sample estimate is the population mean
  - in between, adding extra readings to a sample will improve your estimate of the population mean
- Also, repeated sampling → a collection of means
  - eg you sampled 200 FVCs and did this 100 times
  - the 100 means are normally distributed and have their own SD (!)
  - this SD is called the Standard Error of the Mean (SEM)

The Standard Error of the Mean
- Immediate recap…
  - repeated samples of a distribution will produce different estimates of the population mean
  - the SD of those estimates of the pop mean is called the SEM
- Surprisingly easy to calculate
  - we don’t need to keep repeating our sampling
  - we can derive it from the SD of the data in one sample
  - SEM = sample standard deviation / square root of the number of observations in the sample
  - SEM = s / √N
  - eg if N = 16, then the SEM is 4x smaller than the SD
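A small simulation can illustrate why the shortcut works. The sketch below follows the slides' example of 100 repeated samples of 200 readings each; the population mean and SD (3.8 and 1.0 litres) are made-up values chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population, assumed purely for illustration
POP_MEAN, POP_SD = 3.8, 1.0  # FVC in litres
N = 200        # readings per sample
REPEATS = 100  # number of repeated samples

def draw_sample(n):
    """Draw n readings from the assumed normal population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

samples = [draw_sample(N) for _ in range(REPEATS)]
sample_means = [mean(s) for s in samples]

# The long way: the SD of the 100 sample means
sd_of_means = stdev(sample_means)

# The shortcut: SEM = s / sqrt(N), using only the first sample
sem_from_one_sample = stdev(samples[0]) / sqrt(N)

print(f"SD of {REPEATS} sample means: {sd_of_means:.4f}")
print(f"SEM from a single sample:  {sem_from_one_sample:.4f}")
```

Both numbers should land close to POP_SD / √N ≈ 0.07 litres, though neither will match it exactly because each is itself an estimate.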

Remember, Samples → Estimates
- When we sample a population, we end up with a sample mean, x̄
  - our ‘best guess’ estimate of the real pop mean, µ
  - the ‘real’ mean of the population is ‘hidden’
- Our sample also has a measure of the variability of the data that comprise it
  - the sample Standard Deviation, s
  - which is also an estimate of the population SD, σ
- s can also be used to indicate the variability of our estimate of the population mean
  - SEM = s / √N
  - we can then use the SEM to determine confidence limits

Confidence Limits and the SEM
- The SEM reflects the ‘fit’ of a sample mean, x̄, to the underlying population mean, µ
  - if we calculate two sample means and they are similar, but for one the SEM is high, we are less ‘confident’ about how well that sample mean estimates the population mean
- Just as the ‘raw’ data used to calculate a sample mean follow a distribution, so do repeat estimates of the population mean itself
  - this is the t Distribution
  - it is similar to the Normal Distribution

The t Distribution
- Yet another distribution!
  - but distributions are important because they define how we expect our data to behave
  - if we know that, then we gain insight into our expts!
- Generally ‘flatter’ than the Normal Distribution
  - any particular area is more ‘spread out’ (less clear)
  - the more ‘pointed’ a curve, the clearer the peak

t Distribution Pointedness Varies!
- Logic…
  - the number of observations in the sample, N, influences the ‘accuracy’ of our estimate of the population mean from the sample mean
  - as N increases, the ‘peak’ becomes sharper
  - a given area of the curve is less ‘spread out’
- At large N, the t Distribution = the Normal Distribution
  - and for both, 95% of the curve is contained in ± 1.96 SD

Using the t Distribution
- When we calculate a sample mean and call it our estimate of the population mean…
  - it’s nice to know how ‘confident’ we are in that estimate
- One measure of confidence is the 95% Confidence Interval (95%CI)
  - the range over which we are 95% confident the ‘true’ population mean lies
  - derived from our sample mean (we calculate)
  - and our SEM (we calculate)
  - and N (though it’s the ‘degrees of freedom’, N-1, that we use)
  - we use this to look up a ‘critical t value’

From t Distribution to 95%CI
- The t Distribution is centred around our mean
  - and its shape is influenced by N-1
- 95%CI involves chopping off the two 2.5% tails
- We need a t table to look up how many SEMs (SDs) along the x-axis this cut-off point will be
  - the value varies with N-1
  - and with the level of confidence sought

Step 1: The t Table
- The t value varies with…
  - Row: the degrees of freedom (DoF), N-1
  - Column: the level of ‘confidence’
    - 95%CI involves chopping off the two 2.5% tails, so α = 0.05 (5%)
- For N = 10, α = 0.05: t(N-1),0.05 = 2.262
- When N is large, t approaches 1.96

Step 2: Using the t value
- The t value is the number of SEMs along the x-axis (in each direction) that encompasses that % of the t distribution centred on our mean
  - 2.262 in the case of t(N-1),0.05 with N = 10
- Eg we measure the FVC (litres) of 10 people…
  - mean = 3.83, SD = 1.05, N = 10
  - SEM = 1.05 / √10 = 0.332 litres
  - t(N-1),0.05 = 2.262 standard errors to cover 95% of the curve
  - so 2.262 * 0.332 = 0.751 litres either side of the mean covers 95% of the distribution
  - so the 95%CI is 3.83 ± 0.751 = 3.079 → 4.581 litres, written (3.079, 4.581)
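The FVC calculation can be written as a short function. This is a sketch only: the function name `confidence_interval` is our own, and the critical t value is passed in by hand from the t table (2.262 for 9 degrees of freedom) rather than computed.

```python
from math import sqrt

def confidence_interval(mean, sd, n, t_critical):
    """Return (lower, upper) limits: mean ± t_critical * SEM."""
    sem = sd / sqrt(n)          # Standard Error of the Mean
    margin = t_critical * sem   # distance either side of the mean
    return mean - margin, mean + margin

# FVC example: mean 3.83 L, SD 1.05 L, N = 10, t(9),0.05 = 2.262 from the table
low, high = confidence_interval(mean=3.83, sd=1.05, n=10, t_critical=2.262)

print(f"95%CI = ({low:.3f}, {high:.3f}) litres")
```

For a different sample size or confidence level, only the looked-up `t_critical` value needs to change.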

Effect of Bigger N
- A larger sample size gives us greater confidence in any population mean we estimate
  - so the 95%CI should be smaller
- In the previous example…
  - mean = 3.83, SD = 1.05, N = 10, SEM = 0.332
  - 95%CI is (3.079, 4.581)
- But say we measured 90 more people…
  - mean = 3.55, SD = 0.915, N = 100
  - mean and SD are similar to before, but the SEM is now a lot smaller, at 0.915 / √100 = 0.0915
  - so too is t(N-1),0.05 = 1.984 (rather than 2.262; already close to the large-N value of 1.96)
  - 95%CI = 3.55 ± (1.984 * 0.0915) = 3.368 → 3.732

Effect of Bigger N
- A bigger N ‘sharpens’ the t distribution
  - so that the 95% boundaries are less far apart
  - ie our confidence interval will become smaller
- The 95%CI also shrinks because SEM = SD / √N
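Both effects of a bigger N (a smaller critical t value and a smaller SEM) can be seen together. In this sketch the SD is held at an assumed 1.0 litre so that only N changes; the critical t values are copied from a standard two-tailed t table for α = 0.05, not computed.

```python
from math import sqrt

# Critical t values for alpha = 0.05 (two-tailed), keyed by N; the DoF is N - 1
T_CRITICAL = {10: 2.262, 25: 2.064, 100: 1.984}

SD = 1.0  # assumed sample SD in litres, held constant for illustration

def half_width(n, sd=SD):
    """Half-width of the 95%CI: t(N-1),0.05 * SD / sqrt(N)."""
    return T_CRITICAL[n] * sd / sqrt(n)

widths = {n: half_width(n) for n in T_CRITICAL}

for n, w in widths.items():
    print(f"N = {n:3d}: t = {T_CRITICAL[n]:.3f}, 95%CI = mean ± {w:.3f} litres")
```

Most of the shrinkage comes from the √N in the SEM; the change in t contributes a smaller extra reduction.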

Summary
- The Normal Distribution is fully defined just by its mean and SD
- Transformation to z-scores makes all Normal Distributions identical
- The SD calculation differs for sample vs population
- Estimation of a population mean from a sample is always prone to uncertainty
- The Standard Error of the Mean (s / √N) reflects that uncertainty
- Estimates of means follow the t Distribution
- The t Distribution becomes ‘sharper’ with higher N
- The ‘width’ of the t distribution covering 95% is called the 95%CI
  - the range in which 95 out of 100 mean estimates would fall
  - 95%CI = mean ± (t(N-1),0.05 * SEM)
  - t is the number of SEMs along the distribution covering that %