CIVE2602 - Engineering Mathematics 2.2 (20 credits) Statistics and Probability Lecture 6 Confidence intervals Confidence intervals for the sample mean.

Presentation transcript:

CIVE2602 Engineering Mathematics 2.2 (20 credits): Statistics and Probability. Lecture 6: Confidence intervals. Topics: confidence intervals for the sample mean; confidence intervals using the t-distribution; examples. Image ©Claudio Nunez 2010, sourced from _Building_destroyed_in_Concepci%C3%B3n.jpg?uselang=en-gb, available under a Creative Commons license.

Sampling (from last lecture). The population has mean $\mu$, variance $\sigma^2$ and size $N$. Samples 1, 2, 3, 4, ... are drawn from it, and each sample will have its own sample mean and sample variance.

Sampling Distributions. The frequency distribution of a given sample statistic (e.g. the mean, standard deviation or range) which would result if a large number of random samples of the same size $n$ were drawn from the same population is known as the sampling distribution of that statistic. Example: the sampling distribution of the mean for the heights of students in the lecture theatre, using samples of size $n = 10$.

Sampling Distributions: why are they important? They are needed when working out whether the results of a trial or experiment are statistically significant, for example:
- How reliable is a component I'm using on a civil engineering project (concrete, steel truss, beam, new design or material, etc.)?
- Is one component, material or design "significantly" different from another?
More general areas: any sort of medical or scientific experimental work, election polls, experimental papers published by a university or company; anywhere you might want to make predictions about general cases (populations) from samples.

Sampling Distributions. Take the sample mean $\bar{X}$. For a sample of size $n$ it can be shown that the expected value of the sample mean equals the population mean, $E(\bar{X}) = \mu$. The standard deviation of the sampling distribution of the mean is called the standard error of the mean, $SD(\bar{X}) = \sigma/\sqrt{n}$. The spread of the sampling distribution of the mean therefore decreases as the sample size increases (the slide plots this for n = 2, n = 4 and n = 8). It is very important to note the difference between the distribution of $X$ (the data) and the sampling distribution of the mean.
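The shrinking spread can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 170, standard deviation 10, loosely "student heights in cm"), the number of repeated samples and the function name are assumptions, not figures from the lecture.

```python
import random
import statistics

def simulate_standard_error(population_mean, population_sd, n,
                            num_samples=20000, seed=1):
    """Draw many samples of size n from a Normal population and return
    (mean of the sample means, SD of the sample means)."""
    rng = random.Random(seed)
    sample_means = [
        statistics.fmean(rng.gauss(population_mean, population_sd)
                         for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.fmean(sample_means), statistics.pstdev(sample_means)

# Hypothetical population: mean 170, SD 10 (assumed values).
for n in (2, 4, 8):
    mean_of_means, sd_of_means = simulate_standard_error(170.0, 10.0, n)
    theory = 10.0 / n ** 0.5  # sigma / sqrt(n)
    print(f"n={n}: mean of sample means = {mean_of_means:.2f}, "
          f"SD of sample means = {sd_of_means:.2f}, "
          f"sigma/sqrt(n) = {theory:.2f}")
```

The simulated SD of the sample means should sit close to $\sigma/\sqrt{n}$ for each n, while the mean of the sample means stays close to the population mean.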

Sampling Distributions

Confidence Intervals. A confidence interval allows us to be 90%, 95% or 99% sure that a population mean lies between certain values, so we can take information about a sample and make inferences about the population. E.g. for a sample of students' heights we want to find, with 95% confidence, a range within which the population mean falls. Confidence limits are the values at either end of this range.

Confidence Intervals. We want 95% to fall inside the confidence interval. That leaves the significance level, $\alpha = 0.05$ (5%, the amount of error we are willing to accept), to fall outside the confidence interval, i.e. A(x) = 0.025 in each tail. Find Z using reverse Normal tables: a reverse look-up of $\alpha/2 = 0.05/2 = 0.025$ gives the value 1.96 for 95%. The confidence interval is then given by $\bar{x} \pm 1.96\,SD(\bar{x})$. A 95% Confidence Interval (C.I.) for a parameter is an interval which is assessed to contain the parameter on 95% of occasions if repeated samples are taken. We usually wish to choose the smallest such interval; with a symmetric distribution this will usually mean that we take a given distance either side of a given estimate.
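The reverse table look-up can be reproduced numerically. This is a minimal sketch using the inverse Normal CDF from Python's standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value of the standard Normal distribution:
    leaves alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.50, 0.90, 0.95, 0.99):
    print(f"{level:.0%} C.I.: z = {z_critical(level):.3f}")
```

For 95% this reproduces the table value of 1.96 quoted on the slide.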

Confidence Intervals. In the particular case of the sample mean we can demonstrate that 95% of the time the mean of a sample of size n will be within $1.96\,\sigma/\sqrt{n}$ of the population mean. Hence we define the following conventional confidence intervals for the "population" mean (critical values found from Normal tables, used in reverse):
50% C.I. for μ: $\bar{x} \pm 0.674\,SD(\bar{x})$
90% C.I. for μ: $\bar{x} \pm 1.645\,SD(\bar{x})$
95% C.I. for μ: $\bar{x} \pm 1.96\,SD(\bar{x})$
where $SD(\bar{x}) = \sigma/\sqrt{n}$.
TRUE WHEN WE KNOW σ AND FOR LARGE SAMPLES.
NOTE: This assumes we know the POPULATION standard deviation σ. If we don't, we have to use s (the sample standard deviation) as an estimate of the POPULATION standard deviation and use the t-distribution for small samples.

Confidence Intervals: 4 cases to consider
1) A sample (large or small) taken from a normally distributed population, with a known variance ($\sigma^2$ is known)
2) A large sample taken from a population, with a known variance ($\sigma^2$ known)
3) A large sample taken from a population, with an unknown variance ($\sigma^2$ NOT known)
4) A small sample taken from a population, with an unknown variance
Typically "large" means n greater than about 20 to 30.

Confidence Intervals: Example. Assume car speeds are Normally distributed with a standard deviation of 6 mph. If from a sample of 1000 cars we obtain a mean speed of 30.5 mph, what are the 95% confidence limits for μ?
For n = 1000 the standard deviation of $\bar{x}$ is $SD(\bar{x}) = \sigma/\sqrt{n} = 6/\sqrt{1000} \approx 0.19$.
We would thus anticipate that μ lies in the range $30.5 \pm 1.96 \times 0.19$, i.e. $30.5 \pm 0.372$, or (30.128, 30.872) mph.
We have estimated the population mean μ with a 95% CI, i.e. intervals calculated in this way will contain the population mean on 95% of occasions.
i) If the sample was of 100 cars what are the 95% confidence limits for µ?
ii) If the sample was of 100 cars what are the 90% confidence limits for µ?
Reminder: 50% C.I. for μ is $\bar{x} \pm 0.674\,SD(\bar{x})$; 90% C.I. for μ is $\bar{x} \pm 1.645\,SD(\bar{x})$; 95% C.I. for μ is $\bar{x} \pm 1.96\,SD(\bar{x})$.
Image ©Ian Fuller 2009, available under a Creative Commons license.

Example: solution. For n = 100, $SD(\bar{x}) = 6/\sqrt{100} = 0.6$.
i) If the sample was of 100 cars what are the 95% confidence limits for µ? We would thus anticipate that μ lies in the range $30.5 \pm 1.96 \times 0.6$, i.e. $30.5 \pm 1.176$, or (29.324, 31.676) mph. Larger range with smaller n: for n = 1000 it was (30.128, 30.872) mph.
ii) If the sample was of 100 cars what are the 90% confidence limits for µ? We would thus anticipate that μ lies in the range $30.5 \pm 1.645 \times 0.6$, i.e. $30.5 \pm 0.987$, or (29.51, 31.49) mph. The range becomes smaller for the 90% CI.
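The car-speed calculations can be checked with a few lines of code. This is a sketch for the known-σ case only; the function name `z_confidence_interval` is an assumption, and the critical value comes from the standard library's inverse Normal CDF rather than from printed tables, so the last decimal place may differ slightly from hand working with a rounded standard error.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)  # z * SD(x-bar)
    return sample_mean - half_width, sample_mean + half_width

# Car speeds: sigma = 6 mph, sample mean = 30.5 mph.
for n, confidence in ((1000, 0.95), (100, 0.95), (100, 0.90)):
    low, high = z_confidence_interval(30.5, 6.0, n, confidence)
    print(f"n={n}, {confidence:.0%} C.I.: ({low:.3f}, {high:.3f}) mph")
```

Running it shows the same pattern as the slides: the interval widens when n drops from 1000 to 100, and narrows again when the confidence level is reduced from 95% to 90%.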

99% C.I. for μ: $\bar{x} \pm 2.576\,SD(\bar{x})$
95% C.I. for μ: $\bar{x} \pm 1.96\,SD(\bar{x})$

Multiple choice (SAMPLES). Choose A, B, C or D for each of these:
1) In Statistics what does µ stand for?
A: It's the variance of a sample
B: It's the mean of a sample
C: It's the mean of a population
D: It's the standard deviation of a sample

2) In Statistics what is the name for $\sigma^2$?
A: It's the variance of a population
B: It's the standard deviation squared of a sample
C: It's the standard deviation of a population
D: It's the mean of a sample

3) In Statistics what does n stand for?
A: It's the number of samples taken
B: It's the size of the sample
C: It's the nth member of a sample
D: It's the 1st element in a sample

4) In Statistics what does s stand for?
A: It's the variance of a sample
B: It's the variance of a population
C: It's the mean of a population
D: It's the standard deviation of a sample

Multiple choice (SAMPLING DISTRIBUTIONS). Choose A, B, C or D for each of these:
5) Which of these is the standard deviation of the sampling distribution of the mean (also called the standard error)? [Options A to D are formulas shown on the slide.]

6) When would a t-distribution be more suitable to represent a sampling distribution (as opposed to a NORMAL distribution)?
A: For a large sample with known population standard deviation, σ
B: For any large sample
C: For a small sample with a known population variance
D: For a small sample with an unknown population standard deviation, σ

Use of the t-Distribution. Often we don't know the POPULATION standard deviation σ (or variance $\sigma^2$). We can use the sample standard deviation s as an estimate, but there are consequences: the standardised sample mean $(\bar{X} - \mu)/(s/\sqrt{n})$ is no longer closely approximated by the standard Normal distribution (Z). Instead we use t-tables (extra parameter, v, the degrees of freedom). For small samples the t-distribution is distinctly flatter than the Normal distribution (for large samples, n > 30, it approximates to the Normal distribution).
Which distribution to use:
Sample size small, $\sigma^2$ unknown: use t
Sample size small, $\sigma^2$ known: use z
Sample size large, $\sigma^2$ unknown: use z
Sample size large, $\sigma^2$ known: use z
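The four-row table above can be written as a tiny decision helper. This is only a sketch of the lecture's rule of thumb: the function name and the cut-off of 30 for a "large" sample are assumptions (the slides quote both "n > 30" and "20 to 30").

```python
def choose_distribution(n, sigma_known, large_sample_threshold=30):
    """Return 't' or 'z' following the rule of thumb in the table:
    only a small sample with unknown population variance needs t."""
    is_small = n <= large_sample_threshold  # assumed cut-off, see lead-in
    if is_small and not sigma_known:
        return "t"
    return "z"

for n, known in ((8, False), (8, True), (1000, False), (1000, True)):
    label = "known" if known else "unknown"
    print(f"n={n}, sigma^2 {label}: use {choose_distribution(n, known)}")
```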

The t-distribution uses another parameter, v, the number of degrees of freedom. Here v = n-1 (it changes in more complex examples). [Figure: t-distribution curves for v = 1, 2, 5, 10 and v = infinity.] The t-distribution approaches the NORMAL distribution when n > 30 (ish); that's why we can use the Normal distribution, even when we don't know σ, provided we have a large sample.

Using t-tables. Let's say we want 95% confidence limits and n = 20. Degrees of freedom v = 20-1 = 19. A 95% CI means 0.025 in either tail, for which the t-tables give t = 2.093 (compare with 1.96 for the Normal distribution). Confidence interval = $\bar{X} \pm 2.093\,s/\sqrt{n}$.
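For readers without t-tables to hand, the critical value can be approximated numerically. The sketch below is not how the lecture does it (the lecture uses printed tables): it integrates the t-distribution density with Simpson's rule and then searches for the critical value by bisection, using only the standard library. All function names are assumptions made for illustration.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    log_coefficient = lgamma((v + 1) / 2) - lgamma(v / 2)
    return exp(log_coefficient) / sqrt(v * pi) * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """P(T <= t) for t >= 0: 0.5 for the left half (by symmetry) plus
    Simpson's rule on [0, t]. `steps` must be even."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + total * h / 3

def t_critical(confidence, v):
    """Two-sided critical value: leaves (1 - confidence)/2 in each tail."""
    target = 1 - (1 - confidence) / 2
    low, high = 0.0, 1000.0
    for _ in range(60):  # bisection on the monotonic CDF
        mid = (low + high) / 2
        if t_cdf(mid, v) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(f"v = 19: t = {t_critical(0.95, 19):.3f}")
print(f"v = 7:  t = {t_critical(0.95, 7):.3f}")
```

The values should agree with the printed t-tables to about three decimal places, and they exceed 1.96 because the t-distribution has heavier tails than the Normal distribution.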

Worked Example: confidence interval using the t-distribution.
Q: A survey is made of the output from a factory on eight randomly selected days in November, with the results as shown below. [Table of the eight daily outputs.] What is the 95% confidence interval for the true mean output in November, assuming daily outputs are approximately Normally distributed?
Answer: What we are being asked here is to make an estimate of the population mean, knowing only the sample mean and variance of a sample of size 8, i.e. a small sample.
- We estimate the true mean output (population mean) by the mean of the sample,
- and we estimate the variance of the output population from the variance of the sample.
©Paul Hows 2008, sourced from / Available under creative commons license

What is the 95% confidence interval for the true mean output in November, assuming daily outputs are approximately Normally distributed?
- We want a 95% confidence interval, so we want 2½% in each tail and look up A(t) = 0.025.
- Since the sample size is n = 8, the number of degrees of freedom, v, is 7 (i.e. n-1). The tables give the critical t-value as 2.365.
- Calculate the variance of the sample mean as $Var(\bar{x}) = s^2/n$, so that $SD(\bar{x}) = s/\sqrt{n}$.
Hence the 95% confidence interval for μ is $\bar{x} \pm 2.365\,SD(\bar{x})$, which can be rounded to give an integer range.
If we had known that the underlying population variance $\sigma^2$ was equal to this value (instead of it being just our sample estimate) we could have used the Normal distribution and critical values from Z tables. Work out the 95% CI for µ if this were the case.

N.B. If we had known that the underlying population variance $\sigma^2$ was equal to this value (instead of it being just our sample estimate) we could have used the Normal distribution and critical values from Z tables, i.e. $\bar{x} \pm 1.96\,SD(\bar{x})$, compared with $\bar{x} \pm 2.365\,SD(\bar{x})$ when σ is not known. The wider confidence interval found using the t-distribution reflects our greater degree of uncertainty in having estimated $\sigma^2$ from the small sample.
Which distribution to use:
Sample size small, $\sigma^2$ unknown: use t
Sample size small, $\sigma^2$ known: use z
Sample size large, $\sigma^2$ unknown: use z
Sample size large, $\sigma^2$ known: use z
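The comparison between the t and z intervals can be sketched in code. Note that the eight daily outputs below are HYPOTHETICAL values invented for illustration; they are not the lecture's data. The critical values 2.365 (t, v = 7) and 1.96 (z) are the table values for 2.5% in each tail.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL daily outputs for eight days (not the lecture's data).
daily_output = [2050, 2100, 1980, 2210, 2150, 2075, 2120, 2005]

T_CRIT_V7 = 2.365  # t-tables: v = 7, 2.5% in each tail
Z_CRIT = 1.96      # Normal tables: 2.5% in each tail

def interval(data, critical_value):
    """Interval centred on the sample mean: mean +/- critical * s / sqrt(n)."""
    n = len(data)
    centre = mean(data)
    standard_error = stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor
    half_width = critical_value * standard_error
    return centre - half_width, centre + half_width

t_low, t_high = interval(daily_output, T_CRIT_V7)
z_low, z_high = interval(daily_output, Z_CRIT)
print(f"t interval (sigma unknown): ({t_low:.1f}, {t_high:.1f})")
print(f"z interval (sigma known):   ({z_low:.1f}, {z_high:.1f})")
```

Whatever data are used, the t interval is wider than the z interval by the factor 2.365/1.96, which is the extra uncertainty from estimating the variance from only eight observations.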

VLE: demonstration of the sampling distribution and the central limit theorem.

To test the strength of each batch of concrete, a sample of 9 small blocks is produced in moulds and left to set. After a week the site engineer tests the strength of each block by measuring the force (N/mm$^2$) required to crush each one. Sample strengths: [values shown on the slide].
Question
i) Assuming the population is Normally distributed and the population variance = 4, find the sample mean, $\bar{X}$, and $SD(\bar{X})$.
ii) A batch of concrete is passed for site if, from analysing the strength of the sample, the mean cube strength can be shown to be significantly above the national minimum standard of 2 N/mm$^2$. Test for this by finding the probability that we could obtain our sample mean IF the population mean, µ, was 2.

Sample strengths: $\sigma^2 = 4$, n = 9, sample mean $\bar{X} = 4$, so $SD(\bar{X}) = \sigma/\sqrt{n} = 2/3$.
ii) A batch of concrete is passed for site if, from analysing the strength of the sample, the mean cube strength can be shown to be significantly above the national minimum standard of 2 N/mm$^2$. Test for this by finding the probability that we could obtain our sample mean IF the population mean, µ, was 2.
We want $P(\bar{X} > 4)$ if the population mean strength was 2. Standardising, $Z = (4 - 2)/(2/3) = 3$, so we want P(Z > 3) = 0.0014.
So it's very unlikely (0.14%) that a sample mean of 4 would have come from a population with a mean µ of 2. We can conclude that the underlying population mean must be higher than 2.

Sample strengths: $\sigma^2 = 4$, n = 9, sample mean $\bar{X} = 4$.
iii) A site which needs higher performance concrete requires that a batch is only passed if, from analysing the strength of the sample, the mean cube strength can be shown to be significantly above a standard of 2.5 N/mm$^2$. Would the concrete still pass? Test for this by finding the probability that we could obtain our sample mean IF the population mean, µ, was 2.5.
We want $P(\bar{X} > 4)$ if the population mean strength was 2.5. Standardising, $Z = (4 - 2.5)/(2/3) = 2.25$, so we want P(Z > 2.25) = 0.012.
So it's very unlikely (1.2%) that a sample mean of 4 would have come from a population with a mean µ of 2.5 (i.e. there is a 98.8% chance that it didn't). We can conclude that the underlying population mean must be higher than 2.5.
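Both concrete tests follow the same pattern: standardise the sample mean, then read off the upper-tail area. A minimal sketch, assuming a known population standard deviation and using the standard library's Normal CDF in place of tables (the function name is an assumption):

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_probability(sample_mean, mu0, sigma, n):
    """Return (z, P(Z > z)): the chance of a sample mean at least this
    large if the population mean were really mu0 and sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return z, 1.0 - NormalDist().cdf(z)

# Concrete blocks: sample mean 4 N/mm^2, sigma^2 = 4 (sigma = 2), n = 9.
for mu0 in (2.0, 2.5):
    z, p = upper_tail_probability(4.0, mu0, 2.0, 9)
    print(f"mu = {mu0}: z = {z:.2f}, P(Z > z) = {p:.4f}")
```

The two probabilities correspond to the 0.14% and 1.2% figures quoted on the slides.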

CIVE2602 Engineering Mathematics 2.2, Lecture 6: Summary
- Distribution of the sample mean
- Confidence intervals (e.g. 99%, 95%, 90% CI)
- Using the t-distribution
NEXT: hypothesis testing (very useful)