Statistics: Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University



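Sampling distribution of the sample statistics
[Figure: a population from which Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample k are drawn; the statistic computed from each sample forms the sampling distribution.]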

Central Limit Theorem
(4) The mean of the sampling distribution of x̄ is equal to the population mean, i.e. μ_x̄ = μ.
(5) The standard deviation of the sampling distribution of x̄ is the population standard deviation divided by the square root of the sample size, i.e. σ_x̄ = σ/√n.

Sampling distribution of x̄ for a Normal population

Sampling distribution of x̄ for a non-Normal population

Computer simulation of the sampling distribution of the sample mean
Pick any probability distribution and specify a mean and standard deviation.
Tell the computer to randomly generate 1000 observations from that probability distribution (the computer is more likely to produce values that have high probability).
Plot the “observed” values in a histogram.
Next, tell the computer to randomly generate 1000 averages-of-2 (randomly pick 2 observations and take their average) from that probability distribution.
Plot the “observed” averages in a histogram.
Repeat for averages-of-10, and averages-of-…
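The simulation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the helper names (`sample_means`, `uniform01`, `exponential1`), the fixed seed, and the choice of a Uniform[0,1] and a rate-1 exponential population are not from the slides' own code, and the histograms are replaced by printed summaries so that no plotting library is needed.

```python
import random
import statistics


def sample_means(draw, n, reps=1000, seed=0):
    """Return `reps` averages, each taken over `n` independent draws."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


# Two example populations (illustrative choices).
def uniform01(rng):
    return rng.random()            # mean 1/2, standard deviation sqrt(1/12)


def exponential1(rng):
    return rng.expovariate(1.0)    # mean 1, standard deviation 1


populations = {
    "uniform": (uniform01, 0.5, (1 / 12) ** 0.5),
    "exponential": (exponential1, 1.0, 1.0),
}

for name, (draw, mu, sigma) in populations.items():
    for n in (1, 2, 5, 100):
        means = sample_means(draw, n)
        # Compare the simulated averages with what the theorem predicts.
        print(f"{name:12s} n={n:3d}  "
              f"mean of averages={statistics.fmean(means):.3f} (mu={mu:.3f})  "
              f"sd of averages={statistics.stdev(means):.3f} "
              f"(sigma/sqrt(n)={sigma / n ** 0.5:.3f})")
```

As n grows, the mean of the averages should stay near the population mean while their spread shrinks roughly like σ/√n. Feeding each `means` list to a histogram routine would reproduce the kind of pictures the slides show.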

Figure: Uniform distribution on [0,1], average of 1 sample (original distribution)
Figure: Uniform distribution, 1000 averages of 2 samples
Figure: Uniform distribution, 1000 averages of 5 samples
Figure: Uniform distribution, 1000 averages of 100 samples
Figure: Exponential distribution, 1000 averages of 2 samples
Figure: Exponential distribution, average of 1 sample (original distribution)
Figure: Exponential distribution, 1000 averages of 5 samples
Figure: Exponential distribution, 1000 averages of 100 samples

Contents
Summary of Statistics Learnt so Far
Statistical Inference
Central Limit Theorem and its implications
Estimation theory
Interval Estimation
What is Confidence Interval?
Tutorial

Estimation Theory
In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample.
Statisticians use sample statistics to estimate population parameters. For example, sample means are used to estimate population means, and sample proportions are used to estimate population proportions.

Two Types of Estimates
Point estimate. A point estimate of a population parameter is a single value of a statistic. For example, the sample mean x̄ is a point estimate of the population mean μ.
When we estimate the mean μ by x̄, the probability that we are exactly correct is close to zero, i.e. P(x̄ = μ) ≈ 0, assuming the population is heterogeneous and the sample size n << population size N.
Hence, we cannot be very “confident” about estimates made using point estimates.

Two Types of Estimates (contd.)
How can we be more confident about our estimates? We want the probability that our estimate is correct to be well above zero.
We can increase our confidence by giving up some precision: instead of a single point, we estimate an interval.
Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.


History of Interval Estimation
Neyman (1937) identified interval estimation ("estimation by interval") as distinct from point estimation ("estimation by unique estimate").
He was the first to recognize and formulate interval estimation.
He recognized that work quoting results in the form of an estimate plus-or-minus a standard deviation was, in effect, interval estimation.
His paper on this was titled "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection", given at the Royal Statistical Society on 19 June.
You can download the paper from:

What is an Interval Estimate?
In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter, in contrast to point estimation, which gives a single number.
Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.
We use x̄ to estimate this interval.
Interval estimates provide both a "best estimate" of a parameter and an indication of the precision with which the parameter is known.

Types of Interval Estimation
The most prevalent forms of interval estimation are:
  confidence intervals (a frequentist method)
  credible intervals (a Bayesian method)
Other common approaches to interval estimation, which are encompassed by statistical theory, are:
  tolerance intervals
  prediction intervals (used mainly in regression analysis)
Of these, confidence intervals are the most widely used and hence will be covered in more detail in this class.


What is a Confidence Interval?
In statistics, a confidence interval (CI) is an interval estimate of a population parameter.
Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.
Confidence intervals are used to indicate the reliability of an estimate.
How likely the interval is to contain the parameter is determined by the confidence level. Increasing the desired confidence level will widen the confidence interval.
Confidence intervals, and interval estimates more generally, have applications across the whole range of quantitative studies.

Example of Confidence Interval
A confidence interval can be used to describe how reliable some opinion survey results are.
In a survey of election voting intentions, the result might be that 40% of respondents intend to vote for a certain party.
A 95% confidence interval for the proportion in the whole population having the same intention on the survey date might be 36% to 44%.
From the same survey data one may calculate a 90% confidence interval, which is narrower, for instance about 37% to 43%.
All other things being equal, a survey result with a narrower confidence interval at a higher confidence level is more desirable.
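To see how the interval width depends on the confidence level, here is a small sketch using the usual Normal approximation for a proportion. The function name `proportion_interval` and the sample size of 576 respondents are assumptions made for illustration; the survey's sample size is not given in the slides.

```python
from statistics import NormalDist


def proportion_interval(p_hat, n, level):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)          # two-sided multiplier
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5  # z times the standard error
    return p_hat - half_width, p_hat + half_width


n = 576  # hypothetical number of respondents, not given in the source
for level in (0.90, 0.95):
    low, high = proportion_interval(0.40, n, level)
    print(f"{level:.0%} interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the 90% interval comes out narrower than the 95% one, which is the trade-off the example describes.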


Example
In the whole of Houston, what percentage of adults do you think will want to watch a movie sometime in the next 10 days?
Assume a known variance σ² for the whole population.
Choose a random sample of 10 adults and ask their opinion.
Let X be the random variable denoting the percentage of adults in the sample who would attend a movie, and let X_i be the value from the i-th sample.
Will this be anywhere close to the actual percentage?
How can we be sure to be closer to the actual mean? Take a very large number of samples.

Example (contd.)
But taking a large number of samples is generally not feasible. We want to arrive at an estimate based on fewer samples.
For example, if you take only 1 sample of 10 people and find that 5 of the 10 would like to go to a movie, then you can say: "We are pretty sure that 50% of the adult population would want to go to a movie in the next 10 days."
Isn’t this ambiguous? How sure is "pretty sure"? We need to be more definitive.

Example (contd.)
We use confidence intervals to remove the ambiguity.
What if we want to be 100% sure? The only statement we can make with 100% certainty is that between 0% and 100% of the adult population would want to watch a movie in the next 10 days.
What if we want to be 50% sure? Such a statement doesn’t hold much importance, as you are wrong half the time.
Then, what kind of statements make sense? Statements that are 90% sure, 95% sure, 98% sure or 99% sure. These are confidence levels.

Calculating Confidence Level
The general norm is to vary the interval by multiples of σ and compute the confidence level. The interval is extended equally on either side of the mean. Here σ denotes the standard deviation of the estimate x̄.
The probability that μ lies in the interval [x̄ − σ, x̄ + σ] can be calculated as P(x̄ − σ ≤ μ ≤ x̄ + σ).
Assuming a Normal distribution, we get P(x̄ − σ ≤ μ ≤ x̄ + σ) ≈ 0.68.
Source for calculations:
What if we increase the width of the interval from 2σ to 4σ?
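The confidence level for an interval of ±k standard deviations can be checked numerically. The sketch below assumes a Normal distribution and uses Python's standard `statistics.NormalDist`; the function name `coverage` is an illustrative choice.

```python
from statistics import NormalDist


def coverage(k):
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {coverage(k):.4f}")
```

Widening the interval from a total width of 2σ (k = 1) to 4σ (k = 2) raises the confidence level from about 68% to about 95%.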

Confidence Level Table
Some of the most commonly used confidence levels in statistics are given in the table below. Less than 90% is generally not considered a strong enough confidence level to make a statement.

Confidence level    Number of σs away from mean
90%                 1.645
95%                 1.96
98%                 2.326
99%                 2.576
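The multipliers in such a table come from the inverse of the standard Normal distribution function. A minimal sketch, assuming two-sided intervals and a Normal sampling distribution (the name `sigmas_for` is illustrative):

```python
from statistics import NormalDist


def sigmas_for(level):
    """Number of standard deviations either side of the mean for a two-sided level."""
    # A two-sided level leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  {sigmas_for(level):.3f}")
```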

Example (contd.)
Let us continue with computing the confidence interval for our movie example.
Assume that we took a random sample of 10 adults. Among them, 5 adults said that they would like to go to a movie in the next 10 days.
Hence we get mean x̄ = 0.5 (denoting 50%) and standard deviation √(σ²/n) ≈ 0.158 (since Var(x̄) = σ²/n).
Say we want to be 95% confident about our estimate.

Example (contd.)
From the table we can see that we have to be 1.96σ away from the mean.
Hence, we need to be 1.96 × 0.158 ≈ 0.31 away from the mean.
Summarizing, we can now say with 95% confidence that the mean of the actual population lies in [0.5 − 0.31, 0.5 + 0.31] = [0.19, 0.81], i.e. between 19% and 81% of the total population.
What if you want to be 98% confident?
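The arithmetic of the movie example can be reproduced as follows. The population variance of 0.25 is an assumption here, chosen because it gives the 0.31 margin quoted above for n = 10; the variable and function names are illustrative.

```python
from statistics import NormalDist

x_bar = 0.5       # 5 of the 10 sampled adults said yes
n = 10            # sample size
variance = 0.25   # assumed population variance (consistent with the 0.31 margin)

std_error = (variance / n) ** 0.5   # standard deviation of x_bar


def interval(level):
    """Two-sided interval around x_bar at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * std_error
    return x_bar - margin, x_bar + margin


for level in (0.95, 0.98):
    low, high = interval(level)
    print(f"{level:.0%} confidence: [{low:.2f}, {high:.2f}]")
```

The 98% interval is wider than the 95% one, which answers the closing question: more confidence costs precision when the sample stays the same.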

Graphical Representation of Confidence Intervals
[Figure: a plot of a normal distribution (bell curve). Each colored band has a width of one standard deviation.]

Confidence Interval for μ when σ is known
A 95% confidence interval for μ if σ is known is given by: x̄ ± 1.96 σ/√n
95% of the x̄'s lie between μ − 1.96 σ/√n and μ + 1.96 σ/√n.

Rationale for Confidence Interval
From the sampling distribution of x̄, conclude that μ and x̄ are within 1.96 standard errors (σ/√n) of each other 95% of the time.
Otherwise stated, 95% of the intervals x̄ ± 1.96 σ/√n contain μ.
So, the interval x̄ ± 1.96 σ/√n can be taken as an interval that typically would include μ.
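The "95% of the intervals contain μ" statement can be checked by simulation. The sketch below draws repeated samples from a Normal population with a known σ and counts how often the interval x̄ ± 1.96 σ/√n covers the true mean. The population values (μ = 15, σ = 4, n = 80), the seed, and the function names are illustrative assumptions.

```python
import random
import statistics


def covers(mu, sigma, n, rng, z=1.96):
    """Draw one sample of size n and report whether its interval contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    margin = z * sigma / n ** 0.5    # 1.96 standard errors
    return x_bar - margin <= mu <= x_bar + margin


def coverage_rate(mu=15.0, sigma=4.0, n=80, reps=2000, seed=1):
    """Fraction of `reps` simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = sum(covers(mu, sigma, n, rng) for _ in range(reps))
    return hits / reps


print(f"fraction of intervals containing mu: {coverage_rate():.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains μ or it does not; the 95% describes the long-run behaviour of the procedure.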

Example
A random sample of 80 tablets had an average potency of 15 mg. Assume σ is known to be 4 mg.
x̄ = 15, σ = 4, n = 80
A 95% confidence interval for μ is 15 ± 1.96 × 4/√80 = 15 ± 0.88 = (14.12, 15.88).