Part 21: Statistical Inference 21-1/43 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

Part 21: Statistical Inference 21-2/43 Statistics and Data Analysis Part 21 – Statistical Inference: Confidence Intervals

Part 21: Statistical Inference 21-3/43 Statistical Inference: Point Estimates and Confidence Intervals
Statistical Inference
Estimation Concept
Sampling Distribution
Point Estimates and the Law of Large Numbers
Uncertainty in Estimation
Interval Estimation

Part 21: Statistical Inference 21-4/43 Application: Credit Modeling
1992 American Express analysis of:
  Application process: acceptance or rejection
  Cardholder behavior: loan default, average monthly expenditure, general credit usage/behavior
13,444 applications in November 1992

Part 21: Statistical Inference 21-5/43 Modeling Fair Isaac's Acceptance Rate
13,444 applicants for a credit card (November 1992)
Experiment = a randomly picked application.
Let X = 0 if Rejected; let X = 1 if Accepted.
[Figure: counts of Rejected and Approved applications]

Part 21: Statistical Inference 21-6/43 The Question They Are Really Interested In: Default
Of the 10,499 people whose applications were accepted, 996 (9.49%) defaulted on their credit account (loan).
We let X denote the behavior of a credit card recipient (a Bernoulli variable):
  X = 0 if no default
  X = 1 if default
This is a crucial variable for a lender. Lenders spend endless resources trying to learn more about it. Mortgage providers could have, but deliberately chose not to.

Part 21: Statistical Inference 21-7/43 The data contained many covariates. Do these help explain the variables of interest?

Part 21: Statistical Inference 21-8/43 Variables Typically Used By Credit Scorers

Part 21: Statistical Inference 21-9/43 Sample Statistics
The population has characteristics:
  Mean, variance
  Median
  Percentiles
A random sample is a slice of the population.

Part 21: Statistical Inference 21-10/43 Populations and Samples
Population features of a random variable:
  Mean = μ = expected value of the random variable
  Standard deviation = σ = square root of the expected squared deviation of the random variable from the mean
  Percentiles, such as the median = the value that divides the population in half (50% of the population lies below it)
Sample statistics that describe the data:
  Sample mean = x̄ = the average value in the sample
  Sample standard deviation = s, which tells us where the sample values will be (using our empirical rule, for example)
  Sample median, which helps locate the sample data on a figure that displays the data, such as a histogram

Part 21: Statistical Inference 21-11/43 The Overriding Principle in Statistical Inference
The characteristics of a random sample will mimic (resemble) those of the population: mean, median, standard deviation, histogram, and so on.
The resemblance becomes closer as the number of observations in the (random) sample becomes larger (the law of large numbers).

Part 21: Statistical Inference 21-12/43 Point Estimation
We use sample features to estimate population characteristics.
The mean of a sample from the population is an estimate of the mean of the population: x̄ is an estimator of μ.
The standard deviation of a sample from the population is an estimate of the standard deviation of the population: s is an estimator of σ.

Part 21: Statistical Inference 21-13/43 Point Estimator
A formula used with the sample data to estimate a characteristic of the population (a parameter).
It provides a single value.
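A minimal sketch, not taken from the slides: the sample mean and sample standard deviation used as point estimators of μ and σ; the data values below are arbitrary.

import numpy as np

sample = np.array([4.86, 6.21, 5.29, 4.11, 6.19])  # arbitrary sample values
x_bar = sample.mean()        # point estimate of the population mean mu
s = sample.std(ddof=1)       # point estimate of the population std. dev. sigma
print(x_bar, s)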

Part 21: Statistical Inference 21-14/43 Sampling Distribution The random sample is itself random, since each member is random. Statistics computed from random samples will vary as well.

Part 21: Statistical Inference 21-15/43 Estimating Fair Isaac's Acceptance Rate
13,444 applicants for a credit card (November 1992)
Experiment = a randomly picked application.
Let X = 0 if Rejected; let X = 1 if Accepted.
The 13,444 observations are the population. The true proportion is μ = 10,499/13,444 = 0.781. We draw samples of N from the 13,444 and use the observations to estimate μ.
[Figure: counts of Rejected and Approved applications]

Part 21: Statistical Inference 21-16/43 The Estimator
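For 0/1 data such as the accept/reject indicator, the estimator of μ is the sample proportion, which is just the sample mean of the zeros and ones. A minimal sketch with a made-up sample:

import numpy as np

x = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])  # hypothetical accept (1) / reject (0) indicators
p_hat = x.mean()                              # sample proportion = estimate of the acceptance rate
print(p_hat)                                  # 0.7 for this made-up sample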

Part 21: Statistical Inference 21-17/43 0.781 is the true proportion in the population we are sampling from.

Part 21: Statistical Inference 21-18/43 The Mean Is a Good Estimator
Sometimes it is too high, sometimes too low. On average, it seems to be right.
The sample mean of the 100 sample estimates is very close to the population mean (true proportion) of 0.781.

Part 21: Statistical Inference 21-19/43 What Makes It a Good Estimator?
The average of the averages will hit the true mean (on average).
The mean is UNBIASED. (No moral connotations.)

Part 21: Statistical Inference 21-20/43 What Does the Law of Large Numbers Say?
The sampling variability in the estimator gets smaller as N gets larger.
If N gets large enough, we should hit the target exactly: the mean is CONSISTENT.
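A minimal simulation sketch of the deck's sampling experiment, assuming a 0/1 population with true proportion 0.781: the spread of 100 sample means shrinks as N grows from 144 to 1024 to 4900.

import numpy as np

rng = np.random.default_rng(seed=1)
true_p = 0.781                                            # acceptance rate in the 13,444-application population
for n in (144, 1024, 4900):
    means = rng.binomial(n, true_p, size=100) / n         # 100 sample proportions, each from a sample of size n
    print(n, f"mean of means = {means.mean():.4f}", f"std. dev. of means = {means.std(ddof=1):.4f}")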

Part 21: Statistical Inference 21-21/43 [Figure: histograms of the 100 sample proportions for N = 144, N = 1024, and N = 4900; the spread narrows as N grows.]

Part 21: Statistical Inference 21-22/43 Uncertainty in Estimation
How can we quantify the variability in the proportion estimator?
Descriptive statistics (mean, std. dev., minimum, maximum) of the sample means, by sample size:
  RATES144 = the means of the 100 samples of 144 observations
  RATE1024 = the means of the 100 samples of 1024 observations
  RATE4900 = the means of the 100 samples of 4900 observations
The population mean (true proportion) is 0.781.

Part 21: Statistical Inference 21-23/43 Range of Uncertainty
The point estimate will be off (high or low). Quantify the uncertainty as ± a sampling error.
Look ahead: if I draw a sample of 100, what value(s) should I expect?
  Based on unbiasedness, I should expect the mean to hit the true value.
  Based on my empirical rule, the value should be within plus or minus 2 standard deviations 95% of the time.
What should I use for the standard deviation?

Part 21: Statistical Inference 21-24/43 Estimating the Variance of the Distribution of Means
We will have only one sample!
Use what we know about the variance of the mean: Var[x̄] = σ²/N.
Estimate σ² using the data with the sample variance s²; then divide s² by N.
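A minimal sketch of this calculation with a hypothetical 0/1 sample: estimate σ² with the sample variance s², then divide by N to estimate the variance of the mean.

import numpy as np

x = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])  # hypothetical 0/1 sample
s2 = x.var(ddof=1)                            # estimate of sigma^2
var_mean = s2 / len(x)                        # estimated Var[x_bar] = s^2 / N
print(s2, var_mean, var_mean ** 0.5)          # the last value is the standard error of the mean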

Part 21: Statistical Inference 21-25/43 The Sampling Distribution
For sampling from the population and using the sample mean to estimate the population mean:
  The expected value of x̄ will equal μ.
  The standard deviation of x̄ will equal σ/√N.
  The central limit theorem suggests a normal distribution.

Part 21: Statistical Inference 21-26/43 The sample mean for a given sample may be very close to the true mean, or it may be quite far from the true mean. This is the sampling variability of the mean as an estimator of μ.

Part 21: Statistical Inference 21-27/43 Recognizing Sampling Variability
To describe the distribution of sample means, use the sample to estimate the population expected value.
To describe the variability, use the sample standard deviation, s, divided by the square root of N.
To accommodate the distribution, use the empirical rule: 95%, 2 standard deviations.

Part 21: Statistical Inference 21-28/43 Estimating the Sampling Variability
For one of the samples, the mean was 0.849. If this were my estimate, I would use 0.849 ± 2 × s/√N.
For a different sample, the mean was 0.750 with s = 0.433, so I would use 0.750 ± 2 × 0.433/√N.
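A minimal sketch of the same calculation, assuming a hypothetical sample of 144 accept/reject indicators rather than the actual samples used on the slide:

import numpy as np

x = np.random.default_rng(7).binomial(1, 0.781, size=144)  # hypothetical sample of 144 indicators
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))                       # s / sqrt(N)
print(f"{mean:.3f} +/- {2 * se:.3f}")                      # point estimate +/- two standard errors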

Part 21: Statistical Inference 21-29/43 Estimates plus and minus two standard errors

Part 21: Statistical Inference 21-30/43 Will the Interval Contain the True Value?
Uncertain: the midpoint is random; it may be very high or low, in which case, no. Sometimes it will contain the true value.
The degree of certainty depends on the width of the interval.
  Very narrow interval: very uncertain (1 standard error).
  Wide interval: much more certain (2 standard errors).
  Extremely wide interval: nearly perfectly certain (2.5 standard errors).
  Infinitely wide interval: absolutely certain.

Part 21: Statistical Inference 21-31/43 The Degree of Certainty
The interval is a Confidence Interval. The degree of certainty is the degree of confidence.
The standard in statistics is 95% certainty (about two standard errors).

Part 21: Statistical Inference 21-32/43 67% and 95% Confidence Intervals

Part 21: Statistical Inference 21-33/43 Monthly Spending Over First 12 Months
Population = 10,239 individuals who
  (1) received the card,
  (2) used the card at least once, and
  (3) had monthly spending no more than a specified ceiling.
What is the true mean of the population that produced these data?

Part 21: Statistical Inference 21-34/43 Estimating the Mean
Given a sample of N = 225 observations with sample mean x̄ and sample standard deviation s, estimate the population mean.
  Point estimate: x̄.
  67% confidence interval: x̄ ± 1 × s/√225.
  95% confidence interval: x̄ ± 2 × s/√225.
  99% confidence interval: x̄ ± 2.5 × s/√225.
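A minimal sketch of the arithmetic; the sample mean and standard deviation below are placeholders, not the actual spending figures from the deck.

import math

n = 225
x_bar, s = 250.0, 120.0                      # hypothetical monthly-spending mean and std. dev.
se = s / math.sqrt(n)
for label, k in (("67%", 1.0), ("95%", 2.0), ("99%", 2.5)):
    print(label, round(x_bar - k * se, 2), "to", round(x_bar + k * se, 2))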

Part 21: Statistical Inference 21-35/43 Where Did the Interval Widths Come From?
Empirical rule of thumb:
  2/3 = 66 2/3% is contained in an interval that is the mean plus and minus 1 standard deviation.
  95% is contained in a 2 standard deviation interval.
  99% is contained in a 2.5 standard deviation interval.
Based exactly on the normal distribution, the exact values would be:
  0.97 standard deviations for 2/3 (rather than 1.00),
  1.96 standard deviations for 95% (rather than 2.00),
  2.58 standard deviations for 99% (rather than 2.50).
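A minimal sketch recovering those exact multipliers from the standard normal quantile function:

from scipy.stats import norm

for coverage in (2/3, 0.95, 0.99):
    z = norm.ppf(0.5 + coverage / 2)          # two-sided central interval with the given coverage
    print(f"{coverage:.4f}: z = {z:.3f}")     # prints 0.967, 1.960, 2.576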

Part 21: Statistical Inference 21-36/43 Large Samples If the sample is moderately large (over 30), one can use the normal distribution values instead of the empirical rule. The empirical rule is easier to remember. The values will be very close to each other.

Part 21: Statistical Inference 21-37/43 Refinements (Important)
When you have a fairly small sample (under 30) and you have to estimate σ using s, both the empirical rule and the normal distribution can be a bit misleading; the interval you are using is a bit too narrow.
You will find the appropriate widths for your interval in the t table. The values depend on the sample size (more specifically, on N - 1 = the degrees of freedom).

Part 21: Statistical Inference 21-38/43 Critical Values
For 95% and 99% using a sample of 15:
  Normal: 1.960 and 2.576
  Empirical rule: 2.0 and 2.5
  t[14] table: 2.145 and 2.977
Note that the interval based on t is noticeably wider. The values from t converge to the normal values (from above) as N increases.
What should you do in practice? Unless the sample is quite small, you can usually rely safely on the empirical rule. If the sample is very small, use the t distribution.
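A minimal sketch comparing the normal and t critical values for a sample of 15 (14 degrees of freedom):

from scipy.stats import norm, t

for conf in (0.95, 0.99):
    tail = (1 - conf) / 2
    print(conf, round(norm.ppf(1 - tail), 3), round(t.ppf(1 - tail, df=14), 3))
    # 0.95 -> 1.96 vs 2.145;  0.99 -> 2.576 vs 2.977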

Part 21: Statistical Inference 21-39/43 [t table: critical values indexed by degrees of freedom n = N - 1; the small-sample rows differ noticeably from the normal values, while the large-sample rows approach them.]

Part 21: Statistical Inference 21-40/43 Application
A sports training center is examining the endurance of athletes. A sample of 17 observations on the number of hours for a specific task produces the following sample:
4.86, 6.21, 5.29, 4.11, 6.19, 3.58, 4.38, 4.70, 4.66, 5.64, 3.77, 2.11, 4.81, 3.31, 6.27, 5.02, 6.12
This being a biological measurement, we are confident that the underlying population is normal. Form a 95% confidence interval for the mean of the distribution.
The sample mean is 4.766. The sample standard deviation, s, is 1.16. The standard error of the mean is 1.16/√17 = 0.281.
Since this is a small sample from the normal distribution, we use the critical value from the t distribution with N - 1 = 16 degrees of freedom. From the t table (previous page), the value of t[.025,16] is 2.120.
The confidence interval is 4.766 ± 2.120(0.281) = [4.170, 5.362].
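A minimal sketch reproducing the slide's computation with the t distribution:

import numpy as np
from scipy.stats import t

hours = np.array([4.86, 6.21, 5.29, 4.11, 6.19, 3.58, 4.38, 4.70, 4.66,
                  5.64, 3.77, 2.11, 4.81, 3.31, 6.27, 5.02, 6.12])
x_bar = hours.mean()                                 # about 4.766
se = hours.std(ddof=1) / np.sqrt(len(hours))         # about 0.281
t_crit = t.ppf(0.975, df=len(hours) - 1)             # 2.120 with 16 degrees of freedom
print(round(x_bar - t_crit * se, 3), round(x_bar + t_crit * se, 3))   # roughly 4.170 and 5.362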

Part 21: Statistical Inference 21-41/43

Part 21: Statistical Inference 21-42/43 Confidence Interval for a Regression Coefficient
Coefficient on OwnRent: with estimate b and standard error SE(b) from the regression results, the large-sample confidence interval is b ± 1.96 × SE(b).
Form a confidence interval for the coefficient on SelfEmpl. (Left for the reader.)
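A minimal sketch with hypothetical values for the OwnRent coefficient and its standard error (placeholders, not the deck's actual estimates):

b, se_b = 0.040, 0.015                            # hypothetical coefficient estimate and standard error
lower, upper = b - 1.96 * se_b, b + 1.96 * se_b   # large-sample 95% interval
print(round(lower, 3), "to", round(upper, 3))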

Part 21: Statistical Inference 21-43/43 Summary
Methodology: statistical inference
Application to credit scoring
Sample statistics as estimators
  Point estimation
  Sampling variability
  The law of large numbers
  Unbiasedness and consistency
  Sampling distributions
Confidence intervals
  Proportion
  Mean
  Regression coefficient
Using the normal and t distributions instead of the empirical rule for the width of the interval