Quantitative Methods – Week 6: Inductive Statistics I: Standard Errors and Confidence Intervals Roman Studer Nuffield College

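Repetition: Fitting the Regression Line
• The regression line predicts the values of Y from the values of X. The best line therefore minimises the deviations between the predicted and the actual values, i.e. the errors e. More precisely, ordinary least squares minimises the sum of the squared errors, $\sum e^2$.
• For the UK observation: $e_{UK} = Y_{UK} - \hat{Y}_{UK}$
• Regression line: $IP = a + b \cdot Wage$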

Repetition: The Goodness of Fit
[Figure: observations scattered around the regression line]
• Total variation = explained variation + unexplained variation, i.e. $TSS = ESS + USS$
• $R^2 = ESS / TSS$

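Homework
• The regression coefficient is the sum of the products of the deviations divided by the sum of the squared deviations of X: $b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
• The intercept then follows from $a = \bar{Y} - b\bar{X}$; here $a = 29826 - (703.55 \times 51)$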
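The two formulas on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `ols_fit` and the data are hypothetical illustrations, not the homework data set.

```python
def ols_fit(x, y):
    """Least-squares intercept a and slope b for the line Y_hat = a + b*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of the products of the deviations of X and Y from their means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of the squared deviations of X from its mean
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the fitted line passes through (X_bar, Y_bar)
    return a, b


# Hypothetical illustration data (not the homework data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = ols_fit(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 0.09, b = 1.97
```

Computing $\bar{X}$ and $\bar{Y}$ first and then working with deviations mirrors the hand calculation in the homework table.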

Homework (II)  The Coefficient of Determination will be therefore: / =  This will mean that Education is able to account for 72% of the GDP per person. Complete data set:  Coefficient of determination: R-squared =  a= ; b= XiYiYi-MeanSquares Norway Switrzerlan d US Brazil Iran Total Sum of Squares XiYiYi-Mean Y PredictedExplained variation Norway Switrzerla nd US Brazil Iran Explained sum of squares XiYiYi-Mean Y PredictedUnexplained variation Norway Switrzerlan d US Brazil Iran Residual sum of squares

Inductive Statistics: Introduction
• So far we have only looked at samples, and most often we will only have samples, not entire populations.
• We have described and analysed these samples and computed means, standard deviations, correlation coefficients, regression coefficients, etc.
• However, because of “the luck of the draw”, the estimated parameters will deviate from the ‘true’ parameters of the whole population (sampling error).
We now move from descriptive statistics to inductive statistics…
• We no longer only describe samples; we now draw conclusions about characteristics of the entire statistical population based on our sample.
• Chapters 5 & 6 provide the tools necessary to make inferences from a sample.

Inductive Statistics: Introduction (II)
What can we infer from a sample?
• If we know the sample mean, how good an estimator is it of the population mean?
• If we calculated the correlation and regression coefficients from a sample of observations, how good are they as estimators of the ‘true’ correlation and regression coefficients?
• How reliable are our estimates?

Sample Biases
In a first step, especially when working with historical data, we need to establish whether our sample is likely to be representative or whether it may suffer from serious bias problems…
• Is the sample of records that has survived representative of the full set of records originally kept? Examples: business records, household inventories. Did all records have an equal chance of making their way to the archive? (success bias)
• Is the sample drawn from the records representative of the information in those records? Should you computerise the information on people whose surname begins with W? Is B possibly a better choice? Example: rate of return on equity (survivorship bias)
• Is the information in the records representative of a wider population than that covered by the records? Examples: height records of recruits, tax records (selection bias)
• Sampling will affect the inferences we (can) draw.

Sampling Distribution  Sampling distribution refers to the distribution of the parameters that would be obtained if a large number of random samples of a given size were drawn from a given population; it is a hypothetical distribution  Example: We draw a sample of 20 rabbits and then we calculate the mean ear length. After this we let the rabbits free. We repeat this 100 times. We get 100 estimates of mean ear length based on 100 samples of 20 rabbits. The distribution may look like this 4 times, we calculated 52.5<mean<=55 15 times, we calculated 55<mean<= times, we calculated 57.5<mean<=60

Sampling Distribution (II)
• The standard error is the estimated standard deviation of the sampling distribution.
[Figure: sampling distribution of the sample mean; horizontal axis: sample mean estimates $\bar{X}$; vertical axis: probability; the distribution is centred on the population mean $\mu$ and its spread is $SE(\bar{X})$]

Central Limit Theorem
1. Regardless of the shape of the population distribution, as the sample size (of the samples used to create the sampling distribution of the mean) increases, the sampling distribution of $\bar{X}$ approaches a normal distribution.
2. The mean of the sampling distribution is equal to the ‘true’ but unknown population mean. On average, the known sample mean $\bar{X}$ will be equal to $\mu$, the unknown population mean.
3. The standard deviation of the sample ($s$) can be taken as the best estimate of the population standard deviation ($\sigma$). The standard error (SE) of the sample mean, i.e. the standard deviation of the sampling distribution, is therefore
$$SE(\bar{X}) = \frac{s}{\sqrt{N}}$$
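The three points can be checked by simulation. This sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 10), draws 2,000 samples of size 50, and compares the spread of the sample means with $\sigma/\sqrt{N}$ and with the one-sample estimate $s/\sqrt{N}$. The population and all names are invented for illustration.

```python
import math
import random
import statistics

random.seed(42)
# Hypothetical, strongly right-skewed population: exponential with mean 10
population = [random.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

N = 50        # size of each sample
REPS = 2_000  # number of samples used to build the sampling distribution
means = [statistics.mean(random.sample(population, N)) for _ in range(REPS)]

# Point 1: roughly normal, so about 95% of sample means lie within 1.96 SE of mu
share = sum(abs(m - mu) <= 1.96 * sigma / math.sqrt(N) for m in means) / REPS
print(f"share within 1.96 SE of mu = {share:.3f}")

# Point 2: the sampling distribution is centred on the population mean
print(f"mu = {mu:.2f}, mean of the sample means = {statistics.mean(means):.2f}")

# Point 3: its standard deviation is sigma / sqrt(N); with only one sample
# we estimate it by s / sqrt(N)
one_sample = random.sample(population, N)
se_hat = statistics.stdev(one_sample) / math.sqrt(N)
print(f"sd of the sample means  = {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(N)         = {sigma / math.sqrt(N):.3f}")
print(f"s / sqrt(N), one sample = {se_hat:.3f}")
```

Although the population itself is far from normal, the sample means cluster symmetrically around $\mu$, which is what point 1 describes.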

Standard Normal Probability Distribution X  With the mean ( X ) and the standard deviation (SE) of the sampling distribution, we have all the information about the distribution  However, we now want to standardise this sampling distribution using  The distribution of Z has always a mean of zero and a standard deviation of 1  The proportion of under the curve up to or beyond any specific value of Z can now be obtained from a published table with

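Standard Normal Probability Distribution (II)
[Figure: standard normal curve with $\mu = 0$ and $\sigma = 1$; 95% of cases lie between $Z = -1.96$ and $Z = +1.96$, with 2.5% of cases in each tail]
• A standard normal distribution is a normal distribution N(0,1) with mean $\mu = 0$ and standard deviation $\sigma = 1$.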

Student’s t-distribution  Student’s t-distribution is very similar to the standard normal Z- distribution, but adjust for the degrees of freedom (df)  As the sample size N tends to infinity the t-distribution approximates the standard normal Z-distribution  We know the proportion of cases below a certain t-value, e.g. 2.5% of the cases are below t=1.98 for N-1=120 and t=1.96 when N approaches infinity

Confidence Intervals  We now come back to the question asked before: how good are our estimates of some parameters obtained by the sample? How good an estimator is, say, the sample mean, X, of the what we really want to know, which is the population mean μ?  The sample mean can be taken as an estimate of the unknown population mean  Though correct on average, a single estimate from an individual sample might differ from the true mean to some extent  We can generate an interval in which the "true" (population) mean is located with a specified probability 90% CI: With a probability of 90%, the interval includes m 95% CI: In 95 times out of 100, the interval includes m 99% CI: There is a 99% probability that the interval includes m

Confidence Intervals (II)  How many standard errors either side of the sample do we have to add to achieve a degree of confidence of 95%?  The t-distribution gives the exact value!  We know the proportion of cases below a certain t-value, e.g. 2.5% of the cases are below t=1.98 for N-1=120 and t=1.96 when N approaches infinity  Example: Birth rate in English parishes N= 214 parishes The mean is births per 100 families Standard error (SE) is births The t value for the t-distribution for 213 degrees of freedom is  The 95% confidence interval for the mean birth rate of the population is therefore: /­ (1.971 x 0.308) = /­ 0.607

Repetition & Confidence Intervals Computer Class:

Exercises
Weimar elections: unemployment and votes for the Nazi party
• Get the dataset on the Weimar election of … at …
• Look at the variables (votes for the Nazi party, level of unemployment) in turn.
• Get a first visualisation of the data; does it look normally distributed?
• Compute the mean, median, standard deviation, coefficient of variation, kurtosis and skewness of the Nazi party's vote share and of the level of unemployment.
• Estimate the following regression for two of the four elections (09/30 and 03/33): $Nazi = a + b \cdot Unemployment$
• Explain in words what the two regressions tell you.
• Draw the respective scatter plots and the regression lines.
• Calculate the 90%, 95% and 99% confidence intervals for a and b.
• Are b and the explanatory power of the regression the same for the election in 1930 and the one in 1933?
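For the confidence intervals of a and b, the standard OLS results are $SE(b) = \sqrt{s^2/\sum (X_i - \bar{X})^2}$ and $SE(a) = \sqrt{s^2 (1/N + \bar{X}^2/\sum (X_i - \bar{X})^2)}$, with residual variance $s^2 = USS/(N-2)$. The sketch below applies them; the six "districts" are invented and are not the Weimar data set, and `regression_with_cis` is a hypothetical helper.

```python
import math


def regression_with_cis(x, y, t_crit):
    """OLS estimates of a and b with confidence intervals estimate +/- t_crit * SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    uss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    s2 = uss / (n - 2)  # residual variance, N - 2 degrees of freedom
    se_b = math.sqrt(s2 / s_xx)
    se_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / s_xx))
    return {
        "a": (a, a - t_crit * se_a, a + t_crit * se_a),
        "b": (b, b - t_crit * se_b, b + t_crit * se_b),
    }


# Hypothetical districts (NOT the Weimar data set): unemployment (%) and Nazi vote share (%)
unemployment = [10, 15, 20, 25, 30, 35]
nazi = [20, 24, 27, 33, 35, 41]
# 95% level with N - 2 = 4 degrees of freedom: t = 2.776 (from a t-table)
result = regression_with_cis(unemployment, nazi, t_crit=2.776)
for name, (est, lower, upper) in result.items():
    print(f"{name} = {est:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

For the 90% and 99% intervals, only `t_crit` changes (the t-values for those levels and N − 2 degrees of freedom).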

Homework  Readings: Feinstein & Thomas, Ch. 6 Repeat what we have learned today  Problem Set 5:  Finish the exercises from today’s computer class if you haven’t done so already. Include all the results and answers in the file you send me.