Statistical inference: confidence intervals and hypothesis testing
Objective: This session covers inferential statistics, sampling theory, estimation and confidence intervals, and hypothesis testing.
Statistical analysis. Descriptive: calculate various types of descriptive statistics in order to summarize certain qualities of the data. Inferential: use information gained from the descriptive statistics of sample data to generalize to the characteristics of the whole population.
Inferential statistics, applications: two broad areas. Estimation: create confidence intervals to estimate the true population parameter. Hypothesis testing: test hypotheses that the population parameter has a specified value or lies within a specified range.
Population & Sample. Mean: population μ, sample x̄. Standard deviation: population σ, sample s.
Sampling theory: When working with samples of data, we rely on sampling theory to give us the probability distribution of a particular sample statistic. This probability distribution is known as “the sampling distribution”.
Sampling distributions: Assume there is a population of size N = 4 (individuals A, B, C, D). The random variable X is the age of an individual, measured in years. Values of X: 18, 20, 22, 24.
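Sampling distributions: Summary measures for the population distribution. Individuals A, B, C, D have ages 18, 20, 22, 24. Population mean: μ = (18 + 20 + 22 + 24)/4 = 21. [Figure: distribution P(X), each of the four values with probability 0.25]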
Sampling distributions: Summary measures of the sampling distribution
Properties of summary measures: The mean of the sampling distribution of the sample arithmetic mean equals the population mean: μ(x̄) = μ. The standard deviation of the sample means (the standard error) is σ(x̄) = σ/√n, where n is the sample size.
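The two properties can be checked by brute force on the four-person population. This is a minimal sketch, assuming samples of size n = 2 drawn with replacement; the source does not state the sample size, so `n` is an illustrative choice.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [18, 20, 22, 24]  # ages of individuals A, B, C, D
n = 2  # ASSUMPTION: sample size, drawn with replacement (not given in the source)

mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation (divides by N)

# Enumerate every possible ordered sample of size n and record its mean.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)  # mean of the sampling distribution
std_error = pstdev(sample_means)     # standard deviation of the sample means

print(f"population mean = {mu}, mean of sample means = {mean_of_means}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.4f}, std of sample means = {std_error:.4f}")
```

Under these assumptions the mean of the sample means matches the population mean, and the spread of the sample means matches σ/√n.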
Estimation and confidence intervals: Estimation of the population parameters: point estimates; confidence intervals (interval estimators). Confidence intervals for: means; variances. Large or small samples?
Confidence intervals for means, large samples (n >= 30): apply the Z-distribution. [Figure: probability distribution with the confidence interval marked]
Confidence intervals for means, large samples (n >= 30): For a normally distributed variable, 95% of the observations lie within plus or minus 1.96 standard deviations of the mean.
Confidence intervals for means, large samples (n >= 30): The 95% confidence interval is given as x̄ ± 1.96 × SE, where SE = s/√n is the standard error of the mean. [Figure: probability distribution with the 95% confidence interval from -1.96 SE to +1.96 SE and 2.5% in each tail]
Confidence intervals for means, large samples (n >= 30): Thus, we can state that: “the sample mean will lie within an interval of plus or minus 1.96 standard errors of the population mean 95% of the time”.
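The 1.96 figure can be recovered from the standard normal distribution. This is a small sketch using only Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Critical value that leaves 2.5% in the upper tail (97.5th percentile).
z_95 = std_normal.inv_cdf(0.975)

# Probability mass between -1.96 and +1.96 standard deviations.
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(f"z for 95% confidence = {z_95:.4f}")
print(f"P(-1.96 <= Z <= 1.96) = {coverage:.4f}")
```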
Confidence intervals for means, large samples (n >= 30): Example: we have data on 60 monthly observations of the returns to the SET 100 index. The sample mean monthly return is 1.125% with a standard deviation of 2.5%. What is the 95% confidence interval for the mean?
Confidence intervals for means, large samples (n >= 30): Example (cont’d): The standard error is calculated as SE = s/√n = 2.5/√60 ≈ 0.3227%. The confidence interval would be 1.125% ± 1.96 × 0.3227%, i.e. from 0.4924% to 1.7576%. The probability statement would be P(0.4924% ≤ μ ≤ 1.7576%) = 0.95.
Confidence intervals for means, large samples (n >= 30): Example (cont’d): How does the analyst use this information?
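The SET 100 example can be reproduced in a few lines. This is a minimal sketch; the function name `z_confidence_interval` is hypothetical, and the critical value comes from the standard library rather than from printed tables, so it is 1.95996 instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for the mean: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    se = sd / sqrt(n)                               # standard error of the mean
    return mean - z * se, mean + z * se

# 60 monthly returns, sample mean 1.125%, sample standard deviation 2.5%.
lower, upper = z_confidence_interval(1.125, 2.5, 60)
print(f"95% confidence interval: {lower:.4f}% to {upper:.4f}%")
```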
Confidence intervals for means: What about small samples (n < 30)? Apply the t-distribution. [Figure: probability distribution]
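Confidence intervals for means: What about small samples (n < 30)? Apply the t-distribution with n - 1 degrees of freedom. The confidence interval becomes x̄ ± t(α/2, n-1) × s/√n. The probability statement pertaining to this confidence interval is P(x̄ - t(α/2, n-1) × s/√n ≤ μ ≤ x̄ + t(α/2, n-1) × s/√n) = 1 - α.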
Confidence intervals for means: Example: From 20 observations, the sample mean is calculated as 4.5%. The sample standard deviation is 5%. At the 95% level of confidence: the confidence interval is … the probability statement is …
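One way to work the small-sample example is sketched here. The critical value 2.093 (two-tailed 95%, 19 degrees of freedom) is taken from standard t-tables and hard-coded, because Python's standard library has no t-distribution; the function name is hypothetical.

```python
from math import sqrt

# ASSUMPTION: value read from a standard t-table (2.5% in each tail, 19 df).
T_CRIT_95_DF19 = 2.093

def t_confidence_interval(mean, sd, n, t_crit):
    """Small-sample confidence interval for the mean: mean +/- t * sd / sqrt(n)."""
    se = sd / sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

# 20 observations, sample mean 4.5%, sample standard deviation 5%.
lower, upper = t_confidence_interval(4.5, 5.0, 20, T_CRIT_95_DF19)
print(f"95% confidence interval: {lower:.2f}% to {upper:.2f}%")
```

The interval is wider than a z-based one would be, which reflects the extra uncertainty from estimating the standard deviation with few observations.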
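Confidence intervals for variances: Apply a χ² distribution with n - 1 degrees of freedom. The confidence interval is given as (n - 1)s²/χ²(α/2) ≤ σ² ≤ (n - 1)s²/χ²(1 - α/2), where χ²(p) is the value that leaves probability p in the upper tail. The probability statement pertaining to this confidence interval is P((n - 1)s²/χ²(α/2) ≤ σ² ≤ (n - 1)s²/χ²(1 - α/2)) = 1 - α.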
Confidence intervals for variances: Example: From a sample of 30 monthly observations, the sample variance s² of the FTSE 100 index is calculated. With n - 1 = 29 degrees of freedom (leaving 2.5% in each tail): the confidence interval is … the probability statement is …
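The mechanics of the variance interval can be sketched as follows. The sample variance is not recoverable from the source, so `sample_var = 6.25` is a purely hypothetical input chosen for illustration; the two critical values for 29 degrees of freedom are taken from standard χ² tables.

```python
# ASSUMPTION: values read from a standard chi-square table for 29 df.
CHI2_UPPER_2_5_DF29 = 45.722  # leaves 2.5% in the upper tail
CHI2_LOWER_2_5_DF29 = 16.047  # leaves 2.5% in the lower tail

def variance_confidence_interval(sample_var, n, chi2_upper, chi2_lower):
    """Confidence interval for a population variance: (n-1)s^2 / chi-square values."""
    scaled = (n - 1) * sample_var
    # Dividing by the larger critical value gives the lower bound.
    return scaled / chi2_upper, scaled / chi2_lower

sample_var = 6.25  # HYPOTHETICAL sample variance, not taken from the source
lower, upper = variance_confidence_interval(
    sample_var, 30, CHI2_UPPER_2_5_DF29, CHI2_LOWER_2_5_DF29
)
print(f"95% confidence interval for the variance: {lower:.3f} to {upper:.3f}")
```

The interval is not symmetric around the sample variance because the χ² distribution is skewed.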
Hypothesis testing: Two broad approaches: the classical approach and the p-value approach. A hypothesis is an assumption about the value of a population parameter of the probability distribution under consideration.
Hypothesis testing: When testing, two hypotheses are established: the null hypothesis (H0) and the alternative hypothesis (H1). The exact formulation of the hypotheses depends upon what we are trying to establish, e.g. if we wish to know whether or not a population parameter, θ, has a value of θ0, the hypotheses are H0: θ = θ0 and H1: θ ≠ θ0.
Hypothesis testing: What if we wish to know whether or not a population parameter, θ, is greater than a given figure, θ0? The hypotheses would then be … And if we wish to know whether or not a population parameter is less than a given figure, the hypotheses would then be …
The standardized test statistic: In hypothesis testing we have to standardize the test statistic so that a meaningful comparison can be made with the appropriate distribution. For the mean: the standard normal (z) distribution or the t-distribution. For the variance: the χ² distribution. The hypothesis test may be a one-tailed test or a two-tailed test.
Hypothesis test of the population mean: Two-tailed test of the mean. Set up the hypotheses as H0: μ = μ0 and H1: μ ≠ μ0. Decide on the level of significance for the test (10%, 5%, 1% etc.), which places 5%, 2.5%, 0.5% in each tail. Set the value of μ0 in the null hypothesis. Identify the appropriate critical value of z (or t) from the tables, reflecting the percentages in the tails according to the level of significance chosen.
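Hypothesis test of the population mean: Two-tailed test of the mean. Compute the standardized test statistic z = (x̄ - μ0)/(s/√n) (use t with n - 1 degrees of freedom for small samples) and apply the following decision rule: Accept H0 if -z(α/2) ≤ z ≤ z(α/2). Reject H0 otherwise.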
Hypothesis test of the population mean: Example: Consider a test of whether or not a portfolio manager’s mean monthly return of 2.3% is statistically significantly different from the industry average of 2.4% (from 36 observations with a standard deviation of 1.7%).
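The portfolio manager example can be worked as a two-tailed z-test. This is a minimal sketch, assuming a 5% level of significance (the source does not state one for this example); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sd, n, alpha=0.05):
    """Return (z statistic, critical value, reject H0?) for H0: mu = mu0."""
    z = (sample_mean - mu0) / (sd / sqrt(n))      # standardized test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return z, z_crit, abs(z) > z_crit

# Mean return 2.3% vs. industry average 2.4%, 36 observations, sd 1.7%.
z, z_crit, reject = two_tailed_z_test(2.3, 2.4, 1.7, 36)
print(f"z = {z:.4f}, critical value = +/-{z_crit:.3f}, reject H0: {reject}")
```

With these inputs the statistic lies well inside the acceptance region, so the difference from the industry average is not statistically significant at the assumed 5% level.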
Hypothesis test of the population mean: Example: An analyst claims that the average annual rate of return generated by a technical stock selection service is 15% and recommends that his firm use the service as an input for its research product. The analyst’s supervisor is skeptical of this claim and decides to test its accuracy by randomly selecting 16 stocks covered by the service and computing the rate of return that would have been earned by following the service’s recommendations with regard to them over the previous 10-year period. The results of this sample are as follows: The average annual rate of return produced by following the service’s advice on the 16 sample stocks over the past 10 years was 11%. The standard deviation of these sample results was 9%. Determine whether the analyst’s claim should be accepted or rejected at the 5% level of significance.
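Because the sample has only 16 stocks, this example calls for a t-test. The sketch below hard-codes the two-tailed 5% critical value for 15 degrees of freedom (2.131, from standard t-tables), since the standard library has no t-distribution; the function name is hypothetical.

```python
from math import sqrt

# ASSUMPTION: value read from a standard t-table (2.5% in each tail, 15 df).
T_CRIT_5PCT_DF15 = 2.131

def two_tailed_t_test(sample_mean, mu0, sd, n, t_crit):
    """Return (t statistic, reject H0?) for H0: mu = mu0 against H1: mu != mu0."""
    t = (sample_mean - mu0) / (sd / sqrt(n))  # standardized test statistic
    return t, abs(t) > t_crit

# Claimed return 15%; sample of 16 stocks with mean 11% and sd 9%.
t, reject = two_tailed_t_test(11.0, 15.0, 9.0, 16, T_CRIT_5PCT_DF15)
print(f"t = {t:.4f}, critical value = +/-{T_CRIT_5PCT_DF15}, reject H0: {reject}")
```

Here |t| is below the critical value, so the 15% claim cannot be rejected at the 5% level despite the lower sample average.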
Hypothesis test of the population mean: One-tailed test of the mean (right-tailed test). Set up the hypotheses as H0: μ ≤ μ0 and H1: μ > μ0. Apply the following decision rule: Accept H0 if z ≤ z(α). Reject H0 if z > z(α).
Hypothesis test of the population mean: Example: We wish to test whether the mean monthly return on the FTSE 100 index for a given period is more than 1.2%. From 60 observations we calculate the mean as 1.25% and the standard deviation as 2.5%.
Hypothesis test of the population mean: Example: We wish to test whether the mean monthly return on the S&P500 index is less than 1.30%. Assume that the mean return from 75 observations is 1.18%, with a standard deviation of 2.2%.
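The FTSE 100 (right-tailed) and S&P500 (left-tailed) examples can both be run through one helper. This is a sketch, assuming a 5% level of significance for both tests, which the source does not specify; the function name and the `tail` parameter are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_z_test(sample_mean, mu0, sd, n, tail, alpha=0.05):
    """Return (z statistic, reject H0?) for a right- or left-tailed test of the mean."""
    z = (sample_mean - mu0) / (sd / sqrt(n))  # standardized test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # all of alpha in one tail
    if tail == "right":    # H1: mu > mu0
        reject = z > z_alpha
    elif tail == "left":   # H1: mu < mu0
        reject = z < -z_alpha
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return z, reject

# FTSE 100: is the mean monthly return more than 1.2%?
ftse_z, ftse_reject = one_tailed_z_test(1.25, 1.2, 2.5, 60, tail="right")
# S&P500: is the mean monthly return less than 1.30%?
sp_z, sp_reject = one_tailed_z_test(1.18, 1.30, 2.2, 75, tail="left")

print(f"FTSE 100: z = {ftse_z:.4f}, reject H0: {ftse_reject}")
print(f"S&P500:   z = {sp_z:.4f}, reject H0: {sp_reject}")
```

In both cases the statistic does not reach the one-tailed critical value of about 1.645 in absolute terms, so neither null hypothesis is rejected at the assumed level.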
Hypothesis test of the population mean: Two-tailed test: apply the following decision rule: Accept H0 if -z(α/2) ≤ z ≤ z(α/2). Reject H0 otherwise. One-tailed test: apply the following decision rule: Accept H0 if z ≤ z(α). Reject H0 if z > z(α). Is this a left- or right-tailed test? What about the other one?
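Hypothesis testing of the variance: Two-tailed test. The standardized test statistic for the population variance is χ² = (n - 1)s²/σ0², where σ0² is the variance specified in the null hypothesis. This standardized test statistic has a χ² distribution with n - 1 degrees of freedom.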
Hypothesis testing of the variance: Example: We wish to test whether the variance of share B is below 25. The sample variance is 23 and the number of observations is 40.
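The share B example can be sketched as a left-tailed χ² test. The 5% level of significance is an assumption (the source gives none), and the critical value of about 25.70 for 39 degrees of freedom is taken from standard χ² tables; the function name is hypothetical.

```python
# ASSUMPTION: approximate table value leaving 5% in the lower tail, 39 df.
CHI2_LOWER_5PCT_DF39 = 25.70

def chi_square_variance_statistic(sample_var, sigma0_sq, n):
    """Standardized test statistic for a variance: (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * sample_var / sigma0_sq

# H0: variance >= 25, H1: variance < 25; sample variance 23 from 40 observations.
chi2 = chi_square_variance_statistic(23, 25, 40)
reject = chi2 < CHI2_LOWER_5PCT_DF39  # left-tailed: reject only for small values

print(f"chi-square = {chi2:.2f}, critical value = {CHI2_LOWER_5PCT_DF39}, reject H0: {reject}")
```

The statistic is above the lower-tail critical value, so under these assumptions the sample does not support the claim that the variance is below 25.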
The p-value method of hypothesis testing: The p-value is the lowest level of significance at which the null hypothesis can be rejected. If the p-value ≥ the level of significance (α), accept the null hypothesis. If the p-value < the level of significance (α), reject the null hypothesis.
Calculating the p-value: We wish to find an investment that gives at least 13.2%. Assume that the mean annualized monthly return of a given bond index is 14.4% and the sample standard deviation of those returns is 2.915%; there were 30 observations and the returns are normally distributed.
Calculating the p-value: The test statistic is t = (x̄ - μ0)/(s/√n) = (14.4 - 13.2)/(2.915/√30) ≈ 2.2548. With 29 degrees of freedom, a t-value of 2.045 leaves 2.5% in the tail and a t-value of 2.462 leaves 1% in the tail.
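Calculating the p-value: Calculate the p-value by interpolation. The test statistic of 2.2548 lies about half way between the tabulated t-values 2.045 and 2.462: (2.2548 - 2.045)/(2.462 - 2.045) ≈ 0.50. P-value = 0.025 - (0.50 × (0.025 - 0.01)) = 0.025 - 0.0075 = 0.0175 = 1.75%. P-value (1.75%) < α (5%), thus reject the null hypothesis.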
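The interpolation can be written out explicitly. This is a minimal sketch; the two tabulated t-values for 29 degrees of freedom (2.045 and 2.462) are from standard t-tables, and the function name is hypothetical.

```python
from math import sqrt

def interpolate_p_value(t_stat, t_low, p_low, t_high, p_high):
    """Linearly interpolate the tail probability between two tabulated t-values."""
    fraction = (t_stat - t_low) / (t_high - t_low)  # position between the table entries
    return p_low - fraction * (p_low - p_high)

# Bond index example: mean 14.4%, hurdle 13.2%, sd 2.915%, 30 observations.
t_stat = (14.4 - 13.2) / (2.915 / sqrt(30))

# Table values for 29 df: t = 2.045 leaves 2.5% in the tail, t = 2.462 leaves 1%.
p_value = interpolate_p_value(t_stat, 2.045, 0.025, 2.462, 0.01)

alpha = 0.05
print(f"t = {t_stat:.4f}, interpolated p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "accept H0")
```

Linear interpolation is only an approximation of the true tail probability, but it is close enough to compare with α here.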
Conclusion: Meaning of statistical inference. Sampling theory. Application of statistical inference: estimation (confidence intervals) and hypothesis testing (two-tailed, one-tailed), for means and variances. Distributions: Z-distribution, t-distribution, χ²-distribution.
Conclusion: The appropriate reliability factor for determining confidence intervals for a population mean, under the following circumstances:
1. The data in the population are normally distributed with a known standard deviation: Z-value.
2. The data in the population are normally distributed; their standard deviation is unknown, but can be estimated from sample data: t-value. However, a Z-value can be used as an approximation of the t-value if the sample is large.
3. The data in the population are not normally distributed, their standard deviation is known, and the sample size is large: Z-value.
4. The data in the population are not normally distributed and the sample size is small: no good reliability factor exists.