
Statistical Decision Making

Almost all problems in statistics can be formulated as a problem of making a decision. That is, given some data observed from some phenomenon, a decision will have to be made about the phenomenon.

Decisions are generally broken into two types: Estimation decisions and Hypothesis Testing decisions.

Probability theory plays a very important role in these decisions and in assessing the error made by these decisions.

Definition: A random variable X is a numerical quantity that is determined by the outcome of a random experiment

Example: An individual is selected at random from a population and X = the weight of the individual.

The probability distribution of a continuous random variable X is described by its probability density curve f(x),

i.e. a curve with the following properties: 1. f(x) is always positive. 2. The total area under the curve f(x) is one. 3. The area under the curve f(x) between a and b is the probability that X lies between the two values.

Examples of some important Univariate distributions

1. The Normal distribution. A common probability density curve is the "Normal" density curve - symmetric and bell shaped. Comment: If μ = 0 and σ = 1 the distribution is called the standard normal distribution. (Graphs: Normal distribution with μ = 50 and σ = 15; Normal distribution with μ = 70 and σ = 20.)
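To make the density curve concrete, here is a small Python sketch (an illustration, not part of the slides) of the normal density f(x) = (1/(σ√(2π))) exp(-(x - μ)²/(2σ²)); the values μ = 50, σ = 15 echo the example above:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Standard normal (mu = 0, sigma = 1): peak height is 1/sqrt(2*pi)
peak = normal_pdf(0.0)
# Normal distribution with mu = 50 and sigma = 15, evaluated at its mean
peak50 = normal_pdf(50.0, mu=50.0, sigma=15.0)
```

The curve attains its maximum at x = μ, where f(μ) = 1/(σ√(2π)); a larger σ flattens and widens the bell.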

2. The Chi-squared distribution with ν degrees of freedom: f(u) = K u^{ν/2 - 1} e^{-u/2} if u ≥ 0, where K = 1 / (2^{ν/2} Γ(ν/2)).

Comment: If z1, z2, ..., zν are independent random variables each having a standard normal distribution, then U = z1² + z2² + ... + zν² has a chi-squared distribution with ν degrees of freedom.
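A quick simulation (an illustration, not from the slides; the seed and sample sizes are arbitrary choices) checks this comment: sums of ν squared standard normals have mean ν, the theoretical mean of the chi-squared distribution with ν degrees of freedom:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
nu = 5          # degrees of freedom = number of independent standard normals

def chi_squared_draw(nu):
    """One draw of U = z1^2 + z2^2 + ... + z_nu^2."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))

draws = [chi_squared_draw(nu) for _ in range(100_000)]
mean_u = sum(draws) / len(draws)  # should be close to nu = 5
```

Each draw is non-negative, as required by the density above, which is zero for u < 0.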

3. The F distribution with ν1 degrees of freedom in the numerator and ν2 degrees of freedom in the denominator: f(x) = K x^{ν1/2 - 1} (1 + (ν1/ν2) x)^{-(ν1 + ν2)/2} if x ≥ 0, where K = [Γ((ν1 + ν2)/2) / (Γ(ν1/2) Γ(ν2/2))] (ν1/ν2)^{ν1/2}.

Comment: If U1 and U2 are independent random variables having Chi-squared distributions with ν1 and ν2 degrees of freedom respectively, then F = (U1/ν1)/(U2/ν2) has an F distribution with ν1 degrees of freedom in the numerator and ν2 degrees of freedom in the denominator.

4. The t distribution with ν degrees of freedom: f(t) = K (1 + t²/ν)^{-(ν + 1)/2}, where K = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)).

Comment: If z and U are independent random variables, where z has a standard Normal distribution and U has a Chi-squared distribution with ν degrees of freedom, then t = z / √(U/ν) has a t distribution with ν degrees of freedom.

The Sampling distribution of a statistic

A random sample from a probability distribution with density function f(x) is a collection of n independent random variables x1, x2, ..., xn, each with probability distribution described by f(x).

If, for example, we collect a random sample of individuals from a population and measure some variable X for each of those individuals, the n measurements x1, x2, ..., xn will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.

A statistic T is any quantity computed from the random observations x1, x2, ..., xn.

Any statistic will necessarily also be a random variable and therefore will have a probability distribution described by some probability density function f_T(t). This distribution is called the sampling distribution of the statistic T.

This distribution is very important when the statistic is used in a statistical analysis. It is used to assess the accuracy of the statistic if it is used as an estimator, and to determine thresholds for acceptance and rejection if it is used for hypothesis testing.

Some examples of Sampling distributions of statistics

Distribution of the sample mean for a sample from a Normal population. Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ, and let x̄ = (x1 + x2 + ... + xn)/n.

Then x̄ has a normal sampling distribution with mean μ and standard deviation σ/√n.
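An illustrative Python simulation (not from the slides; the population values μ = 50, σ = 15 and sample size n = 25 are arbitrary choices) shows the sample mean clustering around μ with spread σ/√n = 3:

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible
mu, sigma, n = 50.0, 15.0, 25
reps = 20_000

means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)  # one realization of the sample mean

avg = sum(means) / reps                                          # ~ mu
sd = math.sqrt(sum((m - avg) ** 2 for m in means) / (reps - 1))  # ~ sigma/sqrt(n) = 3
```

The spread of the sample means is much smaller than the population σ: averaging n observations shrinks the standard deviation by a factor of √n.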

Distribution of the z statistic. Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ, and let z = (x̄ - μ)/(σ/√n). Then z has a standard normal distribution.

Comment: Many statistics T have a normal distribution with mean μ_T and standard deviation σ_T. Then z = (T - μ_T)/σ_T will have a standard normal distribution.

Distribution of the χ² statistic for the sample variance. Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ. Let s² = Σ(xi - x̄)²/(n - 1) = sample variance and s = √s² = sample standard deviation.

Let χ² = (n - 1)s²/σ². Then χ² has a chi-squared distribution with ν = n - 1 degrees of freedom.
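For a concrete sample the χ² statistic is computed as follows; this is a sketch, with the data and the hypothesized σ made up purely for illustration:

```python
import statistics

sigma = 1.5                          # hypothesized population standard deviation (assumed)
data = [3.1, 4.8, 2.2, 5.5, 3.9]     # small made-up sample, n = 5
n = len(data)

s2 = statistics.variance(data)       # sample variance, n - 1 denominator
chi2 = (n - 1) * s2 / sigma ** 2     # compare against chi-squared with nu = n - 1 = 4 df
```

The observed chi2 would then be compared with the chi-squared distribution with 4 degrees of freedom.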

The chi-squared distribution

Distribution of the t statistic. Let x1, x2, ..., xn be a sample from a normal population with mean μ and standard deviation σ, and let t = (x̄ - μ)/(s/√n). Then t has Student's t distribution with ν = n - 1 degrees of freedom.
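The t statistic can be computed with the standard library alone; the sample and the hypothesized mean below are made-up numbers chosen only for illustration:

```python
import math
import statistics

mu0 = 3.0                            # hypothesized population mean (assumed)
data = [3.1, 4.8, 2.2, 5.5, 3.9]     # small made-up sample, n = 5
n = len(data)

xbar = statistics.mean(data)         # sample mean
s = statistics.stdev(data)           # sample standard deviation, n - 1 denominator
t = (xbar - mu0) / (s / math.sqrt(n))  # compare against t with n - 1 = 4 df
```

Note that s, an estimate of σ, replaces σ in the z statistic; this is why the reference distribution changes from the standard normal to Student's t.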

Comment: If an estimator T has a normal distribution with mean μ_T and standard deviation σ_T, and s_T is an estimator of σ_T based on ν degrees of freedom, then t = (T - μ_T)/s_T will have Student's t distribution with ν degrees of freedom.

(Graph: the t distribution compared with the standard normal distribution.)

Point estimation. A statistic T is called an estimator of the parameter θ if its value is used as an estimate of the parameter θ. The performance of an estimator T will be determined by how "close" the sampling distribution of T is to the parameter θ being estimated.

An estimator T is called an unbiased estimator of θ if μ_T, the mean of the sampling distribution of T, satisfies μ_T = θ. This implies that in the long run the average value of T is θ.

An estimator T is called the Minimum Variance Unbiased estimator of θ if T is an unbiased estimator and it has the smallest standard error σ_T amongst all unbiased estimators of θ. If the sampling distribution of T is normal, the standard error of T is extremely important: it completely describes the variability of the estimator T.

Interval Estimation (confidence intervals) Point estimators give only single values as an estimate. There is no indication of the accuracy of the estimate. The accuracy can sometimes be measured and shown by displaying the standard error of the estimate.

There is, however, a better way: using confidence interval estimates. The unknown parameter is estimated with a range of values that have a given probability of capturing the parameter being estimated.

The interval T_L to T_U is called a (1 - α) × 100% confidence interval for the parameter θ if the probability that θ lies in the range T_L to T_U is equal to 1 - α. Here T_L and T_U are statistics: random numerical quantities calculated from the data.

Examples. Confidence interval for the mean of a Normal population (based on the z statistic): x̄ ± z_{α/2} σ/√n is a (1 - α) × 100% confidence interval for μ, the mean of a normal population. Here z_{α/2} is the upper α/2 × 100% percentage point of the standard normal distribution.
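A numerical sketch of this interval, assuming a 95% level (so z_{α/2} = 1.96), a known σ = 15, n = 36, and a made-up sample mean x̄ = 52:

```python
import math

sigma, n = 15.0, 36     # known population sd (assumed) and sample size
xbar = 52.0             # illustrative sample mean (made up)
z_half = 1.96           # upper 2.5% point of the standard normal, for a 95% interval

half_width = z_half * sigma / math.sqrt(n)    # 1.96 * 15 / 6 = 4.9
ci = (xbar - half_width, xbar + half_width)   # (47.1, 56.9)
```

The interval is centered at x̄ and its half-width shrinks like 1/√n as the sample size grows.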

More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with known standard error σ_T, then T ± z_{α/2} σ_T is a (1 - α) × 100% confidence interval for θ.

Confidence interval for the mean of a Normal population (based on the t statistic): x̄ ± t_{α/2} s/√n is a (1 - α) × 100% confidence interval for μ, the mean of a normal population. Here t_{α/2} is the upper α/2 × 100% percentage point of the Student's t distribution with ν = n - 1 degrees of freedom.
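The same computation with an estimated standard deviation uses the t percentage point instead; here n = 10, so ν = 9, and the tabled value t_{.025,9} ≈ 2.262 gives a 95% interval (x̄ and s below are made-up numbers):

```python
import math

n = 10
xbar, s = 52.0, 14.2    # illustrative sample mean and sample sd (made up)
t_half = 2.262          # upper 2.5% point of t with 9 df (table value)

half_width = t_half * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

Because 2.262 > 1.96, the t interval is wider than the corresponding z interval: the price paid for estimating σ from the data.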

More generally, if T is an unbiased estimator of the parameter θ and has a normal sampling distribution with estimated standard error s_T, based on ν degrees of freedom, then T ± t_{α/2} s_T is a (1 - α) × 100% confidence interval for θ.

Common Confidence intervals

Multiple Confidence intervals. In many situations one is interested in estimating not only a single parameter θ, but a collection of parameters θ1, θ2, θ3, .... A collection of intervals, T_L1 to T_U1, T_L2 to T_U2, T_L3 to T_U3, ..., is called a set of (1 - α) × 100% multiple confidence intervals if the probability that all the intervals capture their respective parameters is 1 - α.

Hypothesis Testing Another important area of statistical inference is that of Hypothesis Testing. In this situation one has a statement (Hypothesis) about the parameter(s) of the distributions being sampled and one is interested in deciding whether the statement is true or false.

In fact there are two hypotheses: the Null Hypothesis (H0) and the Alternative Hypothesis (HA). A decision will be made either to Accept H0 (Reject HA) or to Reject H0 (Accept HA).

The following table gives the different possibilities for the decision and the different possibilities for the correctness of the decision:

              Accept H0          Reject H0
H0 is true    Correct Decision   Type I error
H0 is false   Type II error      Correct Decision

Type I error - the Null Hypothesis H0 is rejected when it is true. The probability that a decision procedure makes a type I error is denoted by α, and is sometimes called the significance level of the test. Common significance levels are α = .05 and α = .01.

Type II error - the Null Hypothesis H0 is accepted when it is false. The probability that a decision procedure makes a type II error is denoted by β. The probability 1 - β is called the Power of the test and is the probability that the decision procedure correctly rejects a false Null Hypothesis.

A statistical test is defined by: 1. Choosing a statistic for making the decision to Accept or Reject H0. This statistic is called the test statistic. 2. Dividing the set of possible values of the test statistic into two regions - an Acceptance Region and a Critical Region.

If, upon collection of the data and evaluation of the test statistic, its value lies in the Acceptance Region, a decision is made to accept the Null Hypothesis H0. If its value lies in the Critical Region, a decision is made to reject the Null Hypothesis H0.

The probability of a type I error, α, is usually set at a predefined level by choosing the critical thresholds (boundaries between the Acceptance and Critical Regions) appropriately.

The probability of a type II error, β, is decreased (and the power of the test, 1 - β, is increased) by: 1. Choosing the "best" test statistic. 2. Selecting the most efficient experimental design. 3. Increasing the amount of information (usually by increasing the sample sizes involved) on which the decision is based.

Some common Tests

The p-value approach to Hypothesis Testing

In hypothesis testing we need: 1. A test statistic. 2. A Critical and an Acceptance region for the test statistic. The Critical Region is set up under the sampling distribution of the test statistic, with area α (0.05 or 0.01) above the critical region. The critical region may be one-tailed or two-tailed.

(Graph: a two-tailed critical region with area α/2 in each tail; Accept H0 in the middle region, Reject H0 in the tails.)

The test is carried out by: 1. Computing the value of the test statistic. 2. Making the decision: a. Reject H0 if the value is in the Critical region, and b. Accept H0 if the value is in the Acceptance region.

The value of the test statistic may be in the Acceptance region but close to being in the Critical region, or it may be in the Critical region but close to being in the Acceptance region. To measure this we compute the p-value.

Definition – Once the test statistic has been computed from the data, the p-value is defined to be: p-value = P[the test statistic is as or more extreme than the observed value of the test statistic], where "more extreme" means giving stronger evidence for rejecting H0.

Example – Suppose we are using the z-test for the mean μ of a normal population and α = 0.05. Then the critical region is to reject H0 if |z| > 1.96. Suppose the observed value is z = 2.3; then we reject H0. p-value = P[the test statistic is as or more extreme than the observed value of the test statistic] = P[z > 2.3] + P[z < -2.3] = 0.0107 + 0.0107 = 0.0214.
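The two-tailed p-value of a z-test can be reproduced with the standard normal CDF, which Python's math module supports via the error function (a sketch, not part of the slides):

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_tailed_p(z):
    """p-value = P[z > |z_obs|] + P[z < -|z_obs|]."""
    return 2.0 * (1.0 - std_normal_cdf(abs(z)))

p = two_tailed_p(2.3)   # about 0.0214, below alpha = 0.05, so reject H0
```

The same function applied to a less extreme observed value gives a larger p-value, matching the second example below.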

(Graph: the p-value shown as the two tail areas under the standard normal curve.)

If the value of z = 1.2, then we accept H0. p-value = P[the test statistic is as or more extreme than the observed value of the test statistic] = P[z > 1.2] + P[z < -1.2] = 0.1151 + 0.1151 = 0.2301. There is a 23% chance that the test statistic is as or more extreme than 1.2. Fairly high, hence 1.2 is not very extreme.

(Graph: the p-value shown as the two tail areas under the standard normal curve.)

Properties of the p-value: 1. If the p-value is small (< 0.05 or 0.01) H0 should be rejected. 2. The p-value measures the plausibility of H0. 3. If the test is two-tailed the p-value should be two-tailed. 4. If the test is one-tailed the p-value should be one-tailed. 5. It is customary to report p-values when reporting the results. This gives the reader some idea of the strength of the evidence for rejecting H0.

Multiple testing. Quite often one is interested in performing a collection (family) of tests of hypotheses: 1. H0,1 versus HA,1. 2. H0,2 versus HA,2. 3. H0,3 versus HA,3. etc.

Let α* denote the probability that at least one type I error is made in the collection of tests that are performed. The value of α*, the family type I error rate, can be considerably larger than α, the type I error rate of each individual test. The value of the family error rate, α*, can be controlled by altering the thresholds of each individual test appropriately. A testing procedure of this nature is called a Multiple testing procedure.
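One simple way to control the family error rate is the Bonferroni adjustment (a standard procedure, though the slides do not name one): to keep α* at most a target level over k tests, run each individual test at level α/k. The numbers below are illustrative:

```python
alpha_target = 0.05                  # desired bound on the family error rate
k = 10                               # number of tests in the family (illustrative)
per_test_alpha = alpha_target / k    # each individual test run at level 0.005

# For independent tests the family type I error rate alpha* equals
# 1 - (1 - per_test_alpha)**k, which stays below alpha_target.
family_rate = 1.0 - (1.0 - per_test_alpha) ** k
```

This shows the trade-off: lowering each test's threshold keeps the family error rate near the target, at the cost of reduced power for every individual test.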

Choosing a method by the type of the dependent and independent variables:

- Categorical dependent variable: Categorical independent -> Multiway frequency Analysis (Log Linear Model); Continuous independent -> Discriminant Analysis.
- Continuous dependent variable: Categorical independent -> ANOVA (single dep var), MANOVA (mult dep var); Continuous independent -> MULTIPLE REGRESSION (single dep variable), MULTIVARIATE MULTIPLE REGRESSION (multiple dependent variables); Continuous & Categorical independent -> ANACOVA (single dep var), MANACOVA (mult dep var).
- Continuous & Categorical dependent variables: ??