Statistics Review – Part I

Statistics Review – Part I Topics: Z-values, Confidence Intervals, Hypothesis Testing, Paired Tests, T-tests, F-tests

Statistics References References used in class slides: Sullivan III, Michael. Statistics: Informed Decisions Using Data. Pearson Education, 2004. Gitlow et al. Six Sigma for Green Belts and Champions. Prentice Hall, 2004.

Sampling and the Normal Distribution Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve.

Sampling and the Normal Distribution If a continuous random variable is normally distributed or has a normal probability distribution, then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).

Sampling and the Normal Distribution Suppose that the mean blood sugar level in the normal population is $\mu_0 = 9.7$ mmol/L with standard deviation $\sigma = 2.0$ mmol/L, and you want to see whether diabetics have an increased blood sugar level. A sample of $n = 64$ individuals with diabetes has mean $\bar{x} = 13.7$ mmol/L with standard deviation $s = 2.0$ mmol/L. How do you compare these values? Standardize!
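A minimal Python sketch of the standardization step for this example. It assumes, as the slide suggests, that σ = 2.0 mmol/L is the known population standard deviation, so the sample mean is compared using the standard deviation of the sample mean, σ/√n.

```python
import math

# Values quoted on the slide (mmol/L)
mu0 = 9.7      # population mean blood sugar
sigma = 2.0    # population standard deviation (treated as known)
n = 64         # number of diabetics sampled
x_bar = 13.7   # sample mean

# Standard deviation of the sample mean (standard error)
se = sigma / math.sqrt(n)
# Number of standard errors the sample mean lies above mu0
z = (x_bar - mu0) / se
print(f"standard error = {se:.2f}, z = {z:.1f}")
```

A z-score this far above 0 would be extremely unusual if diabetics had the same mean as the normal population.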

Sampling and the Normal Distribution Reading z-scores

Sampling and the Normal Distribution Standardization: using Z-tables to evaluate sample means. Standardizing puts samples on the same scale: subtract the mean and divide by the standard deviation of the sample mean, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.

Sampling and the Normal Distribution Why do we standardize? It enables the comparison of populations or samples using a standardized set of values. Recall that for a single observation, $z = \dfrac{x - \mu}{\sigma}$.

Sampling and the Normal Distribution The table gives the area under the standard normal curve for values to the left of a specified Z-score, $z_0$, as shown in the figure.

Sampling and the Normal Distribution Population Mean=10, Standard Deviation=5 What is the likelihood of a sample (n=16) having a mean greater than 12 (standard deviation = 5)? What is the likelihood of a sample (n=16) having a mean of less than 8 (standard deviation = 5)?
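Both questions can be answered by standardizing with the standard deviation of the sample mean, σ/√n = 5/√16 = 1.25, so 12 and 8 correspond to z = 1.6 and z = −1.6. A minimal Python sketch using the standard library's `statistics.NormalDist` in place of a printed Z-table, assuming the sampling distribution of the mean is normal:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10, 5, 16            # values from the slide
se = sigma / sqrt(n)                # standard deviation of the sample mean = 1.25
sampling_dist = NormalDist(mu, se)  # distribution of x-bar (assumed normal)

p_above_12 = 1 - sampling_dist.cdf(12)  # P(x-bar > 12) = P(Z > 1.6)
p_below_8 = sampling_dist.cdf(8)        # P(x-bar < 8)  = P(Z < -1.6)
print(round(p_above_12, 4), round(p_below_8, 4))
```

By symmetry the two probabilities are equal, each about 0.0548.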

Sampling and the Normal Distribution Notation for the Probability of a Standard Normal Random Variable: P(a < Z < b) represents the probability a standard normal random variable is between a and b P(Z > a) represents the probability a standard normal random variable is greater than a. P(Z < a) represents the probability a standard normal random variable is less than a.

Sampling and the Normal Distribution Before using Z-tables, we need to assess whether the data are normally distributed. There are different ways to do this, for example a histogram or a normal probability plot.

Sampling and the Normal Distribution Normal Probability Plots:

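Sampling and the Normal Distribution Normal Probability Plots: the “fat pencil” test for normality. If a fat pencil laid along the plot would cover all of the points, the points lie close enough to a straight line for the data to be treated as approximately normal.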
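A normal probability plot pairs each sorted observation with the z-score it would be expected to have if the data were normal. The sketch below builds those pairs in Python. The plotting position (i − 0.375)/(n + 0.25) is an assumption (one common choice; texts differ), the correlation between the two axes is used here only as a rough numeric stand-in for eyeballing the straight line, and the sample values are made up for illustration.

```python
from statistics import NormalDist, mean

def normal_scores(data):
    """Return sorted data and the expected z-score for each position.

    Uses the plotting position (i - 0.375) / (n + 0.25), one common choice.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return xs, zs

def pearson_r(xs, ys):
    """Correlation of the plot's two axes; close to 1 means nearly a straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data, made up for illustration only
sample = [9.1, 10.4, 8.7, 11.2, 10.0, 9.6, 10.9, 8.3, 10.2, 9.8]
xs, zs = normal_scores(sample)
for x, z in zip(xs, zs):
    print(f"{x:5.1f}  {z:6.3f}")   # plot these pairs to get the probability plot
print("r =", round(pearson_r(xs, zs), 3))
```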

Sampling and the Normal Distribution Shapes of Normal Probability Plots:

Sampling and the Normal Distribution Normal Probability Plots vs Box plots:

Sampling and the Normal Distribution If the distribution of the data is “approximately” normal, use Z-tables to determine the likelihood of events.

Sampling and the Normal Distribution We can also “flip” the Z-table, reading it backwards from an area to a z-score, to determine the ‘highest’ or ‘lowest’ acceptable sample mean.
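Working backwards from a z-score to a sample mean uses x̄ = μ ± z·σ/√n. A minimal Python sketch, reusing the population from the earlier example (μ = 10, σ = 5, n = 16); the two-sided α = 0.05 cut-off is an assumption chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def acceptable_mean_bounds(mu, sigma, n, alpha=0.05):
    """Lowest and highest sample means that are not 'unusual' at level alpha (two-sided)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # inverse Z-table lookup, about 1.96 here
    se = sigma / sqrt(n)                      # standard deviation of the sample mean
    return mu - z * se, mu + z * se

low, high = acceptable_mean_bounds(10, 5, 16)
print(round(low, 2), round(high, 2))
```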

Confidence Intervals Point estimate: the value of a statistic that estimates the value of the parameter. Confidence interval estimate: an interval of numbers along with a probability that the interval contains the unknown parameter. Level of confidence: a probability that represents the percentage of intervals that will contain the parameter if a large number of repeated samples are obtained.

Confidence Intervals A 95% level of confidence means that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect 95 of the intervals to contain the population mean. The construction of a confidence interval for the population mean depends upon three factors: the point estimate of the population mean, $\bar{x}$; the level of confidence; and the standard deviation of the sample mean, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
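The "95 out of 100 intervals" interpretation can be checked by simulation. This Python sketch draws many samples from a hypothetical normal population (μ = 10, σ = 5, n = 16 are assumptions for illustration), builds a 95% interval x̄ ± 1.96·σ/√n from each, and counts how many capture μ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)                     # fixed seed so the run is reproducible
mu, sigma, n = 10.0, 5.0, 16       # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)    # z_{0.025}, about 1.96
margin = z * sigma / sqrt(n)       # half-width of each interval

trials = 1000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = mean(sample)
    if x_bar - margin < mu < x_bar + margin:   # does this interval capture mu?
        covered += 1

coverage = covered / trials
print(f"{covered} of {trials} intervals contain mu ({coverage:.1%})")
```

Any single interval either contains μ or does not; the 95% describes the long-run proportion.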

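Confidence Intervals If a simple random sample is drawn from a population that is normally distributed, or the sample size is large, the distribution of the sample mean will be normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.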

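Confidence Intervals 95% of all sample means are in the interval $\mu - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\,\dfrac{\sigma}{\sqrt{n}}$. With a little algebraic manipulation, we can rewrite this inequality and obtain $\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$.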

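Confidence Intervals Steps to constructing a confidence interval:
1. Verify normality if $n \le 30$.
2. Determine $\alpha/2$, $\bar{x}$, and $\sigma$.
3. Find the z-score $z_{\alpha/2}$.
4. Calculate the lower and upper bounds, $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.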
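A minimal Python sketch of these steps for the case where σ is known. It is applied to the diabetes example from earlier (x̄ = 13.7, σ = 2.0, n = 64); treating 2.0 as the known σ is an assumption. With n = 64 > 30, the normality check is not required.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known."""
    alpha = 1 - confidence                       # determine alpha (and alpha/2)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    margin = z * sigma / sqrt(n)                 # margin of error
    return x_bar - margin, x_bar + margin        # lower and upper bound

lo, hi = z_confidence_interval(13.7, 2.0, 64)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```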

Confidence Intervals Histogram for z

Confidence Intervals Histogram for t

Confidence Intervals Properties of the t Distribution
1. The t distribution is different for different values of n.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. The area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals 1/2.
4. As t increases or decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution, because using s as an estimate of σ introduces more variability to the t statistic.
6. As the sample size n increases, the density curve of t approaches the standard normal density curve. This occurs because the values of s approach the value of σ by the law of large numbers.
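Properties 5 and 6 can be seen numerically by comparing the t critical value that leaves 2.5% in the right tail with the corresponding z-value as the degrees of freedom grow. This Python sketch assumes SciPy's `scipy.stats` is available (the slides themselves use printed tables); the chosen degrees of freedom are arbitrary.

```python
from statistics import NormalDist
from scipy import stats

z_crit = NormalDist().inv_cdf(0.975)              # z_{0.025} for the standard normal
dfs = [5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df) for df in dfs]  # t_{0.025} for each df

for df, t_crit in zip(dfs, t_crits):
    print(f"df = {df:4d}: t_0.025 = {t_crit:.3f}   z_0.025 = {z_crit:.3f}")
```

Each t value is larger than 1.96 (heavier tails) and the gap shrinks as the degrees of freedom increase.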

Confidence Intervals EXAMPLE: Finding t-values Find the t-value such that the area under the t distribution to the right of the t-value is 0.2 assuming 10 degrees of freedom. Hint: find t0.20 with 10 degrees of freedom.
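In software the lookup is a quantile (inverse CDF) call. Because t₀.₂₀ leaves an area of 0.20 to the right, it is the 0.80 quantile. A sketch assuming SciPy's `scipy.stats` is available:

```python
from scipy import stats

# t_{0.20} with 10 degrees of freedom: area 0.20 to the RIGHT,
# so ask for the 0.80 quantile (area 0.80 to the left)
t_020 = stats.t.ppf(1 - 0.20, df=10)

# Equivalent: the inverse survival function takes the right-tail area directly
t_020_isf = stats.t.isf(0.20, df=10)
print(round(t_020, 3), round(t_020_isf, 3))
```

Both calls should agree with the t-table value of about 0.879.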

Confidence Intervals EXAMPLE: Finding Chi-Square Values Find the chi-square values that separate the middle 95% of the distribution from the 2.5% in each tail. Assume 18 degrees of freedom.
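The two cut-offs are the 0.025 and 0.975 quantiles of the chi-square distribution with 18 degrees of freedom. A sketch assuming SciPy's `scipy.stats` is available:

```python
from scipy import stats

df = 18
# chi^2_{0.975}: area 0.975 to the right, i.e. 2.5% in the left tail
chi2_lower = stats.chi2.ppf(0.025, df)
# chi^2_{0.025}: area 0.025 to the right, i.e. 2.5% in the right tail
chi2_upper = stats.chi2.ppf(0.975, df)
print(round(chi2_lower, 3), round(chi2_upper, 3))
```

These should match the chi-square table values of about 8.231 and 31.526. Unlike the t-values, they are not symmetric about a center, because the chi-square distribution is skewed right.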

Confidence Intervals EXAMPLE: Constructing a Confidence Interval about a Population Standard Deviation
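The worked numbers for this example did not survive. For reference, the standard interval, assuming a normal population, is √((n − 1)s²/χ²_{α/2}) < σ < √((n − 1)s²/χ²_{1−α/2}). The Python sketch below applies it to made-up data (19 values, so 18 degrees of freedom as in the chi-square example) and assumes SciPy's `scipy.stats` is available.

```python
from math import sqrt
from statistics import stdev, variance
from scipy import stats

def sigma_confidence_interval(data, confidence=0.95):
    """Confidence interval for a population standard deviation (normal population assumed)."""
    n = len(data)
    s2 = variance(data)                                # sample variance s^2
    alpha = 1 - confidence
    chi2_right = stats.chi2.ppf(1 - alpha / 2, n - 1)  # chi^2_{alpha/2} (right cut-off)
    chi2_left = stats.chi2.ppf(alpha / 2, n - 1)       # chi^2_{1-alpha/2} (left cut-off)
    lower = sqrt((n - 1) * s2 / chi2_right)
    upper = sqrt((n - 1) * s2 / chi2_left)
    return lower, upper

# Hypothetical measurements, made up for illustration
data = [10.2, 9.8, 10.5, 9.6, 10.1, 10.3, 9.9, 10.0, 9.7, 10.4,
        10.6, 9.5, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.0]
lower, upper = sigma_confidence_interval(data)
print(f"s = {stdev(data):.3f}, 95% CI for sigma: ({lower:.3f}, {upper:.3f})")
```

Note that the larger chi-square value produces the lower bound, and the interval is not centered at s.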

Hypothesis Testing Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations. Selecting Hypothesis Testing methods – see next slides.

Hypothesis Testing The null hypothesis, denoted Ho (read “H-naught”), is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise. In this chapter, it will be a statement regarding the value of a population parameter. The alternative hypothesis, denoted H1 (read “H-one”), is a claim to be tested. We are trying to find evidence for the alternative hypothesis. In this chapter, it will be a claim regarding the value of a population parameter.

Hypothesis Testing There are three ways to set up the null and alternative hypotheses:
1. Equal versus not equal (two-tailed test): Ho: parameter = some value; H1: parameter ≠ some value
2. Equal versus less than (left-tailed test): Ho: parameter = some value; H1: parameter < some value
3. Equal versus greater than (right-tailed test): Ho: parameter = some value; H1: parameter > some value

Hypothesis Testing THREE WAYS TO STRUCTURE THE HYPOTHESIS TEST:

Hypothesis Testing Two-tailed test

Hypothesis Testing One-Tailed Test

Hypothesis Testing Four Outcomes from Hypothesis Testing 1. We could reject Ho when in fact H1 is true. This would be a correct decision. 2. We could not reject Ho when in fact Ho is true. This would be a correct decision. 3. We could reject Ho when in fact Ho is true. This would be an incorrect decision. This type of error is called a Type I error. 4. We could not reject Ho when in fact H1 is true. This would be an incorrect decision. This type of error is called a Type II error.

Hypothesis Testing For example, we might reject the null hypothesis if the sample mean is more than 2 standard deviations above the population mean. Why? Under the standard normal curve, the area to the right of z = 2 is only 0.0228.

Hypothesis Testing If the null hypothesis is true, then 1 − 0.0228 = 0.9772 = 97.72% of all sample means will be less than $\mu + 2\,\sigma_{\bar{x}} = \mu + 2\,\dfrac{\sigma}{\sqrt{n}}$.

Hypothesis Testing Because sample means greater than 2.88 are unusual if the population mean is 2.62, we are inclined to believe the population mean is greater than 2.62.
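The 0.0228 and 0.9772 figures are standard normal areas on either side of z = 2. A short Python check using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()           # standard normal: mean 0, standard deviation 1
area_right = 1 - std_normal.cdf(2)  # P(Z > 2), the tail beyond 2 standard deviations
area_left = std_normal.cdf(2)       # P(Z < 2)
print(f"P(Z > 2) = {area_right:.4f}, P(Z < 2) = {area_left:.4f}")
```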

Hypothesis Testing Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:

Hypothesis Testing The critical region or rejection region is the set of all values such that the null hypothesis is rejected.

Hypothesis Testing Step 4: Compare the critical value with the test statistic. Step 5: State the conclusion.
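A minimal Python sketch of the classical (critical-value) z-test for a mean with σ known, applied to the diabetes example from earlier as a right-tailed test (Ho: μ = 9.7 vs H1: μ > 9.7). The level of significance α = 0.05 is an assumption; the slide does not state one.

```python
from math import sqrt
from statistics import NormalDist

def z_test_classical(x_bar, mu0, sigma, n, alpha=0.05, tail="right"):
    """Critical-value z-test for a mean with sigma known.

    tail is "left", "right" or "two"; returns (z0, critical value, reject H0?).
    """
    z0 = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
    std_normal = NormalDist()
    if tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # z_alpha
        reject = z0 > crit
    elif tail == "left":
        crit = -std_normal.inv_cdf(1 - alpha)     # -z_alpha
        reject = z0 < crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z0) > crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z0, crit, reject

# Diabetes example: right-tailed, alpha = 0.05 (assumed)
z0, crit, reject = z_test_classical(13.7, 9.7, 2.0, 64)
print(f"z0 = {z0:.2f}, critical value = {crit:.3f}, reject Ho: {reject}")
```

Since z0 falls in the critical region, the conclusion would be that there is sufficient evidence that the mean blood sugar level of diabetics exceeds 9.7 mmol/L.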

Hypothesis Testing A P-value is the probability of observing a sample statistic as extreme or more extreme than the one observed under the assumption the null hypothesis is true.

Hypothesis Testing Hypothesis Test Regarding μ with σ Known (P-values)

Hypothesis Testing Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:

Hypothesis Testing Step 3: Compute the P-value.
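A minimal Python sketch of Step 3 for a z test statistic. The P-value is the tail area beyond z0 in the direction(s) of the alternative hypothesis; Ho is rejected when the P-value is less than α. The input z0 = 2 reuses the "2 standard deviations above the mean" example from earlier.

```python
from statistics import NormalDist

def z_p_value(z0, tail):
    """P-value for a z test statistic; tail is "left", "right" or "two"."""
    phi = NormalDist().cdf              # standard normal CDF (area to the left)
    if tail == "left":
        return phi(z0)
    if tail == "right":
        return 1 - phi(z0)
    if tail == "two":
        return 2 * (1 - phi(abs(z0)))   # both tails
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(round(z_p_value(2.0, "right"), 4))  # right-tailed
print(round(z_p_value(2.0, "two"), 4))    # two-tailed: double the one-tail area
```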

Hypothesis Testing Properties of the t Distribution
1. The t distribution is different for different values of n, the sample size.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. Because of the symmetry, the area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals ½.
4. As t increases without bound, the graph approaches, but never equals, zero. As t decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution. This is because we are using s as an estimate of σ, which introduces more variability to the t statistic.

Hypothesis Testing Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:

Hypothesis Testing Step 3: Compute the test statistic which follows Student’s t-distribution with n – 1 degrees of freedom.
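A minimal Python sketch of the one-sample t-test when σ is unknown: t0 = (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom. The data, the hypotheses (Ho: μ = 10 vs H1: μ ≠ 10) and α = 0.05 are all made up for illustration, and SciPy's `scipy.stats` is assumed to be available; its built-in `ttest_1samp` is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def one_sample_t(data, mu0):
    """Test statistic t0 = (x-bar - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(data)
    t0 = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t0, n - 1

# Hypothetical sample; Ho: mu = 10 vs H1: mu != 10
data = [10.4, 9.8, 10.9, 10.2, 10.6, 9.9, 10.7, 10.3]
t0, df = one_sample_t(data, mu0=10.0)
t_crit = stats.t.ppf(1 - 0.05 / 2, df)   # two-tailed critical value, alpha = 0.05
p_value = 2 * stats.t.sf(abs(t0), df)    # two-tailed P-value
print(f"t0 = {t0:.3f}, df = {df}, critical value = {t_crit:.3f}, P-value = {p_value:.4f}")

# Cross-check against SciPy's built-in one-sample t-test
check = stats.ttest_1samp(data, 10.0)
print(check.statistic, check.pvalue)
```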

Hypothesis Testing Step 4: Compare the critical value with the test statistic. Step 5: State the conclusion.

Hypothesis Testing Hypothesis Test Regarding a Population Variance or Standard Deviation If a claim is made regarding the population variance or standard deviation, we can use the following steps to test the claim, provided that (1) the sample is obtained using simple random sampling and (2) the population is normally distributed.

Hypothesis Testing Step 1: A claim is made regarding the population variance or standard deviation. The claim is used to determine the null and alternative hypotheses. We present the three cases for a claim regarding a population standard deviation.

Hypothesis Testing Step 4: Compare the critical value with the test statistic. Step 5: State the conclusion.
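The test statistic for a claim about σ is usually written χ²₀ = (n − 1)s²/σ₀², which follows a chi-square distribution with n − 1 degrees of freedom when the population is normal. A minimal Python sketch of Steps 3 and 4; the data and the claimed σ₀ = 0.5 are hypothetical, and SciPy's `scipy.stats` is assumed to be available.

```python
from statistics import variance
from scipy import stats

def chi2_variance_test(data, sigma0, alpha=0.05, tail="two"):
    """Test Ho: sigma = sigma0 with chi2_0 = (n - 1) s^2 / sigma0^2 and n - 1 df.

    Assumes a simple random sample from a normal population.
    """
    df = len(data) - 1
    chi2_0 = df * variance(data) / sigma0 ** 2
    if tail == "right":      # H1: sigma > sigma0
        reject = chi2_0 > stats.chi2.ppf(1 - alpha, df)
    elif tail == "left":     # H1: sigma < sigma0
        reject = chi2_0 < stats.chi2.ppf(alpha, df)
    elif tail == "two":      # H1: sigma != sigma0
        reject = (chi2_0 < stats.chi2.ppf(alpha / 2, df)
                  or chi2_0 > stats.chi2.ppf(1 - alpha / 2, df))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return chi2_0, bool(reject)

# Hypothetical data and a hypothetical claimed sigma0 = 0.5
data = [10.2, 9.6, 10.9, 10.1, 9.4, 10.6, 9.9, 10.3, 9.8, 10.7]
chi2_0, reject = chi2_variance_test(data, sigma0=0.5)
print(f"chi2_0 = {chi2_0:.3f}, reject Ho: {reject}")
```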

Paired Testing A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample. A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample. Dependent samples are often referred to as matched-pairs samples.

Paired Testing EXAMPLE: Independent versus Dependent Sampling For each of the following, determine whether the sampling method is independent or dependent. (a) A researcher wants to know whether the price of a one-night stay at a Holiday Inn Express Hotel is less than the price of a one-night stay at a Red Roof Inn Hotel. She randomly selects 8 towns where the locations of the hotels are close to each other and determines the price of a one-night stay. (b) A researcher wants to know whether the newly issued “state” quarters have a mean weight that is different from “traditional” quarters. He randomly selects 18 “state” quarters and 16 “traditional” quarters and compares their weights.

Paired Testing In order to test hypotheses regarding the mean difference, the following requirements must be satisfied:
1. A simple random sample is obtained.
2. The sample data are matched pairs.
3. The differences are normally distributed, or the sample size n is large (n > 30).

Paired Testing Step 4: Compare the critical value with the test statistic. Step 5: State the conclusion.
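A matched-pairs test reduces to a one-sample t-test on the differences dᵢ = xᵢ − yᵢ, with t0 = d̄/(s_d/√n) and n − 1 degrees of freedom, testing Ho: μ_d = 0. A minimal Python sketch with made-up paired data; SciPy's `scipy.stats` is assumed to be available, and its `ttest_rel` serves only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def paired_t(x, y):
    """Matched-pairs t statistic on the differences d_i = x_i - y_i, with n - 1 df."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t0 = mean(d) / (stdev(d) / sqrt(n))   # tests Ho: mu_d = 0
    return t0, n - 1

# Hypothetical matched pairs (two measurements on each of 8 units)
x = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7]
y = [11.6, 11.5, 12.2, 12.0, 11.8, 11.7, 12.1, 11.2]
t0, df = paired_t(x, y)
p_value = 2 * stats.t.sf(abs(t0), df)     # two-tailed P-value
print(f"t0 = {t0:.3f}, df = {df}, P-value = {p_value:.4f}")

# Cross-check against SciPy's paired t-test
check = stats.ttest_rel(x, y)
print(check.statistic, check.pvalue)
```

Pairing removes the unit-to-unit variation, which is why the analysis uses only the differences.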

T-Tests

T-Tests Step 4: Compare the critical value with the test statistic. Step 5: State the conclusion.

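T-Tests The degrees of freedom used to determine the critical value(s) in the last example are conservative. More accurate results can be obtained by using the following degrees of freedom: $df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$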

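T-Tests Confidence interval for $\mu_1 - \mu_2$ (independent samples): Lower Bound $= (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, Upper Bound $= (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$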
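A minimal Python sketch of inference about two independent means: the test statistic t0 = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂), the more accurate (Welch) degrees of freedom, the conservative alternative min(n₁, n₂) − 1, and the confidence interval for μ₁ − μ₂. The samples are made up, SciPy's `scipy.stats` is assumed to be available, and `ttest_ind(..., equal_var=False)` is used only as a cross-check.

```python
from math import sqrt
from statistics import mean, variance
from scipy import stats

def welch_two_sample(x1, x2, confidence=0.95):
    """Two-sample t statistic, Welch df, conservative df, P-value and CI for mu1 - mu2."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2                    # s1^2/n1 and s2^2/n2
    se = sqrt(v1 + v2)
    diff = mean(x1) - mean(x2)
    t0 = diff / se                                                   # tests Ho: mu1 = mu2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch df
    df_conservative = min(n1, n2) - 1                                # smaller of n1-1, n2-1
    p_value = 2 * stats.t.sf(abs(t0), df)                            # two-tailed
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)               # t_{alpha/2}
    ci = (diff - t_crit * se, diff + t_crit * se)                    # CI for mu1 - mu2
    return t0, df, df_conservative, p_value, ci

# Hypothetical independent samples
x1 = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0]
x2 = [4.6, 4.9, 4.4, 4.8, 4.3, 4.7, 4.5, 4.6]
t0, df, df_conservative, p_value, ci = welch_two_sample(x1, x2)
print(f"t0 = {t0:.3f}, df = {df:.2f} (conservative: {df_conservative}), P-value = {p_value:.4f}")
print(f"95% CI for mu1 - mu2: ({ci[0]:.3f}, {ci[1]:.3f})")

# Cross-check against SciPy's unequal-variance two-sample t-test
check = stats.ttest_ind(x1, x2, equal_var=False)
```

The Welch degrees of freedom always lie between the conservative value and n₁ + n₂ − 2, which is why using min(n₁, n₂) − 1 gives wider, more cautious results.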

F-Tests Requirements for Testing Claims Regarding Two Population Standard Deviations 1. The samples are independent simple random samples. 2. The populations from which the samples are drawn are normally distributed.

Fisher's F-distribution

F-Tests Characteristics of the F-distribution
1. It is not symmetric; the F-distribution is skewed right.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator. This is similar to the $\chi^2$ distribution and Student’s t-distribution, whose shapes depend upon their degrees of freedom.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero.

F-Tests $F_{\alpha,\,n_1-1,\,n_2-1}$ is the critical F with $n_1 - 1$ degrees of freedom in the numerator and $n_2 - 1$ degrees of freedom in the denominator and an area of $\alpha$ to the right of the critical F. To find the critical F with an area of $\alpha$ to the left, use the following: $F_{1-\alpha,\,n_1-1,\,n_2-1} = \dfrac{1}{F_{\alpha,\,n_2-1,\,n_1-1}}$
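The reciprocal rule exists because F tables usually list only right-tail values. The Python sketch below checks it numerically; the degrees of freedom (9 and 14) and α = 0.05 are arbitrary choices, and SciPy's `scipy.stats` is assumed to be available.

```python
from scipy import stats

alpha, df_num, df_den = 0.05, 9, 14   # hypothetical: n1 - 1 = 9, n2 - 1 = 14

# Critical F with area alpha to the RIGHT: F_{alpha, n1-1, n2-1}
f_right = stats.f.ppf(1 - alpha, df_num, df_den)
# Critical F with area alpha to the LEFT, computed directly ...
f_left_direct = stats.f.ppf(alpha, df_num, df_den)
# ... and via the reciprocal rule: 1 / F_{alpha, n2-1, n1-1} (degrees of freedom swapped)
f_left_reciprocal = 1 / stats.f.ppf(1 - alpha, df_den, df_num)
print(round(f_right, 3), round(f_left_direct, 4), round(f_left_reciprocal, 4))
```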

F-Tests Hypothesis Test Regarding Two Population Standard Deviations
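A minimal Python sketch of the two-tailed F test of Ho: σ₁ = σ₂, using F0 = s₁²/s₂² with n₁ − 1 numerator and n₂ − 1 denominator degrees of freedom. It assumes the requirements listed earlier (independent simple random samples from normal populations), uses made-up data, and assumes SciPy's `scipy.stats` is available.

```python
from statistics import variance
from scipy import stats

def f_test_two_sd(x1, x2, alpha=0.05):
    """Two-tailed F test of Ho: sigma1 = sigma2 vs H1: sigma1 != sigma2."""
    df1, df2 = len(x1) - 1, len(x2) - 1
    f0 = variance(x1) / variance(x2)                # test statistic s1^2 / s2^2
    f_left = stats.f.ppf(alpha / 2, df1, df2)       # F_{1-alpha/2, df1, df2}
    f_right = stats.f.ppf(1 - alpha / 2, df1, df2)  # F_{alpha/2, df1, df2}
    reject = bool(f0 < f_left or f0 > f_right)
    # Two-tailed P-value: double the smaller tail area
    p_value = float(2 * min(stats.f.cdf(f0, df1, df2), stats.f.sf(f0, df1, df2)))
    return f0, (f_left, f_right), reject, p_value

# Hypothetical independent samples
x1 = [20.1, 22.4, 19.8, 23.0, 21.5, 18.9, 22.7, 20.6]
x2 = [21.0, 21.3, 20.8, 21.6, 21.1, 20.7, 21.4]
f0, (f_left, f_right), reject, p_value = f_test_two_sd(x1, x2)
print(f"F0 = {f0:.3f}, critical values = ({f_left:.3f}, {f_right:.3f}), "
      f"reject Ho: {reject}, P-value = {p_value:.4f}")
```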
