Introduction to Basic Statistical Methods Part 1: Statistics in a Nutshell UWHC Scholarly Forum May 21, 2014 Ismor Fischer, Ph.D. UW Dept of Statistics.

Presentation transcript:

Introduction to Basic Statistical Methods Part 1: Statistics in a Nutshell UWHC Scholarly Forum May 21, 2014 Ismor Fischer, Ph.D. UW Dept of Statistics Part 2: Overview of Biostatistics: “Which Test Do I Use??” All slides posted at

Right-click on image for full .pdf article. Links in article to access datasets.

POPULATION
Study Question: Has the mean (i.e., average) of X = “Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)?
Present Day: Assume X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population.
[Figure: population distribution of X; “Statistical Inference”]


~ The Normal Distribution ~
- symmetric about its mean
- unimodal (i.e., one peak), with left and right “tails”
- models many (but not all) naturally-occurring systems
- useful mathematical properties…
Parameters: “population mean” μ and “population standard deviation” σ.
Approximately 95% of the population values are contained between μ – 2σ and μ + 2σ, leaving about 2.5% in each tail. 95% is called the confidence level. 5% is called the significance level.
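The “approximately 95% within two standard deviations” property can be checked numerically. The following is a minimal sketch in R; the values of mu and sigma are placeholders chosen for illustration and are not taken from the slide.

```r
# Probability that a normal value lies within k standard deviations of its mean.
# mu and sigma are hypothetical illustration values, not from the slide.
mu <- 25.4
sigma <- 1.6

within_k_sd <- function(k) {
  pnorm(mu + k * sigma, mean = mu, sd = sigma) -
    pnorm(mu - k * sigma, mean = mu, sd = sigma)
}

coverage_2sd <- within_k_sd(2)   # slightly above 95%
z_95 <- qnorm(0.975)             # multiplier giving exactly 95% coverage

print(round(c(coverage_2sd = coverage_2sd, z_95 = z_95), 4))
```

The result does not depend on the particular mu and sigma chosen, which is why the rule of thumb is stated in units of σ.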

POPULATION
Study Question: Has the mean (i.e., average) “Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)?
Present Day: Assume X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population.
“Statistical Inference” via “Hypothesis Testing”
“Null Hypothesis” H0: pop mean age μ = 25.4 (i.e., no change since 2010)
Random sample: x1, x2, x3, x4, x5, …, x400
μ cannot be found with 100% certainty, but can be estimated with high confidence (e.g., 95%) from sample data.
Sample size n partially depends on the power of the test, i.e., the desired probability of correctly rejecting a false null hypothesis (≥ 80%).
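As a rough illustration of how sample size relates to power, the sketch below uses R's built-in power.t.test. The inputs are assumptions: delta = 0.2 years is a hypothetical difference worth detecting, and sd = 1.6 borrows the sample standard deviation reported later in these slides.

```r
# Sample size needed for a one-sample, two-sided t-test to reach 80% power.
# Assumed inputs (not stated on this slide):
#   delta = 0.2  hypothetical true shift in mean age, in years
#   sd    = 1.6  standard deviation, borrowed from the later slides
pw <- power.t.test(delta = 0.2, sd = 1.6, sig.level = 0.05, power = 0.80,
                   type = "one.sample", alternative = "two.sided")

n_required <- ceiling(pw$n)   # round up to a whole number of subjects
print(n_required)
```

A smaller delta or a larger sd would push the required n upward, which is the sense in which n “partially depends” on the desired power.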

From the random sample x1, x2, …, x400, compute:
sample mean age x̄ = (x1 + x2 + … + xn) / n
sample variance s² = [(x1 – x̄)² + … + (xn – x̄)²] / (n – 1)

Sample results (n = 400): sample mean age x̄ = 25.6, sample standard deviation s = 1.6.

Is the difference STATISTICALLY SIGNIFICANT, at the 5% level? Do the data tend to support or refute the null hypothesis?
The population distribution of X follows a bell curve, with standard deviation σ.

The “sampling distribution” of the sample mean x̄ also follows a bell curve, with standard deviation σ / √n.
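The σ / √n result can be seen in a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 25.4 and σ = 1.6 (the slides report 1.6 only as a sample value), and the seed and number of replications are arbitrary.

```r
# Simulate many samples of size n and compare the spread of their means
# with the theoretical standard error sigma / sqrt(n).
set.seed(1)        # arbitrary seed, for reproducibility only
mu <- 25.4         # assumed population mean (the null value)
sigma <- 1.6       # assumed population standard deviation
n <- 400

sample_means <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))

theoretical_se <- sigma / sqrt(n)
empirical_se <- sd(sample_means)

print(c(theoretical = theoretical_se, empirical = empirical_se))
```

The individual ages vary by about 1.6 years, but means of 400 ages vary far less, which is why a difference of only 0.2 years can still be detectable.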

But estimating σ by s (= 1.6) introduces an additional layer of “sampling variability.”

In order to take this into account, a cousin to the normal distribution called the “T-distribution” is used instead (Gosset, 1908).

Student’s T-Distribution (William S. Gosset)
… is actually a family of distributions, indexed by the degrees of freedom df = n – 1, labeled t_df. As n gets large, t_df converges to the standard normal distribution (the “standard” bell curve: μ = 0, σ = 1). But the heavier tails mean a wider interval is needed to capture 95%, especially if n is small.
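The effect of the heavier tails can be tabulated directly. The sketch below compares the two-sided 95% critical value of t_df with the standard normal value; the particular degrees of freedom listed are arbitrary illustration choices, apart from 399, which matches n = 400.

```r
# Two-sided 95% critical values: t-distribution versus standard normal.
df <- c(4, 9, 29, 99, 399)     # illustrative degrees of freedom
t_crit <- qt(0.975, df)        # t critical value for each df
z_crit <- qnorm(0.975)         # standard normal critical value

print(data.frame(df = df,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```

The t critical values shrink toward the normal value as df grows, so for a sample of 400 the two distributions give nearly the same interval.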

T-test
Is the difference between the sample mean x̄ and the null value μ = 25.4 STATISTICALLY SIGNIFICANT, at the 5% level? Do the data tend to support or refute the null hypothesis?
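The one-sample t-test can be worked by hand from the summary statistics. This sketch assumes the values reported elsewhere in the slides (sample mean 25.6, s = 1.6, n = 400, null value 25.4); the variable names are illustrative.

```r
# One-sample t-test computed from summary statistics.
xbar <- 25.6    # sample mean, as reported in the slides' R output
s    <- 1.6     # sample standard deviation
n    <- 400     # sample size
mu0  <- 25.4    # null value of the population mean

se <- s / sqrt(n)                              # estimated standard error
t_stat <- (xbar - mu0) / se                    # standardized difference
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

print(c(se = se, t = t_stat, df = n - 1, p_value = p_value))
```

The statistic measures how many estimated standard errors separate the sample mean from the null value, and the p-value is the two-tailed area beyond it under t with n – 1 degrees of freedom.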

Two main ways to conduct a formal hypothesis test:

95% CONFIDENCE INTERVAL FOR µ: BASED ON OUR SAMPLE DATA, the true value of μ today is between … and … years, with 95% “confidence” (…akin to “probability”).

“P-VALUE” of our sample: Very informally, the p-value of a sample is the probability (hence a number between 0 and 1) that it “agrees” with the null hypothesis. Hence a very small p-value indicates strong evidence against the null hypothesis. The smaller the p-value, the stronger the evidence, and the more “statistically significant” the finding (e.g., p < .0001). IF H0 is true, then we would expect a random sample mean that is at least 0.2 years away from μ = 25.4 (as ours was) to occur with probability 1.28%.

FORMAL CONCLUSIONS:
- The 95% confidence interval corresponding to our sample mean does not contain the “null value” of the population mean, μ = 25.4 years.
- The p-value of our sample, .0128, is less than the predetermined α = .05 significance level.
Based on our sample data, we may (moderately) reject the null hypothesis H0: μ = 25.4 in favor of the two-sided alternative hypothesis HA: μ ≠ 25.4, at the α = .05 significance level.

INTERPRETATION: According to the results of this study, there exists a statistically significant difference between the mean ages at first birth in 2010 (25.4 years old) and today, at the 5% significance level. Moreover, the evidence from the sample data would suggest that the population mean age today is significantly older than in 2010, rather than significantly younger.
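The confidence-interval route to the same conclusion can be sketched from the summary statistics. The inputs below (sample mean 25.6, s = 1.6, n = 400) are the values reported in the slides; the interval endpoints are computed here rather than quoted from the original.

```r
# 95% confidence interval for the population mean, from summary statistics.
xbar <- 25.6    # sample mean
s    <- 1.6     # sample standard deviation
n    <- 400     # sample size
mu0  <- 25.4    # null value to check against the interval

margin <- qt(0.975, df = n - 1) * s / sqrt(n)   # margin of error
ci <- c(lower = xbar - margin, upper = xbar + margin)

# The two-sided test at alpha = .05 rejects H0 exactly when mu0 is outside.
contains_null <- unname(ci["lower"] <= mu0 && mu0 <= ci["upper"])

print(round(ci, 3))
print(contains_null)
```

Checking whether the null value falls inside the interval and comparing the p-value with α are two views of the same two-sided test, so they always agree.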

Edited R code:

# Generates a normally-distributed random sample of 400 age values.
y = rnorm(400, 0, 1)
z = (y - mean(y)) / sd(y)    # standardize: mean exactly 0, sd exactly 1
x = 25.6 + 1.6 * z           # rescale; 25.6 and 1.6 are assumed from the reported mean and s
sort(round(x, 1))

# Calculates sample mean and standard deviation.
c(mean(x), sd(x))

# One-sample t-test of H0: mu = 25.4
t.test(x, mu = 25.4)

        One Sample t-test
data:  x
t = 2.5, df = 399, p-value = 0.0128
alternative hypothesis: true mean is not equal to 25.4
95 percent confidence interval:
 …
sample estimates:
mean of x
     25.6

Present Day: Assume X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population.
The reasonableness of the normality assumption is empirically verifiable, and in fact formally testable from the sample data. If it is violated (e.g., skewed) or the check is inconclusive (e.g., small sample size), then “distribution-free” nonparametric tests should be used instead of the T-test.
Examples: Sign Test, Wilcoxon Signed Rank Test (its two-sample counterpart, the Wilcoxon Rank Sum Test, is equivalent to the Mann-Whitney U Test)
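A possible workflow for checking normality and falling back to nonparametric tests is sketched below. The data are simulated and deliberately skewed purely for illustration; the Shapiro-Wilk test is one common formal normality test and is an assumption here, since the slide does not name a specific one.

```r
# Hypothetical skewed sample of ages (simulated, not the study data).
set.seed(2)                                  # arbitrary seed
x <- 18 + rgamma(40, shape = 2, scale = 4)   # right-skewed values above 18
mu0 <- 25.4                                  # null value (median under H0)

# Formal test of normality (Shapiro-Wilk).
normality <- shapiro.test(x)

# Sign test: under H0, values fall above mu0 with probability 1/2.
n_above <- sum(x > mu0)
n_nonties <- sum(x != mu0)
sign_test <- binom.test(n_above, n_nonties, p = 0.5)

# Wilcoxon signed rank test of location against mu0.
signed_rank <- wilcox.test(x, mu = mu0)

print(c(shapiro  = normality$p.value,
        sign     = sign_test$p.value,
        wilcoxon = signed_rank$p.value))
```

A small Shapiro-Wilk p-value would count against normality, in which case the sign test or signed rank test replaces the T-test. Both test the median rather than the mean, and the signed rank test additionally assumes a symmetric distribution.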