Homework Chapter 11: 13 Chapter 12: 1, 2, 14, 16

Assumptions in Statistics

The wrong way to make a comparison of two groups: “Group 1 is significantly different from a constant, but Group 2 is not. Therefore Group 1 and Group 2 are different from each other.”

A more extreme case...

Interpreting Confidence Intervals (figure: cases labeled “Different”, “Not different”, and “Unknown”)

Assumptions: random sample(s); populations are normally distributed; populations have equal variances (2-sample t only).

Assumptions: random sample(s); populations are normally distributed; populations have equal variances (2-sample t only). What do we do when these are violated?

Assumptions in Statistics Are any of the assumptions violated? If so, what are we going to do about it?

Detecting deviations from normality: histograms, quantile plots, the Shapiro-Wilk test.

Detecting deviations from normality: by histogram (histogram of the biomass ratio data; y-axis: frequency).

Detecting deviations from normality: by quantile plot (quantile plot of normal data against the normal quantiles).

Detecting deviations from normality: by quantile plot (quantile plot of the biomass ratio data against the normal quantiles).

Detecting deviations from normality: by quantile plot (biomass ratio vs. normal quantile). Normal distribution = straight line; non-normal = non-straight line.
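
A minimal Python sketch of these checks (histogram and quantile plot), assuming the measurements sit in a plain array; the biomass-ratio values themselves are not reproduced in this transcript, so the numbers below are placeholders.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Placeholder data standing in for the biomass-ratio measurements
y = np.array([1.2, 0.9, 3.5, 0.4, 2.2, 0.7, 5.1, 0.3, 1.8, 0.6])

# Histogram: look for strong skew or outliers
plt.hist(y, bins=8)
plt.xlabel("Biomass ratio")
plt.ylabel("Frequency")

# Quantile (Q-Q) plot: normal data fall close to a straight line
plt.figure()
stats.probplot(y, dist="norm", plot=plt)
plt.show()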

Detecting deviations from normality: the Shapiro-Wilk test. The Shapiro-Wilk test is used to test statistically whether a set of data comes from a normal distribution. H0: The data come from a normal distribution. HA: The data come from some other distribution.
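
In Python, scipy provides this test directly; a short sketch (y is the same placeholder array as in the plotting example above, not the lecture's data):

import numpy as np
from scipy import stats

y = np.array([1.2, 0.9, 3.5, 0.4, 2.2, 0.7, 5.1, 0.3, 1.8, 0.6])  # placeholder data

# Shapiro-Wilk test: a small P-value is evidence against normality
W, P = stats.shapiro(y)
print(f"W = {W:.3f}, P = {P:.3f}")
if P < 0.05:
    print("Reject H0: the data are unlikely to come from a normal distribution")
else:
    print("Fail to reject H0: no strong evidence against normality")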

What to do when the distribution is not normal: if the sample sizes are large, sometimes the standard tests work OK anyway; transformations; non-parametric tests; randomization and resampling.

The normal approximation: means of large samples are approximately normally distributed (the central limit theorem). So the parametric tests on large samples work relatively well, even for non-normal data. Rule of thumb: if n > ~50, the normal approximation may work.

Data transformations A data transformation changes each data point by some simple mathematical formula

Log-transformation: Y' = ln[Y]. Here ln = “natural log”, base e; log = “log”, base 10. Either works.
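
A small Python illustration with placeholder values (not the lecture's data); the same transformation is applied to every individual, and it can be undone exactly, which is the back-conversion discussed a few slides below.

import numpy as np

y = np.array([1.2, 0.9, 3.5, 0.4, 2.2])   # placeholder data
y_log = np.log(y)                          # Y' = ln[Y], applied to each individual

# Carry out the test or confidence interval on y_log, then back-transform:
back = np.exp(y_log)                       # Y = e^Y' recovers the original values
print(np.allclose(back, y))                # True: the transformation is one-to-one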

Carry out the test on the transformed data!

Variance and mean increase together --> try the log-transform

Other transformations: arcsine, square-root, square, reciprocal, antilog.

Example: Confidence interval with log-transformed data (table of the raw data Y and the ln-transformed data ln[Y]).

Valid transformations... require that the same transformation be applied to each individual, must be backwards convertible to the original value without ambiguity, and have a one-to-one correspondence to the original values: X = ln[Y], Y = e^X.

Choosing transformations: you must transform each individual in the same way. You CAN try different transformations until you find one that makes the data fit the assumptions. You CANNOT keep trying transformations until P < 0.05!

Assumptions: random sample(s); populations are normally distributed; populations have equal variances (2-sample t only). Do the populations have equal variances? If not, what should we do about it?

Comparing the variance of two groups One possible method: the F test

The test statistic: F = s1²/s2², the ratio of the two sample variances. Put the larger s² on top, in the numerator.

F... F has two different degrees of freedom, one for the numerator and one for the denominator. (Each is of the form df_i = n_i − 1.) The numerator df is listed first, then the denominator df. The F test is very sensitive to its assumption that both distributions are normal.
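
A minimal sketch of this variance-ratio test in Python; the two samples below are made up for illustration (the insect-genitalia measurements are not reproduced in this transcript).

import numpy as np
from scipy import stats

group1 = np.array([2.1, 3.8, 1.2, 4.5, 2.9, 3.3, 5.0])        # placeholder sample, n1 = 7
group2 = np.array([1.9, 2.1, 2.3, 1.8, 2.0, 2.2, 1.7, 2.4])   # placeholder sample, n2 = 8

s1, s2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
# Put the larger sample variance on top, in the numerator
if s1 >= s2:
    F, df_num, df_den = s1 / s2, len(group1) - 1, len(group2) - 1
else:
    F, df_num, df_den = s2 / s1, len(group2) - 1, len(group1) - 1

# Two-tailed P-value: double the upper-tail probability (capped at 1)
P = min(1.0, 2 * stats.f.sf(F, df_num, df_den))
print(f"F = {F:.2f}, df = ({df_num}, {df_den}), P = {P:.3f}")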

Example: Variation in insect genitalia

Degrees of freedom: for a 2-tailed test, we compare to F(α/2), df1, df2 from Table A3.4.

Why α/2 for the critical value? By putting the larger s² in the numerator, we are forcing F to be greater than 1. Under the null hypothesis there is a 50:50 chance of either s² being greater, so we want the upper tail to include just α/2.

Critical value for F
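
The tabled critical value can also be obtained from scipy; for α = 0.05 (two-tailed) and df = (6, 8), as in the insect-genitalia example below, this reproduces the value of about 4.7.

from scipy import stats

alpha = 0.05
df_num, df_den = 6, 8
F_crit = stats.f.ppf(1 - alpha / 2, df_num, df_den)   # upper alpha/2 quantile
print(f"Critical F(0.025; 6, 8) = {F_crit:.2f}")       # ~4.65, i.e. roughly 4.7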

Conclusion: the F calculated from the data is greater than F0.025(6, 8) = 4.7, so we can reject the null hypothesis that the variances of the two groups are equal. The variance in insect genitalia is much greater for polygamous species than for monogamous species.

Sample -> Null hypothesis: the two populations have the same variance (σ1² = σ2²) -> F-test -> Test statistic -> Null distribution: F with n1 − 1 and n2 − 1 df -> compare: how unusual is this test statistic? If P < 0.05, reject H0; if P > 0.05, fail to reject H0.

What if we have unequal variances? Welch’s t-test would work. If sample sizes are equal and large, then even a ten-fold difference in variance is approximately OK.

Comparing means when variances are not equal: Welch’s t-test compares the means of two normally distributed populations that have unequal variances.

Burrowing owls and dung traps

Dung beetles

Experimental design: 20 randomly chosen burrowing owl nests, randomly divided into two groups of 10 nests. One group was given extra dung; the other was not. Measured the number of dung beetles in the owls’ diets.

Number of beetles caught (table of counts for the dung-added and no-dung-added groups).

Hypotheses: H0: Owls catch the same number of dung beetles with or without extra dung (μ1 = μ2). HA: Owls do not catch the same number of dung beetles with or without extra dung (μ1 ≠ μ2).

Welch’s t: t = (Ȳ1 − Ȳ2) / sqrt(s1²/n1 + s2²/n2), with df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]. Round the df down to the nearest integer.
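
A sketch of this calculation in Python; the beetle counts are placeholders (the lecture's actual counts are in the data slide above), and scipy's equal_var=False option computes the same Welch statistic without rounding the df down.

import numpy as np
from scipy import stats

dung_added = np.array([12, 15, 9, 20, 18, 11, 14, 16, 13, 17])   # placeholder counts
no_dung    = np.array([5, 8, 3, 7, 6, 4, 9, 5, 6, 7])            # placeholder counts

n1, n2 = len(dung_added), len(no_dung)
m1, m2 = dung_added.mean(), no_dung.mean()
v1, v2 = dung_added.var(ddof=1), no_dung.var(ddof=1)

t = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
df = int(np.floor(df))                       # round df down to the nearest integer

t_crit = stats.t.ppf(1 - 0.025, df)          # two-tailed critical value at alpha = 0.05
print(f"t = {t:.2f}, df = {df}, critical value = {t_crit:.2f}")

# scipy's built-in Welch test for comparison
print(stats.ttest_ind(dung_added, no_dung, equal_var=False))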

Owls and dung beetles

Degrees of freedom: the Welch formula gives a non-integer value, which we round down to df = 10.

Reaching a conclusion: t0.05(2),10 = 2.23 and t = 4.01 > 2.23, so we can reject the null hypothesis with P < 0.05. Extra dung near burrowing owl nests increases the number of dung beetles eaten.

Assumptions: random sample(s); populations are normally distributed; populations have equal variances (2-sample t only). What if you don’t want to make so many assumptions?

Sample -> Null hypothesis: the two populations have the same mean (μ1 = μ2) -> Welch’s t-test -> Test statistic -> Null distribution: t with df from the formula -> compare: how unusual is this test statistic? If P < 0.05, reject H0; if P > 0.05, fail to reject H0.

Non-parametric methods assume less about the underlying distributions; they are also called “distribution-free.” “Parametric” methods assume a distribution or a parameter.

Most non-parametric methods use RANKS: rank each data point in all samples from lowest to highest. The lowest data point gets rank 1, the next lowest gets rank 2, ...

Sign test: a non-parametric test that compares data from one sample to a constant. Simple: for each data point, record whether the individual is above (+) or below (−) the hypothesized constant, then use a binomial test to compare the result to 1/2.

Example: Polygamy and the origin of species Is polygamy associated with higher or lower speciation rates?

Etc....

The differences are not normal

Hypotheses: H0: The median difference in number of species between singly-mating and multiply-mating insect groups is 0. HA: The median difference in number of species between these groups is not 0.

7 out of 25 comparisons are negative. P = 2 × Pr[X ≤ 7] = 2 × Σ(i = 0 to 7) C(25, i)(0.5)^25 ≈ 0.043, where X is the number of negative differences under H0 (binomial with n = 25, p = 0.5).
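
That binomial probability is easy to verify in Python; the only inputs are the 7 negative differences out of 25 stated above.

from scipy import stats

n, k = 25, 7                                     # 25 comparisons, 7 negative differences
P = 2 * stats.binom.cdf(k, n, 0.5)               # doubled lower-tail binomial probability
print(f"P = {P:.3f}")                            # ~0.043

# scipy's exact binomial test gives the same answer when p = 0.5
print(stats.binomtest(k, n, 0.5, alternative="two-sided").pvalue)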

The sign test has very low power, so it is quite likely to fail to reject a false null hypothesis.

Non-parametric test to compare 2 groups The Mann-Whitney U test compares the central tendencies of two groups using ranks

Performing a Mann-Whitney U test: first, rank all individuals from both groups together in order (for example, smallest to largest); then sum the ranks for all individuals in each group --> R1 and R2.

Calculating the test statistic, U: U1 = n1·n2 + n1(n1 + 1)/2 − R1, and U2 = n1·n2 − U1, where R1 is the rank sum for group 1.

Example: Garter snake resistance to newt toxin Rough-skinned newt

Comparing snake resistance to TTX (tetrodotoxin) This variable is known to be not normally distributed within populations.

Hypotheses H 0 : The TTX resistance for snakes from Benton is the same as for snakes from Warrenton. H A : The TTX resistance for snakes from Benton is different from snakes from Warrenton.

Calculating the ranks: rank sum for Warrenton, R1 = 17 (the sum of the ranks of the Warrenton snakes).

Calculating U1 and U2: for a two-tailed test, we pick the larger of U1 or U2; here U = U1 = 33.
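
A quick Python check of this arithmetic, using only the quantities stated above (n1 = 5, n2 = 7, R1 = 17) and the U formulas from the earlier slide.

n1, n2, R1 = 5, 7, 17                   # sample sizes and the Warrenton rank sum

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1   # U1 = n1*n2 + n1(n1+1)/2 - R1
U2 = n1 * n2 - U1                       # U2 = n1*n2 - U1
U = max(U1, U2)                         # a two-tailed test uses the larger of the two
print(U1, U2, U)                        # 33.0 2.0 33.0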

Compare U to the U table: the critical value of U for n1 = 5 and n2 = 7 is 30, and U = 33 > 30, so we can reject the null hypothesis. Snakes from Benton have higher resistance to TTX.

How to deal with ties: determine the ranks that the tied values would have received if they were slightly different; average these ranks and assign that average to each tied individual; count all of those individuals when deciding the rank of the next-largest individual.
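
This tie-handling rule is what average ranking does; a tiny Python sketch with made-up values containing one tie:

import numpy as np
from scipy import stats

x = np.array([3.1, 4.2, 4.2, 5.0, 6.3])      # two tied values at 4.2

ranks = stats.rankdata(x, method="average")   # tied values share the average of their ranks
print(ranks)                                  # [1.  2.5 2.5 4.  5. ]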

Ties

Mann-Whitney: large-sample approximation. For n1 and n2 both greater than 10, use Z = (2U − n1·n2) / sqrt[ n1·n2·(n1 + n2 + 1) / 3 ], and compare this Z to the standard normal distribution.

Example: U1 = 245, U2 = 80, n1 = 13, n2 = 25. This gives Z ≈ 2.54. Z0.05(2) = 1.96, and Z > 1.96, so we could reject the null hypothesis.
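
A quick Python check of the large-sample formula (as written above) with the numbers in this example:

import math

U1, U2 = 245, 80
n1, n2 = 13, 25
U = max(U1, U2)                                         # use the larger U

Z = (2 * U - n1 * n2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 3)
print(f"Z = {Z:.2f}")                                   # ~2.54, which exceeds 1.96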

Assumption of Mann-Whitney U test Both samples are random samples.