Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven.


Applied Statistics Using SPSS Topic: One Way ANOVA By Prof Kelly Fan, Cal State Univ, East Bay

Statistical Tools vs. Variable Types
- Response (output) numerical, predictor (input) numerical: Simple and Multiple Regression
- Response (output) numerical, predictor (input) categorical/mixed: Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA)
- Response (output) categorical: Categorical data analysis

Example: Battery Lifetime
Eight brands of battery are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brands last longer than the others. Data collection: for each brand, 3 batteries are tested for their lifetime. What is the Y variable? The X variable?

Data: Y = LIFETIME (HOURS), 3 replications per level of BRAND

BRAND     1     2     3     4     5     6     7     8
         1.8   4.2   8.6   7.0   4.2   4.2   7.8   9.0
         5.0   5.4   4.6   5.0   7.8   4.2   7.0   7.4
         1.0   4.2   4.2   9.0   6.6   5.4   9.8   5.8
Mean     2.6   4.6   5.8   7.0   6.2   4.6   8.2   7.4

Grand mean: 5.8

Statistical Model
$Y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, \ldots, C$; $j = 1, \ldots, n$
Here $i$ indexes the “LEVEL” OF BRAND (brand is, of course, represented as “categorical”) and $j$ indexes the replications within a level. The data form a table with one column per level of brand (1, 2, ..., C) and one row per replication (1, 2, ..., n), with typical entry $Y_{ij}$.

Hypotheses Setup
H0: Level of X has no impact on Y
H1: Level of X does have impact on Y
H0: $\mu_1 = \mu_2 = \cdots = \mu_8$
H1: not all $\mu_i$ are EQUAL

ONE WAY ANOVA
Analysis of Variance for life

Source   DF       SS     MS      F      P
brand     7    69.12   9.87   3.38  0.021
Error    16    46.72   2.92
Total    23   115.84

S = 1.709   R-Sq = 59.67%   R-Sq(adj) = 42.02%
MS(Error) = 2.92 is the estimate of the common variance $\sigma^2$; S is its square root.
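The entries of the ANOVA table can be recomputed from the battery data by hand. The sketch below does this in plain Python; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the SPSS workflow, and the P value is left out because it needs an F-distribution routine.

```python
# Battery lifetimes (hours) from the data slide: 8 brands, 3 batteries each.
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}


def one_way_anova(groups):
    """Return SS, DF, MS and F for a one-way layout given as {level: values}."""
    all_values = [y for ys in groups.values() for y in ys]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of the level means around the grand mean
    ss_within = 0.0   # variation of observations around their own level mean
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        ss_between += len(ys) * (level_mean - grand_mean) ** 2
        ss_within += sum((y - level_mean) ** 2 for y in ys)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f": ms_between / ms_within,
        "r_squared": ss_between / (ss_between + ss_within),
    }


table = one_way_anova(lifetimes)
for key, value in table.items():
    print(f"{key}: {value:.4g}")
```

The between-brand and within-brand sums of squares correspond to the "brand" and "Error" rows, and the square root of the within-brand mean square corresponds to S.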

Review
Fitted value = Predicted value
Residual = Observed value – Fitted value
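As a small illustration of these two definitions: in a one-way layout the fitted value for every battery of a brand is that brand's sample mean. The sketch below uses only brands 1 and 2 of the battery data to keep it short; the variable names are illustrative.

```python
# Observed lifetimes (hours) for brands 1 and 2 from the data slide.
observed = {1: [1.8, 5.0, 1.0], 2: [4.2, 5.4, 4.2]}

# In one-way ANOVA the fitted (predicted) value is the brand mean.
fitted = {brand: sum(ys) / len(ys) for brand, ys in observed.items()}

# Residual = observed value - fitted value, one per battery (rounded for display).
residuals = {
    brand: [round(y - fitted[brand], 2) for y in ys]
    for brand, ys in observed.items()
}

for brand in observed:
    print(brand, round(fitted[brand], 2), residuals[brand])
```

These residuals are what the normality plot and the residual plot are drawn from.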

Diagnosis: Normality
The points on the normality plot must more or less follow a line to claim “normally distributed”. There are statistical tests to verify it formally. The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
Normality plot: normal scores vs. residuals

From the Battery lifetime data:

Diagnosis: Equal Variances
The points on the residual plot must be more or less within a horizontal band to claim “constant variances”. There are statistical tests to verify it formally. The ANOVA method we learn here is not sensitive to the constant variances assumption. That is, slightly different variances within groups will not change our conclusions much.
Residual plot: fitted values vs. residuals

From the Battery lifetime data:

Multiple Comparison Procedures
Once we reject H0: $\mu_1 = \mu_2 = \cdots = \mu_c$ in favor of H1: NOT all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.

These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I): We set up “$\alpha$” as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has a type I error rate (rejecting H0 when it’s true) of .05. However,
$P(\text{at least one type I error in the 3 tests}) = 1 - P(\text{accept all 3, given all true}) = 1 - (.95)^3 \approx .14$

In other words, the probability is .14 that at least one type I error is made. For 5 tests, prob = .23.
Question: Should we choose $\alpha = .05$ and suffer (for 5 tests) a .23 OVERALL error rate (or “a”, or $\alpha_{\text{experimentwise}}$)? OR should we choose/control the overall error rate, “a”, to be .05, and find the individual test $\alpha$ from $1 - (1-\alpha)^5 = .05$ (which gives us $\alpha \approx .0102$)?
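The arithmetic behind both options is short enough to check directly. The sketch below assumes independent tests, as the slide does; the function names `familywise_error` and `per_test_alpha` are illustrative.

```python
def familywise_error(alpha, k):
    """P(at least one type I error) across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


def per_test_alpha(overall, k):
    """Per-test level that makes the overall error rate equal `overall`.

    Solves 1 - (1 - alpha)**k = overall for alpha (independent tests only).
    """
    return 1 - (1 - overall) ** (1 / k)


print(round(familywise_error(0.05, 3), 3))   # 3 tests at .05 each
print(round(familywise_error(0.05, 5), 3))   # 5 tests at .05 each
print(round(per_test_alpha(0.05, 5), 4))     # per-test alpha for overall .05
```

Feeding the per-test level back into `familywise_error` with k = 5 returns .05, which is a quick way to confirm the two formulas are inverses of each other.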

The formula $1 - (1-\alpha)^5 = .05$ would be valid only if the tests are independent; often they’re not. [e.g., for the hypotheses $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$: if one is accepted and another rejected, isn’t it more likely that the third is rejected?]

When the tests are not independent, it’s usually very difficult to arrive at the correct $\alpha$ for an individual test so that a specified value results for the overall error rate.

Categories of multiple comparison tests
- “Planned” / “a priori” comparisons (stated in advance, usually a linear combination of the column means equal to zero)
- “Post hoc” / “a posteriori” comparisons (decided after a look at the data: which comparisons “look interesting”)
- “Post hoc” multiple comparisons (every column mean compared with each other column mean)

There are many multiple comparison procedures. We’ll cover only a few.
Post hoc multiple comparisons:
- Pairwise comparisons: do a series of pairwise tests; Duncan and SNK tests
- (Optional) Comparisons to control: Dunnett tests

Example: Broker Study A financial firm would like to determine if brokers they use to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used. Y=1000(A-P)/A where P=per share price paid for the stock; A=average of high price and low price per share, for the day. “The higher Y is the better the trade is.”
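To make the index concrete, here is a minimal sketch of the formula Y = 1000(A-P)/A. The prices in the example are hypothetical and are not taken from the study; the function name `trade_index` is also just illustrative.

```python
def trade_index(high, low, paid):
    """Y = 1000 * (A - P) / A, with A the average of the day's high and low price."""
    average = (high + low) / 2
    return 1000 * (average - paid) / average


# Hypothetical trade: day's high 52, low 48, price paid 49.50 per share.
y = trade_index(high=52.0, low=48.0, paid=49.50)
print(y)
```

Paying less than the day's average price gives a positive Y, and paying more gives a negative Y, which is why a higher Y means a better trade.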

Five brokers were in the study and six trades were randomly assigned to each broker (R = 6).

Broker   Y for each trade            Mean
1        12   3   5  -1   …           6
2         7  17  13  11   …          12
3         8   1   7   4   …           5
4        21  10  15  12  20   6      14
5        24  13  14  18  19   …      17

SPSS Output Analyze>>General Linear Model>>Univariate…

Homogeneous Subsets

Conclusion: 3, 1 | 2, 4 | 4, 5 (homogeneous subsets)
Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers. Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.

Comparisons to Control
Dunnett’s test: designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.”
Example (R = 6 per column; column 1 is the CONTROL):

Col      1     2     3     4     5
Mean     6    12     5    14    17

In our example, with column means 6, 12, 5, 14, 17 for columns 1 to 5 and column 1 as the CONTROL:
- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from control.

Exercise: Sales Data

Exercise. Find the ANOVA table. Perform SNK tests at $\alpha$ = 5% to group treatments. Perform Duncan tests at $\alpha$ = 5% to group treatments. Which treatment would you use?

Post Hoc and A Priori Comparisons
- F test for a linear combination of column means (contrast)
- Scheffe test: tests all linear combinations at once. Very conservative; not to be used for only a few comparisons.

Random Effect
This assumes a “fixed model”: there is inherent interest in the specific levels of the factors under study. There’s no direct interest in extrapolating to other levels; inference will be limited to levels that appear in the experiment. The experimenter selects the levels.
If a “random model”: levels in the experiment are randomly selected from a population of such levels, and inference is to be made about the entire population of levels. Then, besides assumptions 1 to 3, there is another assumption:
4) a) the $\mu_i$ are independent random variables which are normally distributed with constant variance
b) the $\mu_i$ and $\varepsilon_{ij}$ are independent

SPSS: Stat>>General Linear Model, random factors
Tests of Between-Subjects Effects
Dependent Variable: sales

Source                  Type III SS   df   Mean Square        F    Sig.
Intercept  Hypothesis      3499.200    1      3499.200   21.843    .009
           Error            640.800    4       160.200 (a)
broker     Hypothesis       640.800    4       160.200    7.557    .000
           Error            530.000   25        21.200 (b)

a. MS(broker)
b. MS(Error)
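The footnotes say which mean square serves as the denominator of each F ratio when broker is a random factor: the intercept is tested against MS(broker), and broker against MS(Error). A minimal check of that arithmetic, using only the sums of squares and degrees of freedom printed in the SPSS table (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the SPSS table.
ss_intercept, df_intercept = 3499.2, 1
ss_broker, df_broker = 640.8, 4
ss_error, df_error = 530.0, 25

ms_intercept = ss_intercept / df_intercept
ms_broker = ss_broker / df_broker
ms_error = ss_error / df_error

# Random factor: broker is tested against the error mean square ...
f_broker = ms_broker / ms_error
# ... while the intercept is tested against the broker mean square.
f_intercept = ms_intercept / ms_broker

print(round(f_broker, 3), round(f_intercept, 3))
```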

KRUSKAL-WALLIS TEST (Lesson 44) (Non-Parametric Alternative to 1-Way ANOVA)
H0: The probability distributions are identical for each level of the factor
H1: Not all the distributions are the same

BATTERY LIFETIME (hours) (each column rank ordered, for simplicity)

Brand      A      B      C
          32     32     28
          30     32     21
          30     26     15
          29     26     15
          26     22     14
          23     20     14
          20     19     14
          19     16     11
          18     14      9
          12     14      8
Mean    23.9   22.1   14.9   (here, irrelevant!!)

H0: no difference in distribution among the three brands with respect to battery lifetime
H1: at least one of the 3 brands differs in distribution from the others with respect to lifetime

Ranks (in parentheses, computed with all data combined)

Brand   A            B            C
        32 (29)      32 (29)      28 (24)
        30 (26.5)    32 (29)      21 (18)
        30 (26.5)    26 (22)      15 (10.5)
        29 (25)      26 (22)      15 (10.5)
        26 (22)      22 (19)      14 (7)
        23 (20)      20 (16.5)    14 (7)
        20 (16.5)    19 (14.5)    14 (7)
        19 (14.5)    16 (12)      11 (3)
        18 (13)      14 (7)        9 (2)
        12 (4)       14 (7)        8 (1)

T1 = 197, T2 = 178, T3 = 90
n1 = 10, n2 = 10, n3 = 10

TEST STATISTIC:
$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$
where
$n_j$ = # data values in column j
$N = \sum_{j=1}^{K} n_j$
K = # columns (levels)
$T_j$ = SUM OF RANKS OF DATA IN COL j when all DATA are COMBINED
(There is a slight adjustment in the formula as a function of the number of ties in rank.)

$H = \frac{12}{30(31)} \left[ \frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10} \right] - 3(31) = 8.41$
(with adjustment for ties, we get 8.47)

What do we do with H?
We can show that, under H0, H is well approximated by a $\chi^2$ distribution with df = K - 1. Here, df = 2, and at $\alpha = .05$, the critical value = 5.99.
Since H = 8.41 > 5.99, reject H0; conclude that the lifetime distribution is NOT the same for all 3 BRANDS.
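The whole Kruskal-Wallis calculation (ranking with ties, rank sums, H, and the comparison with the critical value) can be sketched in plain Python. The tie adjustment used here is the standard one, dividing H by $1 - \sum (t^3 - t)/(N^3 - N)$ over groups of tied values; that this is the adjustment the slide has in mind is an assumption. The closed-form tail probability `exp(-H/2)` holds only for df = 2, which happens to be the case here. Function names are illustrative.

```python
import math
from collections import Counter

# Battery lifetimes (hours) for the three brands.
brands = {
    "A": [32, 30, 30, 29, 26, 23, 20, 19, 18, 12],
    "B": [32, 32, 26, 26, 22, 20, 19, 16, 14, 14],
    "C": [28, 21, 15, 15, 14, 14, 14, 11, 9, 8],
}


def average_ranks(values):
    """Map each distinct value to its average rank (rank 1 = smallest)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums, H, tie-adjusted H) for {level: values}."""
    pooled = [y for ys in groups.values() for y in ys]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_sums = {name: sum(rank_of[y] for y in ys) for name, ys in groups.items()}
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(ys) for name, ys in groups.items()
    ) - 3 * (n + 1)
    # Standard tie correction (assumed to match the slide's "slight adjustment").
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    return rank_sums, h, h / correction


rank_sums, h, h_adj = kruskal_wallis(brands)
critical = -2 * math.log(0.05)   # chi-square critical value at .05, exact for df = 2
p_value = math.exp(-h / 2)       # chi-square upper tail, valid for df = 2 only

print(rank_sums)
print(round(h, 2), round(h_adj, 2), round(critical, 2), round(p_value, 4))
print("reject H0" if h > critical else "do not reject H0")
```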

SPSS: Analyze >> Nonparametric tests >> Independent samples, fields Double click the output table: