Assumptions and Data Transformation

Assumptions of ANOVA
- The error terms are randomly, independently, and normally distributed.
- The variances of the different samples are homogeneous.
- The variances and means of the different samples are not correlated.
- The main effects are additive.

Random, Independent, and Normal Distribution of Errors
- Departures from normality do not affect the validity of the analysis of variance too seriously.
- There are tests for normality, but it is rather pointless to apply them unless the number of samples we are dealing with is fairly large.
- Independence implies that there is no relation between the size of the error terms and the experimental grouping to which they belong.
- It is important to avoid having all plots that receive a given treatment occupy adjacent positions in the field.
- The best insurance against seriously violating the first assumption of the analysis of variance is to carry out the randomization appropriate to the particular design.

Normality Tests
- Shapiro-Wilk test
- Lilliefors (Kolmogorov-Smirnov) test
- Graphical methods based on the residuals (residual plots)
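The graphical check on residuals can be sketched as follows. This is a minimal illustration using only the Python standard library, not the method used in the slides: the yield figures, the `residuals` and `qq_points` helpers, and the plotting positions (i - 0.5) / n are all assumptions made for the example. Residuals are taken as each observation minus its treatment mean, and the sorted residuals are paired with standard normal quantiles. If the errors are roughly normal, the pairs should fall close to a straight line when plotted.

```python
from statistics import NormalDist, mean

# Hypothetical yields for three treatments (illustrative numbers only).
groups = {
    "A": [20.1, 21.4, 19.8, 20.7],
    "B": [23.0, 22.1, 24.2, 23.5],
    "C": [18.9, 19.5, 18.2, 19.0],
}


def residuals(data):
    """Return every observation minus the mean of its own treatment."""
    out = []
    for values in data.values():
        group_mean = mean(values)
        out.extend(v - group_mean for v in values)
    return out


def qq_points(res):
    """Pair sorted residuals with standard normal quantiles."""
    n = len(res)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(res)))


res = residuals(groups)
points = qq_points(res)
for quantile, residual in points:
    print(f"{quantile:7.3f} {residual:7.3f}")
```

The printed pairs can be passed to any plotting tool. A formal test such as Shapiro-Wilk would normally come from a statistics library rather than being written by hand.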

Homogeneity of Variance
- Unequal variances can have a marked effect on the level of the test, especially if smaller sample sizes are associated with groups having larger variances.
- Unequal variances can lead to biased conclusions.

Ways to Solve the Problem of Heterogeneous Variances
- We can separate the data into groups such that the variances within each group are homogeneous.
- We can use a more advanced statistical test rather than the analysis of variance.
- We might be able to transform the data in such a way that the variances become homogeneous.

Tests for Homogeneity of Variance
- Hartley's F-max test
- Bartlett's test
- Residual plot for checking the equal-variance assumption
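The two test statistics named above can be computed by hand, as in the sketch below. It is a standard-library illustration under stated assumptions: the data and the function names `hartley_f_max` and `bartlett_statistic` are hypothetical, and only the statistics are computed, not p-values. Hartley's F-max is the ratio of the largest to the smallest sample variance and assumes equal sample sizes. Bartlett's statistic is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
import math
from statistics import variance

# Hypothetical yields for three treatments (illustrative numbers only).
groups = [
    [20.1, 21.4, 19.8, 20.7],
    [23.0, 22.1, 24.2, 23.5],
    [18.9, 19.5, 18.2, 19.0],
]


def hartley_f_max(samples):
    """Ratio of the largest to the smallest sample variance."""
    variances = [variance(s) for s in samples]
    return max(variances) / min(variances)


def bartlett_statistic(samples):
    """Bartlett's statistic; refer it to chi-square with k - 1 df."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    variances = [variance(s) for s in samples]
    total = sum(sizes)
    # Pooled variance weighted by each group's degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction


f_max = hartley_f_max(groups)
bartlett = bartlett_statistic(groups)
print(f"F-max = {f_max:.3f}, Bartlett statistic = {bartlett:.3f}")
```

Both statistics grow as the group variances diverge. With identical variances F-max equals 1 and Bartlett's statistic equals 0.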

Independence of Means and Variances
- Correlation between means and variances is a special case, and the most common cause, of heterogeneity of variance.
- A positive correlation between means and variances is often encountered when there is a wide range of sample means.
- Data that often show a relation between variances and means are data based on counts and data consisting of proportions or percentages.
- Transforming the data can frequently solve the problem.
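A quick way to look for a relation between means and variances is to compute both per group and correlate them. The sketch below assumes hypothetical count data and a hand-written `pearson` helper. With only three groups the correlation is descriptive, not a formal test.

```python
from statistics import mean, variance

# Hypothetical counts per plot for three treatments (illustrative only).
counts = [
    [2, 4, 3, 3],
    [9, 14, 11, 10],
    [28, 39, 35, 30],
]

means = [mean(g) for g in counts]
variances = [variance(g) for g in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r = pearson(means, variances)
for m, v in zip(means, variances):
    print(f"mean = {m:6.2f}  variance = {v:6.2f}")
print(f"correlation between means and variances: {r:.3f}")
```

In this example the variances rise with the means, which is the pattern that suggests a transformation.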

The Main Effects Are Additive
- For each design, there is a mathematical model called a linear additive model.
- It means that the value of an experimental unit is made up of the general mean plus the main effects plus an error term.
- When the effects are not additive, the treatment effects are often multiplicative.
- In the case of multiplicative treatment effects, there are again transformations that will change the data to fit the additive model.
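As an illustration, the additive model for the simplest case (a completely randomized design with one treatment factor) can be written as below. The symbols are assumptions introduced for this example: y_ij is the observation on unit j of treatment i, mu is the general mean, tau_i is the treatment effect, and epsilon_ij is the error term. The second and third lines show how taking logarithms turns a multiplicative model into an additive one.

```latex
% Additive model (one treatment factor)
y_{ij} = \mu + \tau_i + \varepsilon_{ij}

% Multiplicative model
y_{ij} = \mu \, \tau_i \, \varepsilon_{ij}

% After a log transformation the effects are additive again
\log y_{ij} = \log \mu + \log \tau_i + \log \varepsilon_{ij}
```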

Data Transformation
- The ANOVA assumptions can be violated in two ways:
  1. The data may consist of measurements on an ordinal or a nominal scale.
  2. The data may fail to satisfy at least one of the four requirements.
- Two options are available for analyzing such data:
  1. Use a non-parametric method of analysis.
  2. Transform the data before analysis.

Logarithmic Transformation
- It is used when the standard deviations of the samples are roughly proportional to the means.
- It is also used when there is evidence of multiplicative rather than additive effects.
- Zero and negative values cannot be log-transformed. When the data contain zeros, it is suggested to add 1 to each value before transformation.
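The effect of the logarithmic transformation can be seen in a small sketch. The data are hypothetical and were chosen so that the standard deviation is exactly proportional to the mean. The `log_transform` helper and its `add_one` flag are names introduced for the example, and base 10 is an arbitrary choice.

```python
import math
from statistics import mean, stdev

# Hypothetical samples whose standard deviations are proportional to their means.
groups = [[8, 10, 12], [80, 100, 120], [800, 1000, 1200]]


def log_transform(values, add_one=False):
    """Base-10 log transform; set add_one=True when the data contain zeros."""
    shift = 1 if add_one else 0
    return [math.log10(v + shift) for v in values]


raw_sds = [stdev(g) for g in groups]
log_sds = [stdev(log_transform(g)) for g in groups]

for g, raw_sd, log_sd in zip(groups, raw_sds, log_sds):
    print(f"mean = {mean(g):7.1f}  sd = {raw_sd:7.2f}  sd after log = {log_sd:.4f}")
```

On the raw scale the standard deviation grows with the mean. On the log scale it is the same for all three groups, apart from floating-point rounding.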

Square Root Transformation
- It is used when we are dealing with counts of rare events.
- Such data tend to follow a Poisson distribution.
- If there are counts less than 10, it is better to add 0.5 to each value before taking the square root.
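A minimal sketch of the square root transformation follows. The function name `sqrt_transform` and the `small_count_threshold` parameter are hypothetical. Adding 0.5 to every value whenever any count falls below 10 is one reading of the rule above, not the only possible one.

```python
import math


def sqrt_transform(counts, small_count_threshold=10):
    """Square-root transform for count data.

    Adds 0.5 to every value when any count is below the threshold
    (an interpretation of the rule in the text, assumed for this sketch).
    """
    offset = 0.5 if min(counts) < small_count_threshold else 0.0
    return [math.sqrt(c + offset) for c in counts]


rare_events = [0, 3, 1, 7, 2]        # hypothetical counts of a rare event
larger_counts = [16, 25, 36]         # no count below 10, so no offset

print(sqrt_transform(rare_events))
print(sqrt_transform(larger_counts))
```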

Arcsine or Angular Transformation
- It is used when we are dealing with counts expressed as percentages or proportions of the total sample.
- Such data generally have a binomial distribution.
- Such data typically show variances that are related to the means.
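The angular transformation is the arcsine of the square root of the proportion. The sketch below assumes proportions between 0 and 1 (percentages should be divided by 100 first) and reports the angle in degrees. The function name `arcsine_transform` is introduced for the example.

```python
import math


def arcsine_transform(proportions):
    """Angular transform: arcsin(sqrt(p)) in degrees for each p in [0, 1]."""
    angles = []
    for p in proportions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"proportion out of range: {p}")
        angles.append(math.degrees(math.asin(math.sqrt(p))))
    return angles


# Hypothetical germination percentages, converted to proportions first.
percentages = [5, 20, 50, 80, 95]
angles = arcsine_transform([pct / 100 for pct in percentages])
for pct, angle in zip(percentages, angles):
    print(f"{pct:3d}%  ->  {angle:6.2f} degrees")
```

The transformation stretches the scale near 0% and 100%, where binomial variances are smallest, and compresses it near 50%.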