Conditions of applications. Key concepts Testing conditions of applications in complex study design Residuals Tests of normality Residuals plots – Residuals.


Conditions of applications

Key concepts
Testing conditions of application in complex study designs
Residuals
Tests of normality
Residual plots
– Residuals vs. fitted
– QQ plots
– Cook’s distance

Conditions of applications
RM ANOVA and multilevel modeling have two conditions of application in common:
– Normality of the DV within each cell of the IV (with few outliers)
– Homoscedasticity (equality of variances)
– (Linearity: trivial in ANOVA, since we only estimate mean differences)

Problems with checking normality by cell
The number of cells grows with the number of IVs.
What about continuous IVs?
How should we deal with the number of tests?

Problems with checking homoscedasticity by pair of cells
The number of cells grows with the number of IVs.
What about continuous IVs?
How should we deal with the number of tests?

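Residuals: definition
$Y_i = b_0 + b_1 X_i + \varepsilon_i$
Thus, $\varepsilon_i = Y_i - (b_0 + b_1 X_i) = Y_i - \hat{Y}_i$
where the $\varepsilon_i$ are the residuals, which correspond to the distance between the observed value $Y_i$ and the best predicted value $\hat{Y}_i$.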
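As a minimal sketch of the definition above, the following Python snippet fits a straight line by ordinary least squares and subtracts the fitted values from the observed ones. The data and the helper names `fit_line` and `residuals` are made up for illustration; they are not from the presentation, which uses a mixed model in a statistics package.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(x, y):
    """Observed value minus best predicted value, for every observation."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    return [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical illustrative data (not from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(x, y)
print(res)
```

With an intercept in the model, least-squares residuals sum to zero up to rounding error, which is a quick sanity check on any residuals saved from a fitted model.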

Residuals: what to look for
Residuals should have a normal distribution across (or irrespective of) groups, since differences due to the IV have been subtracted.
Residuals should have equal variances across cells, as is required of the observed DV.
There should be no remaining structure in the residuals (this allows checking for linearity).

Normality tests
Many normality tests exist. In order of performance with respect to type I and type II error:
– Shapiro-Wilk: $W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, where the $a_i$ depend on the parameters of a normal distribution and the $x_{(i)}$ are the values of $x$ ordered from smallest to largest
– Anderson-Darling: same idea of ordering the data
– Kolmogorov-Smirnov
– …
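To make the "ordering the data" idea concrete, here is a rough sketch of the Kolmogorov-Smirnov statistic computed against a normal distribution whose mean and standard deviation are estimated from the sample. The function name `ks_statistic_normal` and the two samples are assumptions made for illustration. Only the statistic is computed: because the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply directly, so no p-value is given.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)  # the test works on the ordered data
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical samples: evenly spaced normal quantiles, and the same
# values exponentiated to produce a right-skewed sample.
n = 20
near_normal = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(v) for v in near_normal]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)
print(d_normal, d_skewed)
```

A larger statistic means the sample departs more from the fitted normal curve, so the skewed sample should score higher than the near-normal one.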

But…
All of these tests are known to be unreliable.
– When the data do in fact come from a normal distribution, they reject the null too often or too rarely
– When the data do not in fact come from a normal distribution, they do not reject the null often enough (low power)

Residual plots: residuals vs. fitted or vs. each IV
Scatterplot of the residuals against the predicted values ($\hat{Y}_i$) or against each IV.
There are different versions of this type of plot (e.g., the residuals may or may not be divided by their estimated standard deviation).
They allow one to examine:
– Homoscedasticity
– Linearity of the relationship between IV and DV
– Normality of the residuals (the point cloud should have an ellipsoid shape)
– Outliers

Residual plot: Quantile-Quantile plot
Graphical method for comparing two probability distributions.
Compare the quantiles of the normal distribution with mean 0 and variance $s^2$ to the ordered values of the residuals.
All the points should align on the diagonal from bottom left to top right.
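The coordinates behind such a plot can be sketched as follows. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function name `qq_points` and the example residuals; statistical packages may use slightly different positions.

```python
from statistics import NormalDist, stdev


def qq_points(residuals):
    """Pairs (theoretical normal quantile, ordered residual) for a QQ plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Reference distribution: normal with mean 0 and variance s^2,
    # where s is the standard deviation estimated from the residuals.
    reference = NormalDist(0.0, stdev(residuals))
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals (made up for illustration).
example_residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.7]
points = qq_points(example_residuals)
for theo, obs in points:
    print(f"{theo:8.3f} {obs:8.3f}")
```

Plotting the second column against the first gives the QQ plot; points close to the line y = x indicate residuals compatible with the reference normal distribution.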

Outliers
Outliers are extreme values on the IV, on the DV, or on both.
Leverage observations are extreme on the X-axis (IV), but may not greatly influence the estimation of the parameters.
Influential observations are extreme on both the X and Y axes, and greatly influence the estimation of the parameters.

Cook’s distances
$D_i = \frac{\sum_{j=1}^{n} \left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{p \cdot MSE}$
where the $\hat{Y}_j$ are the predicted values of Y, and the $\hat{Y}_{j(i)}$ are the predicted values of Y if observation i were removed and the model estimated again. p is the number of parameters of the model and MSE is the mean square error.
Cutoff: 1, or 4/n, or $F_{p,\,n-p}$
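The definition can be followed literally for a simple linear regression: refit the model without observation i, predict all n points again, and compare with the original predictions. The sketch below does this in plain Python. The data, the function names, and the choice of a simple regression with p = 2 are assumptions for illustration; MSE is taken as the residual sum of squares divided by n - p.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def cooks_distances(x, y):
    """Cook's distance of each observation, by refitting without it."""
    n, p = len(x), 2  # p = 2 parameters: intercept and slope
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xj for xj in x]
    mse = sum((yj - fj) ** 2 for yj, fj in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Re-estimate the model with observation i removed.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Predict all n observations with the re-estimated model.
        refitted = [c0 + c1 * xj for xj in x]
        shift = sum((fj - rj) ** 2 for fj, rj in zip(fitted, refitted))
        distances.append(shift / (p * mse))
    return distances


# Hypothetical data: the last point is far out on X and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
d = cooks_distances(x, y)
print(d)
```

In this made-up example the last observation combines high leverage with a sizeable residual, so it should receive by far the largest distance and exceed the cutoff of 1.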

An example of a residual analysis
Back to the autism data again.
– Step 1: obtain the residuals (use the Save option in the linear mixed model)
– Step 2: check normality (analysis → explore)
– Step 3: look at the residual plots
Residuals vs. fitted
Residuals vs. time
(Standardized residuals vs. fitted)
QQ plots
(Cook’s distance)

Normality tests