Independent variables correlate with each other


Independent variables correlate with each other
- this creates the possibility of a mediated correlation, with a direct and a mediated effect;
- shoe size and food consumption: we know (the link goes through body size);
- body size and stomach volume: we do not know;
- the correlated variables may be two continuous variables;
- or two categorical ones, which gives an unbalanced ANOVA (numbers of observations per cell):

             black   white
  males        20       2
  females       3      24
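In the unbalanced design above, the two factors are themselves correlated. A minimal Python sketch of how strongly, assuming the phi coefficient (the Pearson correlation of two binary variables) as the measure; the function name is ours, and only the cell counts come from the table.

```python
import math

# Numbers of observations per cell in the unbalanced design above.
males_black, males_white = 20, 2
females_black, females_white = 3, 24

def phi_coefficient(a, b, c, d):
    """Phi coefficient (Pearson correlation of two binary variables)
    for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

phi = phi_coefficient(males_black, males_white, females_black, females_white)
print(f"phi = {phi:.3f}")  # phi = 0.795: sex and colour are strongly associated
```

In a balanced design (equal counts in every cell) the numerator a*d - b*c is zero, so the factors are uncorrelated.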

Statistics cannot tell which variable has a direct effect and which has an indirect (mediated) effect, or to what extent, but we can divide the variance into components:
1) the part that can certainly be ascribed to one variable;
2) the part that can certainly be ascribed to the other variable;
3) the part that we do not know how to divide; this last part is the larger, the stronger the correlation between the variables.
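The three components can be computed from R-squared values of three regressions (full model and each variable alone). A minimal sketch with NumPy on simulated data, where the correlation rho between the two predictors is a simulation setting of ours; the helper names are hypothetical. The shared part grows as rho grows.

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit of y on an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def partition(y, x1, x2):
    """Split explained variance into parts unique to x1, unique to x2, and shared."""
    r_full = r_squared(y, x1, x2)
    unique1 = r_full - r_squared(y, x2)  # what x1 adds on top of x2
    unique2 = r_full - r_squared(y, x1)  # what x2 adds on top of x1
    shared = r_full - unique1 - unique2  # the part we do not know how to divide
    return unique1, unique2, shared

rng = np.random.default_rng(1)
n = 2000
results = {}
for rho in (0.0, 0.5, 0.9):
    # simulated predictors whose correlation is about rho
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)
    results[rho] = partition(y, x1, x2)
    print(rho, np.round(results[rho], 3))
```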

Height of plants as dependent on 1) temperature and 2) humidity. Let's assume that temperature and humidity correlate with each other, e.g. because we are on the southern margin of a desert.
[Figure: temperature vs. humidity]

[Figure: plant height vs. temperature; plant height vs. humidity]
Studied one at a time, plant height clearly depends on both humidity and temperature. But does humidity have an effect that cannot be explained by its correlation with temperature, i.e. a direct rather than a mediated effect? And does temperature have such an effect? We include both in an ANOVA model as independent variables, but there are several ways of dividing the sums of squares (SS).

Type I analysis (Type I, or sequential, sums of squares): all of the grey (shared) area is assigned to the variable that appears first in the model, so its effect is estimated at the maximum. The analysis is conservative with respect to the second variable: it estimates only what is certainly that variable's own effect, i.e. its minimal effect.

Humidity as the first variable:
Source        DF   Type I SS       F        P
humidity       1        2164   32.30   0.0023
temperature    1         142    2.13   0.2045

We cannot claim that temperature has a direct effect.
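Type I SS can be computed as the drop in the residual sum of squares (RSS) each time a term enters the model, in order. A minimal NumPy sketch; the data are hypothetical simulated values with correlated predictors, not the data behind the tables, and the helper names are ours.

```python
import numpy as np

def rss(y, *predictors):
    """Residual sum of squares of an OLS fit with an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def type1_ss(y, first, second):
    """Sequential (Type I) SS: each term gets the drop in RSS when it enters, in order."""
    ss_first = rss(y) - rss(y, first)
    ss_second = rss(y, first) - rss(y, first, second)
    return ss_first, ss_second

# Hypothetical simulated data: humidity is correlated with temperature.
rng = np.random.default_rng(0)
n = 50
temperature = rng.normal(25, 3, n)
humidity = 60 + 2 * (temperature - 25) + rng.normal(0, 4, n)
height = 0.5 * temperature + 0.3 * humidity + rng.normal(0, 1, n)

hum_first = type1_ss(height, humidity, temperature)
temp_first = type1_ss(height, temperature, humidity)
print("humidity first:   ", np.round(hum_first, 1))
print("temperature first:", np.round(temp_first, 1))
```

The two orders split the same model SS differently: whichever variable comes first takes the shared part.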

Temperature as the first variable:
Source        DF   Type I SS       F        P
temperature    1        1886    28.1   0.0032
humidity       1         420    6.28   0.0541

We cannot claim a direct (not mediated by temperature) effect of humidity.

And now Type III, which is conservative with respect to both variables:
Source        DF   Type III SS      F        P
temperature    1         142.5   2.13   0.2045
humidity       1         420.8   6.28   0.0541
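For two single-df predictors without an interaction, the Type III SS of a term is the drop in RSS when it is removed from the full model, which equals its Type I SS when entered last (compare 142.5 and 142, 420.8 and 420 in the tables). A minimal NumPy sketch on the same kind of hypothetical simulated data; helper names are ours. It also shows how much of the model SS Type III assigns to neither term.

```python
import numpy as np

def rss(y, *predictors):
    """Residual sum of squares of an OLS fit with an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def type3_ss(y, a, b):
    """Type III SS for two single-df terms without interaction:
    the drop in RSS when each term is removed from the full model."""
    full = rss(y, a, b)
    return rss(y, b) - full, rss(y, a) - full

# Hypothetical simulated data: humidity is correlated with temperature.
rng = np.random.default_rng(0)
n = 50
temperature = rng.normal(25, 3, n)
humidity = 60 + 2 * (temperature - 25) + rng.normal(0, 4, n)
height = 0.5 * temperature + 0.3 * humidity + rng.normal(0, 1, n)

ss_temp, ss_hum = type3_ss(height, temperature, humidity)
model_ss = rss(height) - rss(height, temperature, humidity)
print("Type III SS, temperature:", round(ss_temp, 1))
print("Type III SS, humidity:   ", round(ss_hum, 1))
print("model SS assigned to neither term:", round(model_ss - ss_temp - ss_hum, 1))
```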

In Type I the order of the variables matters; in Type III it does not! The danger with Type III is that joint explanatory power may remain undetected! Do the analysis in different ways for yourself, and present Type III if it reflects reality; otherwise you must explain. If the variables are not correlated (the ANOVA is balanced), there is no difference between the types, and the same holds for a one-way analysis. Types II and IV also exist. Avoid unbalanced designs when possible, but you cannot always!
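When the design is balanced, the predictors are uncorrelated and the order of entry no longer matters. A minimal NumPy sketch, assuming a hypothetical balanced 2x2 design coded -1/+1 with 5 observations per cell and simulated responses: the SS of factor A is the same whether it enters first or last.

```python
import numpy as np

def rss(y, *predictors):
    """Residual sum of squares of an OLS fit with an intercept plus the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(2)
# Hypothetical balanced 2x2 design: factors A and B coded -1/+1, 5 observations per cell.
a = np.repeat([-1.0, 1.0], 10)
b = np.tile(np.repeat([-1.0, 1.0], 5), 2)
y = 1.0 * a + 0.5 * b + rng.normal(0, 1, 20)

ss_a_first = rss(y) - rss(y, a)       # Type I SS of A entered first
ss_a_last = rss(y, b) - rss(y, a, b)  # Type I SS of A entered last (= its Type III SS)
print(round(ss_a_first, 6), round(ss_a_last, 6))
```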

Covariates in an experiment: direct and indirect effects
Does crowding affect moth fecundity via body weight, or is there something else? Take weight as a covariate! Including the covariate changes the interpretation: the treatment effect is then the effect at a given pupal weight, i.e. the part not mediated by weight.
[Diagram: manipulation (rearing in groups), pupal weight, fecundity of the moths]
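A minimal NumPy sketch of how adding the covariate changes the estimated treatment effect. The data are hypothetical simulations in which crowding lowers pupal weight and fecundity depends on weight only, so the treatment effect should vanish once weight is in the model; the helper name is ours.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] for y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(3)
n = 200
crowded = np.repeat([0.0, 1.0], n // 2)            # 1 = reared in groups
weight = 100 - 10 * crowded + rng.normal(0, 5, n)  # crowding lowers pupal weight
fecundity = 2 * weight + rng.normal(0, 10, n)      # fecundity depends on weight only

total = ols_coefs(fecundity, crowded)[1]           # treatment effect, no covariate
direct = ols_coefs(fecundity, crowded, weight)[1]  # treatment effect at a given weight
print(f"treatment effect without covariate:   {total:.1f}")
print(f"treatment effect adjusted for weight: {direct:.1f}")
```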

Multiple regression, e.g.
height = 0.597*temp + 0.089*light + 0.196*humidity - 0.12

Non-linear regression, e.g. y = sin(ax + c·log(x))·bx
- Which function to choose?
- Do we know it from theory?
- What can we conclude? Some properties come from the chosen function itself and are not supported by the data.
- It is OK for describing the data.
Usually it is enough to study whether the relationship deviates from linearity:
- by including the squared term, whose coefficient is positive or negative;
- i.e. by fitting a parabola.
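Fitting a parabola and looking at the sign of the squared term can be sketched with `np.polyfit`, which returns coefficients from the highest power down. The data below are hypothetical simulated values with a hump-shaped relationship; a real analysis would also test whether the squared term is needed, which this sketch does not do.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
# Hypothetical data: concave (hump-shaped) relationship plus noise.
y = 2.0 * x - 0.3 * x ** 2 + rng.normal(0, 0.5, x.size)

# np.polyfit returns [c2, c1, c0] for y ~ c2*x^2 + c1*x + c0
c2, c1, c0 = np.polyfit(x, y, 2)
print(f"squared-term coefficient: {c2:.3f}")  # negative: the curve bends downwards
```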

Type I regression (ordinary least squares) is for prediction, not for estimating the "real" relationship; it gives different equations for the two directions of prediction (y from x and x from y).
Type II regression is for when we want to conclude something from the value of the slope, i.e. to evaluate the relationship; its equation does not depend on switching the axes!
- Geometric mean regression: the slope is the geometric mean of the slopes computed both ways.
- The choice does not matter if only the existence of a relationship is of interest!
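A minimal NumPy sketch comparing the two ordinary regression slopes with the geometric mean regression slope, sign(r)·sd(y)/sd(x), on hypothetical simulated data. Helper names are ours. Switching the axes simply inverts the geometric mean slope, which is why its equation does not depend on the direction.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the ordinary (Type I) regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def gm_slope(x, y):
    """Geometric mean regression slope of y against x: sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 100)
y = 1.5 * x + rng.normal(0, 1, 100)

b_yx = ols_slope(x, y)      # predicting y from x
b_xy = ols_slope(y, x)      # predicting x from y
gm = gm_slope(x, y)
# 1 / b_xy expresses the x-on-y line as a slope in the y-vs-x plane
print(round(b_yx, 3), round(1 / b_xy, 3), round(gm, 3))
```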

Variations of ANOVA:
- hierarchical (nested);
- random factors;
- repeated measures.

Nested (hierarchical) ANOVA: the effect of a factor is "allowed" to differ between the levels of another factor. One factor is nested within another, written with brackets: B(A).

Does tail length depend on sex? An ordinary ANOVA:
Source     DF   Type III SS      F      P
sex         1         0.333   0.05   0.82
species     1         0.333   0.05   0.82

But now with sex nested within species:
Source          DF   Type III SS      F        P
sex              1         0.333   0.16     0.69
sex(species)     2         40.66   9.76   0.0071

Other examples: classes in schools, class(school); subpopulations; experimental design; covariates.
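Why the nested term can be large while the main effect is not: when the sex difference reverses between species, the overall sex means are nearly equal, but within each species they differ. A minimal NumPy sketch with hypothetical simulated tail lengths in a balanced design (species A and B, 5 animals per cell); the helper name is ours. In a balanced design SS for sex(species) equals SS(sex) plus SS(sex*species).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5  # hypothetical: 5 animals per sex within each species
# Hypothetical tail lengths: males longer in species A, females longer in species B.
data = {
    ("A", "male"): 12 + rng.normal(0, 1, n), ("A", "female"): 6 + rng.normal(0, 1, n),
    ("B", "male"): 6 + rng.normal(0, 1, n), ("B", "female"): 12 + rng.normal(0, 1, n),
}
grand = np.mean(np.concatenate(list(data.values())))

def group_mean(key_index, level):
    """Mean over all cells whose key has `level` at position key_index (0 species, 1 sex)."""
    return np.mean(np.concatenate([v for k, v in data.items() if k[key_index] == level]))

# Sex as an ordinary main effect: 2 * n observations per sex.
ss_sex = sum(2 * n * (group_mean(1, s) - grand) ** 2 for s in ("male", "female"))
# Sex nested within species: each cell mean compared with its own species mean.
ss_sex_in_species = sum(n * (v.mean() - group_mean(0, sp)) ** 2
                        for (sp, _), v in data.items())
print(round(ss_sex, 2), round(ss_sex_in_species, 2))
```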

ANOVA with random factors
A random factor is an independent variable whose levels can be seen as a sample from a large population of levels, e.g.:
- brood.
For fixed factors, all levels are represented in our sample, e.g.:
- treatment;
- sex.
We can also say that, for random factors, the error variance is at two levels:
- the variation of individual observations around the brood means;
- the variation of the brood means around the grand mean.
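The two levels of variance can be estimated from a one-way ANOVA table by the method of moments: the within-brood variance is MS_within, and the between-brood variance is (MS_between - MS_within) / n for n offspring per brood. A minimal NumPy sketch on hypothetical simulated broods; the numbers of broods and offspring and the true SDs are our simulation settings.

```python
import numpy as np

rng = np.random.default_rng(7)
k, n = 30, 10                 # hypothetical: 30 broods, 10 offspring per brood
sigma_b, sigma_w = 2.0, 1.0   # true SDs between and within broods (simulation settings)
brood_means = rng.normal(50, sigma_b, k)
y = brood_means[:, None] + rng.normal(0, sigma_w, (k, n))  # rows = broods

group_means = y.mean(axis=1)
ms_between = n * np.sum((group_means - y.mean()) ** 2) / (k - 1)
ms_within = np.sum((y - group_means[:, None]) ** 2) / (k * (n - 1))
var_within = ms_within
var_between = (ms_between - ms_within) / n  # method-of-moments estimate; can be negative
print(f"within-brood variance:  {var_within:.2f}")
print(f"between-brood variance: {var_between:.2f}")
```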

Suppose we study how the size of oak leaves depends on temperature. Which of these are random factors?
- tree individual;
- branch within a tree;
- season (summer etc.);
- grove (forest fragment);
- habitat (forest / open land);
- year.
The answer can depend on the question asked: do we study the differences between these particular populations, or do we want to generalize the results to all populations (e.g. of Estonia)? When a factor is random, the results can be generalized!

Fixed-effects ANOVA, random-effects ANOVA, mixed ANOVA: for a one-way ANOVA there is no difference. In a two-way ANOVA (one factor fixed, one random), the difference is modest when there is no interaction; with an interaction the difference can be large! For example, with 6 broods split 2:4, what happens in the whole population? We do not know. The difference lies mostly in the test of the fixed factor!

Brood fixed
[Figure: manipulation, brood effect]
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
trea               1       24      6.10   0.0211
brood              5       24     23.70   <.0001
trea*brood         5       24     11.12   <.0001

Brood random
[Figure: manipulation, brood effect]
Type 3 Tests
Effect        Num DF   Den DF   F Value   Pr > F
trea               1        5      0.55   0.4924
brood              5        5      2.13   0.213
trea*brood         5       24     11.12   <.0001

Brood fixed
[Figure: manipulation, brood effect]
Type 3 Tests of Fixed Effects
Effect        Num DF   Den DF   F Value   Pr > F
trea               1       24      7.76   0.0103
brood              5       24     12.31   <.0001
trea*brood         5       24      0.39   0.8486

Brood random
[Figure: manipulation, brood effect]
Type 3 Tests
Effect        Num DF   Den DF   F Value   Pr > F
trea               1        5     19.74   0.0067
brood              5        5     31.32   0.0009
trea*brood         5       24      0.39   0.8486
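The random-brood F values can be recovered from the fixed-brood tables. With brood fixed, every F is MS_effect / MS_error; with brood random, trea and brood are tested against the trea*brood mean square instead. Dividing the fixed-model F of an effect by the fixed-model F of the interaction therefore gives MS_effect / MS_interaction. A small Python check using only the F values printed above; the dictionary layout is ours.

```python
# F values copied from the two "brood fixed" tables above.
fixed_f = {
    "first example": {"trea": 6.10, "brood": 23.70, "trea*brood": 11.12},
    "second example": {"trea": 7.76, "brood": 12.31, "trea*brood": 0.39},
}

mixed_f = {}
for name, f in fixed_f.items():
    # F_effect / F_interaction = (MS_effect / MS_error) / (MS_interaction / MS_error)
    #                          = MS_effect / MS_interaction
    mixed_f[name] = {
        "trea": f["trea"] / f["trea*brood"],
        "brood": f["brood"] / f["trea*brood"],
    }
    print(f"{name}: trea F = {mixed_f[name]['trea']:.2f}, "
          f"brood F = {mixed_f[name]['brood']:.2f}")
```

For the first example this gives 0.55 and 2.13, as in the random-brood table. For the second it gives about 19.9 and 31.6 rather than 19.74 and 31.32, because the interaction F is printed rounded to 0.39. With a strong interaction the fixed factor loses its significance; without one it does not.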

There is not always a right and a wrong way to analyze the data. A random factor should have a reasonable number of levels, and it should group the observations; it must not be the observation itself!

[Figure: weight, population]

Repeated measures ANOVA
- one individual (or other unit) has been measured several times;
- such data should not be analyzed in the usual way, because that would:
  - overestimate the number of degrees of freedom (pseudoreplication!);
  - ignore the individuality of the units.
Repeated measures ANOVA helps! Examples:
1. dependent-samples (paired) t-test;
2. dependence of the weight of lice on bear fur thickness;
3. birds are fed different foods, a parasitism index is counted, and every bird is measured four times; the time*trea interaction is of interest.
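The simplest repeated-measures case, the dependent-samples t-test, shows why individuality matters. A minimal NumPy sketch on hypothetical simulated data in which individuals differ a lot and each increases by about one unit between two measurements. The paired t works on within-individual differences; the independent-samples t (pooled SD, equal n) wrongly treats the two measurements as different individuals and drowns the change in between-individual variation.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20
# Hypothetical data: large differences between individuals, small consistent change.
baseline = rng.normal(50, 10, n)
before = baseline + rng.normal(0, 0.5, n)
after = baseline + 1.0 + rng.normal(0, 0.5, n)

# Dependent-samples (paired) t on the within-individual differences.
d = after - before
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Independent-samples t with pooled SD, ignoring that the same individuals were measured.
sp = np.sqrt((before.var(ddof=1) + after.var(ddof=1)) / 2)
t_independent = (after.mean() - before.mean()) / (sp * np.sqrt(2 / n))
print(f"paired t = {t_paired:.2f} (df = {n - 1})")
print(f"independent t = {t_independent:.2f} (df = {2 * n - 2})")
```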