

Moving away from Linear-Gaussian assumptions

Cons: some things become much harder.
- No baked-in test of global fit
- Non-recursive models, error correlations, and latent variables are harder to deal with
- How do we label an arrow?

Pros:
- Flexibility to model nodes with whatever statistical assumption we want to make
- Better inference
- Better predictions

Causal Effects in Non-linear models: How big is the effect?

[Figure: fitted curve of firesev as a function of age]

The Logic of Graphs: Conditional Independences, Missing Links & Testable Implications

How do we test the structure of the model without a variance-covariance matrix?

[Figure: example graph x → y1 → y2 → y3]

For directed, acyclic models where all nodes are observed:

Vᵢ ⏊ Non-Child(Vⱼ) | Pa(Vᵢ, Vⱼ)

- The residuals of each pair of nodes not connected by a link should be independent.
- Each missing link represents a local test of the model structure.
- Individual test results can be combined using Fisher's C to give a global test of structure.
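Fisher's C is computed directly from the individual local-test p-values: C = −2 Σ ln(pᵢ), which is chi-square with 2k degrees of freedom when all k independence claims hold. A minimal pure-Python sketch (the function name `fishers_c` is my own, not part of the slides' glmsem.r; p-values are assumed strictly in (0, 1]):

```python
import math

def fishers_c(p_values):
    """Combine k independence-test p-values into Fisher's C statistic.

    C = -2 * sum(ln p_i) follows a chi-square distribution with
    df = 2k degrees of freedom under the null that all claims hold.
    """
    c = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    # The chi-square survival function has a closed form for even df:
    # P(X > c) = exp(-c/2) * sum_{i=0}^{df/2 - 1} (c/2)^i / i!
    half = c / 2.0
    p = math.exp(-half) * sum(half**i / math.factorial(i)
                              for i in range(df // 2))
    return c, df, p
```

A sanity check: a single p-value of 0.5 gives C = −2 ln 0.5 with df = 2, and the combined p-value is 0.5 again, as it should be.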

The Logic of Graphs: Conditional Independences, Missing Links & Testable Implications

How many implied conditional independences are there?

N(N − 1)/2 − L

where N = number of nodes and L = number of links.

[Figure: example graph x → y1 → y2 → y3]
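The count follows because every pair of nodes not joined by a link yields exactly one testable claim. A one-line helper (my own naming, for illustration):

```python
def n_implied_ci(n_nodes, n_links):
    """Number of conditional-independence claims implied by a DAG:
    all node pairs, minus the pairs joined by a link."""
    return n_nodes * (n_nodes - 1) // 2 - n_links

# The chain x -> y1 -> y2 -> y3 has 4 nodes and 3 links,
# so it implies 4*3/2 - 3 = 3 testable independences.
```

The wildfire submodel later in the deck has 5 nodes and 5 links, and indeed lists 5 implied conditional independences.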

Strategy for local estimation analysis

1. Create a causal graph.
2. Model all nodes as functions of the variables given by the graph (using model selection to pick the functional form).
3. Evaluate all conditional independences implied by the graph using model residuals.
4. If a conditional independence test fails, modify the graph and return to step 2.

Generalized Linear Models – 3 components

1. A probability distribution from the exponential family: normal, log-normal, gamma, beta, binomial, Poisson, geometric.
2. A linear predictor: η = β₀ + β₁x₁ + … + βₚxₚ.
3. A link function g such that g(E[Y]) = η: identity, log, logit, inverse.
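To make the three components concrete, here is a minimal pure-Python sketch (not from the slides; the function name `fit_poisson_glm` is my own) fitting a Poisson GLM with log link by Newton-Raphson, the same iterative scheme that R's `glm()` implements as IRLS:

```python
import math

def fit_poisson_glm(x, y, n_iter=50):
    """Fit a Poisson GLM with log link, E[y] = exp(b0 + b1*x),
    by Newton-Raphson on the log-likelihood (equivalent to IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # score vector g and negative Hessian h of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)    # inverse of the log link
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        # Newton step: solve the 2x2 system h * delta = g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

If the responses lie exactly on the curve exp(0.5 + 0.3x), the fit recovers (0.5, 0.3), since the score equations are then satisfied by the true coefficients.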

California wildfires example

[Figure: causal graph with nodes age, firesev, cover, distance, abio, hetero, rich]


A. Submodel – its causal assumptions and testable implications

Causal assumptions (links):
dist → age, age → firesev, firesev → cover, cover → rich, dist → rich

Implied conditional independences:
firesev ⏊ dist | (age)
cover ⏊ dist | (firesev)
cover ⏊ age | (firesev)
rich ⏊ age | (cover, dist)
rich ⏊ firesev | (cover, dist)

A. Functional Specification I – Models of Uncertainty

Variable   Potential values   Prob. dist.
age        {0,1,2,3,…}        Negative binomial
rich       {0,1,2,3,…}        Negative binomial
firesev    (0, ∞)             Gamma
cover      (0, ∞)             Gamma

A. Functional Specification II – Models for Expected Values

B. Modeling the Nodes - Age

[Figure: age vs. distance with fitted linear and quadratic curves]

> library(MASS)
> a1.lin <- glm.nb(age ~ distance, data = dat)
> a1.q <- glm.nb(age ~ distance + I(distance^2), …)
> AICtab(a1.lin, a1.q, weights = T)
       dAIC df weight
a1.q
a1.lin
> curve(exp(p.l[1] + p.l[2]*x), from = 0, to = 100, add = T)
> curve(exp(p.q[1] + p.q[2]*x + p.q[3]*x^2), from = 0, to = 100, add = T, lty = 2)

B. Modeling the Nodes - Firesev

[Figure: firesev vs. age with fitted log-link curve]

> f.lin <- glm(firesev ~ age, family = Gamma(link = "log"), …)
> curve(exp(p.f.lin[1] + p.f.lin[2]*x), from = 0, to = 100, add = T)

Aside- Linearization of a saturating function
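The equations on this slide were lost in transcription; the standard linearization consistent with the inverse-link fit that follows is (my reconstruction):

```latex
\text{Gamma GLM, inverse link, predictor } 1/x:\qquad
\frac{1}{\mu} = \beta_0 + \beta_1\,\frac{1}{x}
\;\Longrightarrow\;
\mu = \frac{x}{\beta_0\,x + \beta_1}
    = \frac{(1/\beta_1)\,x}{1 + (\beta_0/\beta_1)\,x}
```

That is, a saturating curve rising from 0 toward the asymptote 1/β₀, which is exactly the form plotted with `curve()` on the next slide.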

B. Modeling the Nodes - Firesev

[Figure: firesev vs. age with fitted saturating curve]

> f.sat <- glm(firesev ~ I(1/age), family = Gamma(link = "inverse"), …)
> curve(1/p.f.sat[2]*x / (1 + 1/p.f.sat[2]*p.f.sat[1]*x), from = 0, to = 65, add = T, lty = 2)

B. Modeling the Nodes - Firesev

> AICtab(f.lin, f.sat, weights = T)
       dAIC df weight
f.sat
f.lin              <0.001

B. Modeling the Nodes - Cover

[Figure: cover vs. firesev with fitted curve]

> c.lin <- glm(cover ~ firesev, family = Gamma(link = log), …)
> curve(exp(p.c[1] + p.c[2]*x), from = 0, to = 9, add = T, lwd = 2)

B. Modeling the Nodes - Richness

[Figure: rich vs. distance and cover with fitted curves]

> r.lin <- glm.nb(rich ~ distance + cover, data = dat)
> r.q <- glm.nb(rich ~ distance + I(distance^2) + cover, …)
> AICtab(r.lin, r.q, weights = T)
      dAIC df weight
r.q
r.lin

C. Testing the conditional independences

Implied conditional independences:
firesev ⏊ dist | (age)
cover ⏊ dist | (firesev)
cover ⏊ age | (firesev)
rich ⏊ age | (cover, dist)
rich ⏊ firesev | (cover, dist)

Method for testing conditional independences. For each implied conditional independence statement:
1. Hypothesize that a link between the variables exists.
2. Quantify the evidence that the link explains residual variation in the variable chosen as the response.
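As a simplified stand-in for the deck's `nl.detect3`, each claim can be screened by correlating the two residual vectors. A hedged pure-Python sketch using the Fisher z-transform normal approximation (the function name is my own; this only detects linear residual dependence):

```python
import math

def residual_independence_test(res_a, res_b):
    """Screen two residual vectors for linear dependence.

    Computes the Pearson correlation r, then a two-sided p-value
    via the Fisher z-transform: z = atanh(r) * sqrt(n - 3) is
    approximately standard normal under independence.
    """
    n = len(res_a)
    ma = sum(res_a) / n
    mb = sum(res_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(res_a, res_b))
    va = sum((a - ma) ** 2 for a in res_a)
    vb = sum((b - mb) ** 2 for b in res_b)
    r = cov / math.sqrt(va * vb)
    z = math.atanh(r) * math.sqrt(n - 3)
    # two-sided p-value from the standard normal CDF (via erf)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return r, p
```

A large p-value is consistent with the implied independence; a small one flags a missing link, feeding into the Fisher's C combination for the global test.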

C. Testing the conditional independences

What we need:
1. List of all implied conditional independences
2. Residuals for all fitted nodes

> source('glmsem.r')
> fits <- c("a1.q", "f.sat", "c.lin", "r.q")
> stuff <- get.stuff.glm(fits, dat)

get.stuff.glm returns:
1. R² for each node ($R.sq)
2. Estimated causal effect (over the observed range) ($est.causal.effects)
3. Graph-implied conditional independences ($miss.links)
4. Predicted values for each node ($predictions)
5. Residuals for each node ($residuals)
6. Matrix of links in the graph ($links)
7. Matrix of prediction equations ($pred.eqns)

C. Testing the conditional independences

> nl.detect3(dat, stuff$residuals, stuff$miss.links)
$p.vals
distance-firesev   distance-cover   age-cover   age-rich   firesev-rich

$fisher.c
[1]

$d.f
[1] 10

$fisher.c.p.val
[1]

D. Check Model - Residuals

> pairs(stuff$residuals)

[Figure: pairs plot of the node residuals]

D. Check Model - Parameter Estimates

> sapply(fits, function(x) summary(get(x))$coefficients)
[Coefficient tables for $a1.q (Intercept, distance, I(distance^2)), $f.sat (Intercept, I(1/age)), $c.lin (Intercept, firesev), and $r.q (Intercept, distance, I(distance^2), cover), each with Estimate, Std. Error, z/t value, and Pr(>|z|)/Pr(>|t|) columns; the numeric values were lost in transcription]

D. Check Model - Print Resulting Graph

# requires graphviz and the {png} package
> glmsem.graph(stuff)

E. Run a Query (intervention)

> new.dat <- dat
> new.dat[, 'age'] <- 2
> dat.int <- calc.intervention.glm(fits, stuff$links, "age", new.dat)
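The propagation behind such an intervention query can be sketched generically: fix the intervened node at its new value, then re-predict every downstream node in topological order from the fitted node models. A hypothetical Python stand-in for `calc.intervention.glm` (all names here are my own, for illustration):

```python
def simulate_intervention(pred_fns, topo_order, rows, node, value):
    """Simulate do(node = value) over a list of data rows (dicts).

    pred_fns maps node name -> function(row_dict) -> predicted value,
    playing the role of the fitted GLMs. Nodes after the intervened
    node in topo_order are re-predicted so the change propagates;
    nodes before it keep their observed values.
    """
    results = []
    for row in rows:
        row = dict(row)        # copy so the input data is untouched
        row[node] = value      # the intervention itself
        downstream = False
        for v in topo_order:
            if v == node:
                downstream = True
                continue
            if downstream and v in pred_fns:
                row[v] = pred_fns[v](row)   # re-predict from (new) parents
        results.append(row)
    return results
```

For a chain a → b → c with fitted models b = 2a and c = b + 1, intervening with do(a = 3) yields b = 6 and c = 7 regardless of the observed values.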

Discussion

Get glmsem.r, these slides, and the R code for the example at: