
Data mining and statistical learning - lecture 9

Generalized additive models (GAMs)
- Some examples of linear models
- Proc GAM in SAS
- Model selection in GAM

Linear regression models
The inputs can be:
- quantitative inputs
- functions of quantitative inputs
- basis expansions of quantitative inputs
- dummy variables
- interaction terms
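All of these cases fit one template: the model remains linear in the coefficients even when the inputs themselves are transformed. A sketch in standard notation (the basis functions h_j are my notation, not shown on the slide):

    f(X) = \beta_0 + \sum_{j=1}^{p} \beta_j \, h_j(X)

where each h_j(X) can be a raw input, a function of an input (e.g. \log X_j or X_j^2), a basis expansion, a dummy variable, or an interaction term such as X_1 X_2.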

Justification of linear regression models
- Many response variables are linearly or almost linearly related to a set of inputs
- Linear models are easy to comprehend and to fit to observed data
- Linear regression models are particularly useful when:
  - the number of cases is moderate
  - data are sparse
  - the signal-to-noise ratio is low

Performance of predictors based on (i) a simple linear regression model and (ii) a quadratic regression model, when the true expected response is a second-order polynomial in the input
[Figure: two panels - predictions based on a linear model, and predictions based on a quadratic model]
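In symbols, the two competing predictors are (a sketch; the slide itself shows only the plots):

    f_1(x) = \hat\beta_0 + \hat\beta_1 x                      % simple linear model
    f_2(x) = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2    % quadratic model

Because the true mean response is a second-order polynomial, the linear predictor is systematically biased, whereas the quadratic predictor can reproduce the true mean function.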

Logistic regression of multiple purchases vs first amount spent

Logistic regression for a binary response variable Y
The expectation of Y given x, E(Y | x) = P(Y = 1 | x), is linked to a linear function of x through the logistic transformation.
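The slide's formula did not survive the transcript; the standard form it refers to is:

    E(Y \mid x) = P(Y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},
    \qquad
    \log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \beta_0 + \beta_1 x

so it is the log-odds, not the probability itself, that is a linear function of x.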

Generalized additive models: some examples
- a nonlinear, additive model
- a mixed linear and nonlinear, additive model
- a mixed linear and nonlinear, additive model with a class variable
The corresponding model equations are sketched below.
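A hedged sketch of the three model types, writing f_j for smooth functions and \alpha_k for the effect of level k of the class variable (the slide's own equations were lost in transcription):

    Y = \alpha + f_1(X_1) + f_2(X_2) + \varepsilon             % nonlinear, additive
    Y = \alpha + \beta_1 X_1 + f_2(X_2) + \varepsilon          % mixed linear and nonlinear, additive
    Y = \alpha_k + \beta_1 X_1 + f_2(X_2) + \varepsilon        % mixed model with a class variable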

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine
Output: total-N concentration
Inputs: monthly pattern, trend function
[Figure: observed data and fitted model]

Modelling the concentration of total nitrogen at Lobith on the Rhine: extracted additive components
[Figure: year components and month components]

Weekly mortality and confirmed cases of influenza in Sweden
Response: weekly mortality
Inputs:
- confirmed cases of influenza
- seasonal dummies
- long-term trend
A PROC GAM sketch of this model follows.
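A minimal sketch in PROC GAM, assuming a data set sasuser.flu with the hypothetical variables mortality, flu_cases, seasonal dummies s1-s3 and a week counter week (the slide does not show the actual code):

    proc gam data=sasuser.flu;               /* hypothetical data set            */
       model mortality = param(s1 s2 s3)     /* seasonal dummies as linear terms */
                         spline(flu_cases)   /* smooth effect of influenza       */
                         spline(week);       /* smooth long-term trend           */
       output out=fluout pred resid;
    run;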

SYNTAX for common GAM models

Type of model       Syntax                             Mathematical form
Parametric          model y = param(x);                beta_0 + beta_1*x
Nonparametric       model y = spline(x);               s(x)
Nonparametric       model y = loess(x);                lo(x)
Semiparametric      model y = param(x1) spline(x2);    beta_0 + beta_1*x1 + s(x2)
Additive            model y = spline(x1) spline(x2);   s1(x1) + s2(x2)
Thin-plate spline   model y = spline2(x1,x2);          s(x1, x2)

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine

Model 1
proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;

Model 2
proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month);
   output out=addmodel2;
run;

Proc GAM – degrees of freedom of the spline components
The degrees of freedom of the spline components are selected by the user, or estimated from the data by specifying method=GCV.

proc gam data=Mining.Rhine;
   model Nconc = spline(Year, df=3) spline(Month, df=3);
   output out=addmodel1;
run;

df=3 implies that the same cubic polynomial is valid over the entire range of the input; increasing the df value implies that knots are introduced.
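A hedged sketch of the GCV alternative (METHOD=GCV is an option on the MODEL statement of PROC GAM; this exact code is not shown on the slide):

    proc gam data=Mining.Rhine;
       /* let generalized cross-validation select the smoothing parameters */
       model Nconc = spline(Year) spline(Month) / method=GCV;
       output out=addmodelgcv;
    run;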

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine

proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 1
[Figure]

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 2 (df=4)
[Figure]

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 3 (df=20)
[Figure]

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 1

The GAM Procedure
Dependent Variable: Nconc
Smoothing Model Component(s): spline(Year) spline(Month)

Summary of Input Data Set
   Number of Observations           168
   Number of Missing Observations     0
   Distribution                Gaussian
   Link Function               Identity

Iteration Summary and Fit Statistics
   Final Number of Backfitting Iterations   2
   Final Backfitting Criterion              E-30
   The Deviance of the Final Estimate
   The local score algorithm converged.

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 1

Regression Model Analysis: Parameter Estimates
   Parameter       Estimate   Standard Error   t Value   Pr > |t|
   Intercept                                             <.0001
   Linear(Year)                                          <.0001
   Linear(Month)                                         <.0001

Smoothing Model Analysis: Analysis of Deviance
   Source          DF   Sum of Squares   Chi-Square   Pr > ChiSq
   Spline(Year)
   Spline(Month)                                      <.0001

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 2
[Figure]

Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine - Model 2 (20 df)
[Figure]

Estimation of additive models - the backfitting algorithm
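The algorithm itself did not survive the transcript; a standard statement of backfitting for the additive model Y = \alpha + \sum_j f_j(X_j) + \varepsilon (as in Hastie, Tibshirani and Friedman) is:

    % 1. Initialize: \hat\alpha = \frac{1}{N} \sum_{i=1}^{N} y_i, \quad \hat f_j \equiv 0 \text{ for all } j
    % 2. Cycle over j = 1, \ldots, p, 1, \ldots, p, \ldots until the \hat f_j stabilize:
    \hat f_j \leftarrow S_j \left[ \left\{ y_i - \hat\alpha - \sum_{k \neq j} \hat f_k(x_{ik}) \right\}_{i=1}^{N} \right]
    \hat f_j \leftarrow \hat f_j - \frac{1}{N} \sum_{i=1}^{N} \hat f_j(x_{ij})

where S_j is a scatterplot smoother (e.g. a cubic smoothing spline) applied to the partial residuals, and the second step recenters each component to mean zero.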

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden

proc gam data=sasuser.smhi;
   model lnDaily_consumption = spline(Meantemp, df=20);
   id Time;
   output out=smhiouttemp pred resid;
run;

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden: residual analysis
[Figure]

Modelling ln daily electricity consumption in Sweden - residual analysis
Model terms:
- spline of temperature
- spline of Julian day
- weekday dummies
[Figure: residual plots]

Modelling ln daily electricity consumption in Sweden - residual analysis
Model terms:
- spline of temperature
- spline of Julian day
- weekday dummies
Extended model terms (a PROC GAM sketch of this model follows):
- splines of contemporaneous and time-lagged weather data
- splines of Julian day and time
- weekday and holiday dummies
[Figure: residual plots]
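A hedged sketch of the extended model in PROC GAM; the variable names (Meantemp_lag1, Julianday, the weekday dummies mon-sat and holiday) are assumptions, since the slides do not show this code:

    proc gam data=sasuser.smhi;
       model lnDaily_consumption =
             spline(Meantemp)          /* contemporaneous temperature      */
             spline(Meantemp_lag1)     /* time-lagged temperature          */
             spline(Julianday)         /* seasonal pattern within the year */
             spline(Time)              /* long-term trend                  */
             param(mon tue wed thu fri sat holiday);  /* weekday and holiday dummies */
       output out=smhiout2 pred resid;
    run;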

Deviance analysis of the investigated models of ln daily electricity consumption in Sweden
- The residual deviance of a fitted model is minus twice its log-likelihood, up to a constant determined by the saturated model.
- If the error terms are normally distributed, the deviance is equal to the sum of squared residuals.
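In symbols (a standard definition, not spelled out on the slide):

    D = -2 \left( \ell(\hat\beta) - \ell_{\text{saturated}} \right),
    \qquad
    D_{\text{Gaussian}} = \sum_{i=1}^{N} (y_i - \hat y_i)^2

so for a Gaussian model with identity link, comparing deviances amounts to comparing residual sums of squares.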

Modelling ln daily electricity consumption in Sweden: time series plot of residuals
[Figure]