Generalized Additive Models. Keith D. Holler, September 19, 2005



GLMs – The Challenge
What to do with continuous variables?
  – E.g., age, credit score, amount of insurance
Options
  – Categorize, but how? Equal volume, tree, judgment
  – Appendix H, “A Practitioner’s Guide to GLMs” by Duncan et al.
  – Treat as polynomial
      The Weierstrass Approximation Theorem
      E.g., mileage: (2 miles)^4 = 16, (25 miles)^4 = 390,625
  – Look at categorical estimates, transform, rerun
      Newage variable = age^3 if age < 20 + age^2 if age < 80 + minimum(age, 80)
All forms must be decided BEFORE the model is run
Obviously, no clear winner!
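
The "equal volume" and "polynomial" options can be illustrated with a small Python sketch (the talk itself uses SAS). The function names `equal_volume_bins` and `assign_bin` and the sample ages are hypothetical, chosen only for illustration. The sketch cuts a continuous variable into bins holding roughly equal numbers of records, then reproduces the slide's mileage arithmetic to show how quickly raw polynomial terms grow.

```python
def equal_volume_bins(values, n_bins):
    """Return n_bins - 1 cut points that split `values` into bins of
    roughly equal record counts (the "equal volume" option)."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut point k sits at the k/n_bins position of the sorted data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_bin(value, cuts):
    """Index of the bin that `value` falls into, given sorted cut points."""
    return sum(value >= c for c in cuts)


# Hypothetical driver ages, for illustration only.
ages = [17, 19, 22, 25, 31, 38, 44, 52, 60, 67, 73, 85]
cuts = equal_volume_bins(ages, 4)
counts = [0] * 4
for a in ages:
    counts[assign_bin(a, cuts)] += 1

# Raw polynomial terms span several orders of magnitude.
mileage_pow4 = {m: m ** 4 for m in (2, 25)}

print(cuts, counts, mileage_pow4)
```

Equal-volume cuts keep every category credible in terms of data volume, but they say nothing about where the response actually changes shape, which is the gap the rest of the talk addresses.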

Modeler's Aspiration

Generalized Additive Models – GAMs
GLMs are a special case of GAMs
E.g., LN(E[PP]) = Intercept + f1(age) + f2(gender) + f3(symbol) + f4(marital)
The functions f1, f2, f3, f4 can be anything
  – GLM: categorical, polynomial, transforms
  – Non-parametric functional smoothers
  – Decision trees
Balance degrees of freedom, amount of data, and functional form better

Smoothers – Partial List
Locally weighted running line smoother (LOESS)
Regression splines
Cubic smoothing splines
Monotonic splines
B-splines
Kernel smoothers
Running medians, means, lines
GLM – categories or polynomials
Decision trees
Many can be extended to multiple dimensions
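
Two of the simpler smoothers on this list can be sketched in a few lines of Python. This is a minimal illustration, not the SAS implementation: the function names `running_mean` and `kernel_smooth`, the Gaussian kernel, and the data are all assumptions made for the example.

```python
import math


def running_mean(xs, ys, k):
    """Smooth ys by averaging each point with its k nearest neighbours
    on each side (in x order)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    smoothed = [0.0] * len(xs)
    for pos, i in enumerate(order):
        lo = max(0, pos - k)
        hi = min(len(order), pos + k + 1)
        window = [ys[j] for j in order[lo:hi]]
        smoothed[i] = sum(window) / len(window)
    return smoothed


def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian kernel (Nadaraya-Watson) estimate at x0: a weighted mean
    of ys in which points near x0 receive the largest weights."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
print(running_mean(xs, ys, k=1))
print(kernel_smooth(xs, ys, x0=3.5, bandwidth=1.0))
```

In both cases a single tuning value (the window half-width `k` or the `bandwidth`) controls how much the fit is allowed to wiggle, which plays the same role as the degrees of freedom in the spline examples.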

GAM – Keys
Backfitting allows reduction of dimension
  – Residual Z = LN(E[PP]) - intercept - f1(age) - f2(gender) - f4(marital)
  – Fit Z = f3(symbol)
  – Now a 2-dimensional problem, “Y vs X”
Data drives the shape
  – Not determined a priori
  – Use of cross validation to find the smoothing parameter
“Local”: many of the smoothers use only data points close to the point being predicted, instead of all the data.
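
The backfitting loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses an identity link rather than the log link on the slide (for a log-link Poisson model, backfitting is normally wrapped inside an iteratively reweighted fitting loop), the smoother is the crudest one possible (level means), and the names `level_means`, `backfit`, `age_band`, and the data are hypothetical.

```python
def level_means(x, r):
    """Simplest possible smoother: the mean of r within each level of x."""
    totals, counts = {}, {}
    for xi, ri in zip(x, r):
        totals[xi] = totals.get(xi, 0.0) + ri
        counts[xi] = counts.get(xi, 0) + 1
    return [totals[xi] / counts[xi] for xi in x]


def backfit(y, predictors, smoother=level_means, n_iter=20):
    """Fit y = alpha + sum_j f_j(x_j) by cycling over the predictors."""
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in predictors]  # f[j][i] = f_j evaluated at row i
    for _ in range(n_iter):
        for j, x in enumerate(predictors):
            # Partial residual: remove the intercept and every other f_k.
            z = [y[i] - alpha - sum(f[k][i] for k in range(len(f)) if k != j)
                 for i in range(n)]
            # One-response, one-predictor smoothing problem ("Z vs X").
            fj = smoother(x, z)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre so alpha stays identifiable
    fitted = [alpha + sum(f[j][i] for j in range(len(f))) for i in range(n)]
    return alpha, f, fitted


# Hypothetical balanced data with an exactly additive structure:
# y = 10 + {0, 3}[age_band] + {0, 2, 7}[symbol]
age_band = [0, 0, 0, 1, 1, 1]
symbol = [0, 1, 2, 0, 1, 2]
y = [10.0, 12.0, 17.0, 13.0, 15.0, 20.0]
alpha, f, fitted = backfit(y, [age_band, symbol])
print(alpha, fitted)
```

Any one-dimensional smoother with the same `(x, residual)` signature could be passed in place of `level_means`; the loop itself never needs to know how many predictors the full model has.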

Example – SAS Code
proc gam data=all;
  class gender marital2;
  model clclmonz = param(gender marital2)
                   spline(age2, df=4)
                   spline(symbol, df=3) / dist=Poisson;
  output out=estall p;
run;

Example – Degrees of Freedom

Smoothing Spline Error Criteria
$\sum_i \{Y_i - g(t_i)\}^2 + \lambda \int \{g''(t)\}^2 \, dt$
  – $\lambda$ is the smoothing parameter
  – Reference: Nonparametric Regression and Generalized Linear Models, Green and Silverman
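
To make the role of the smoothing parameter concrete, the two limiting cases of the criterion can be written out. The symbol $S_\lambda$ and the sample size $n$ are notation introduced here for convenience and do not appear on the slide.

```latex
S_\lambda(g) \;=\; \sum_{i=1}^{n} \{Y_i - g(t_i)\}^2
              \;+\; \lambda \int \{g''(t)\}^2 \, dt

% lambda -> 0: the roughness penalty vanishes, so the minimizer
% interpolates the data, with g(t_i) = Y_i when the t_i are distinct.
%
% lambda -> infinity: any curvature is infinitely costly, so g''(t) = 0
% everywhere and the minimizer is the least-squares straight line.
```

Values of $\lambda$ between these extremes trade fidelity to the data against smoothness, which is why the choice is usually expressed as a target degrees of freedom or left to cross validation.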

Example – Cross Validation
proc gam data=all;
  class gender marital2;
  model clclmonz = param(gender marital2)
                   spline(age2)
                   spline(symbol) / method=GCV dist=Poisson;
  output out=estGCV p;
run;
Results in degrees of freedom of 17 and 14.
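
The idea behind generalized cross validation can be sketched in Python. This is not what PROC GAM does internally: the sketch swaps the smoothing spline for a Gaussian kernel smoother because its smoother matrix is easy to write down, and the names `kernel_fit`, `gcv_score`, the candidate bandwidths, and the data are assumptions for illustration. The score is the mean squared residual divided by (1 - tr(S)/n)^2, where tr(S) is the effective degrees of freedom of the fit.

```python
import math


def kernel_fit(xs, ys, bandwidth):
    """Gaussian kernel smoother evaluated at every xs[i].
    Returns the fitted values and tr(S), the effective degrees of freedom."""
    fitted, trace = [], 0.0
    for x0 in xs:
        w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        total = sum(w)
        fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / total)
        trace += 1.0 / total  # a point's own weight is exp(0) = 1, so S_ii = 1 / total
    return fitted, trace


def gcv_score(xs, ys, bandwidth):
    """GCV: mean squared residual, inflated by a penalty that grows
    with the effective degrees of freedom."""
    n = len(xs)
    fitted, trace = kernel_fit(xs, ys, bandwidth)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return (rss / n) / (1.0 - trace / n) ** 2, trace


# Hypothetical data: a smooth curve plus fixed "noise" so the run is repeatable.
xs = [i / 2 for i in range(20)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1] * 2
ys = [math.sin(x) + e for x, e in zip(xs, noise)]

candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
scores = {h: gcv_score(xs, ys, h) for h in candidates}
best = min(candidates, key=lambda h: scores[h][0])
for h in candidates:
    print(h, round(scores[h][0], 4), round(scores[h][1], 2))
print("selected bandwidth:", best)
```

Smaller bandwidths use more effective degrees of freedom and fit the training data more closely; the GCV penalty offsets that advantage so the selected value is not simply the wiggliest fit.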

Miscellaneous
Parameter estimates – 1 for each value
SPLUS
References
  – SAS Proc Gam
  – Generalized Additive Models, Hastie and Tibshirani

Q & A
Keith D. Holler PhD, FCAS, ASA, ARM
Personal Lines Research Department
St. Paul Travelers
(860) 277-4808
Research paper in progress for Ratemaking call