More General Need different response curves for each predictor Need more complex responses.

Presentation transcript:

More General
Need different response curves for each predictor
Need more complex responses

Generalized Additive Models

Spline Curves
[Figure labels: Knots; Bell-shaped Irwin-Hall spline]
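To make the knots and the bell-shaped splines concrete, here is a minimal sketch in R that builds a cubic B-spline basis with the splines package (part of base R). The knot locations and the range of x are invented for illustration. On equally spaced knots a B-spline bump has the same shape as the Irwin-Hall density (the distribution of a sum of independent uniform variables), which is presumably the connection the slide is making.

```r
library(splines)  # ships with base R

x <- seq(0, 10, length.out = 201)
interior_knots <- c(2.5, 5, 7.5)   # hypothetical knot locations

# Cubic B-spline basis: each column is one basis function,
# and the interior ones are the bell-shaped bumps
B <- bs(x, knots = interior_knots, degree = 3, intercept = TRUE)

n_basis  <- ncol(B)      # interior knots + degree + 1
row_sums <- rowSums(B)   # the basis functions sum to 1 at every x

# Draw the basis functions and mark the knots
matplot(x, B, type = "l", lty = 1, xlab = "x", ylab = "basis value")
abline(v = interior_knots, lty = 3)
```

A fitted spline curve is a weighted sum of these columns, so adding knots adds columns and lets the curve bend in more places.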

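Spline Curves in R
Wrap predictors in a spline function:
–s(predictor)
Use the “gamma” parameter to control how smooth the fitted curves are:
–Values above 1 penalize wiggliness more heavily, which controls over-fitting
–1.4 is recommended
–The size of the spline basis (the upper limit on wiggliness) is set separately, through the k argument of s()
In R:
–TheModel=gam(Height~s(AnnualPrecip), data=TheData,gamma=1.4)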
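The gam() call above can be tried end to end with the sketch below. It assumes the mgcv package, which is the usual source of gam() with a gamma argument. TheData here is simulated as a stand-in for the real height data, so the numbers mean nothing beyond showing the mechanics.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for TheData (not the FIA data used in the slides)
n <- 300
TheData <- data.frame(AnnualPrecip = runif(n, 200, 3000))
TheData$Height <- 10 + 25 * (1 - exp(-TheData$AnnualPrecip / 800)) +
  rnorm(n, sd = 3)

# Same call as in the slide: one smooth term, gamma = 1.4
TheModel <- gam(Height ~ s(AnnualPrecip), data = TheData, gamma = 1.4)

# Same model with a smaller basis: k caps how wiggly the curve can get
TheSmallModel <- gam(Height ~ s(AnnualPrecip, k = 5), data = TheData,
                     gamma = 1.4)

print(summary(TheModel))

# Effective degrees of freedom (intercept included) for both fits
edf_default <- sum(TheModel$edf)
edf_small_k <- sum(TheSmallModel$edf)
```

In mgcv, gamma multiplies the effective degrees of freedom in the model selection score, so it changes how strongly complexity is penalized, while k only sets the ceiling.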

Reading
Read Hastie and Tibshirani when you have “time”
–“All considered, it is conceivable that in a minor way, nonparametric regression might, like linear regression, become an object treasured for both its artistic merit as well as usefulness” L. Breiman, 1977
Read Martinez-Rincon and Jensen for next time

Which Approach?
[Figure panels: GAM; Kernel Smoother. Axes: Income, Age. Z-axis shows the proportion of families with a telephone at home]
Hastie and Tibshirani 1986, Generalized Additive Models

GAM Plots in R
FIA Doug-Fir height data vs. BioClim Annual Precipitation
[Figure labels: Modeled Response Curve; 95% CI; Sample point “Grass”]
“Partial” = 1 Covariate
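Plots of this kind can be drawn with the plot method that mgcv provides for fitted GAMs. The sketch below is a hedged example on simulated data: the MeanTemp covariate and the response formula are invented, and the band drawn is roughly two standard errors either side of the curve, which is approximately a 95% interval.

```r
library(mgcv)

set.seed(2)
# Simulated stand-in data with two covariates (MeanTemp is hypothetical)
n <- 300
TheData <- data.frame(AnnualPrecip = runif(n, 200, 3000),
                      MeanTemp     = runif(n, 0, 15))
TheData$Height <- 10 + 25 * (1 - exp(-TheData$AnnualPrecip / 800)) -
  0.1 * (TheData$MeanTemp - 8)^2 + rnorm(n, sd = 3)

TheModel <- gam(Height ~ s(AnnualPrecip) + s(MeanTemp),
                data = TheData, gamma = 1.4)

# One panel per covariate: partial response curve, shaded band of about
# 2 standard errors, rug of sample points along the axis, partial residuals
plot(TheModel, pages = 1, se = TRUE, shade = TRUE, rug = TRUE,
     residuals = TRUE)

# The same partial curves as numbers: each smooth's contribution per row
partial <- predict(TheModel, type = "terms", se.fit = TRUE)
```

Each panel holds the other covariates fixed, which is why the curve is called partial: it shows the contribution of one covariate at a time.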

Brown Shrimp in GOM Data from SeaMap and NOAA

Gamma=1.4 Explained Deviance: 59%, AIC=57807 Data from FIA and BioClim

Gamma=10 Explained Deviance: 59%, AIC=57961 Data from FIA and BioClim

Gamma=20 Explained Deviance: 57%, AIC=58081 Data from FIA and BioClim

Gamma=20 Explained Deviance: 51%, AIC=58796 Data from FIA and BioClim

Gamma=0.1 Explained Deviance: 59%, AIC=57811 Data from FIA and BioClim

GAM Model Runs
Layers  Gamma  Explained Deviance  AIC
All     1.4    59%                 57807
All     10     59%                 57961
All     20     57%                 58081
Best    20     51%                 58796
All     0.1    59%                 57811
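A run table like this one can be produced by refitting the same model over a grid of gamma values and recording the explained deviance and AIC each time. The following is a minimal sketch on simulated data, assuming mgcv; the predictors x1 to x3 and the response are invented, so the output will not match the figures in the slides.

```r
library(mgcv)

set.seed(3)
# Simulated data: x1 and x2 carry signal, x3 is pure noise
n <- 2000
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 2 * dat$x2^2 + rnorm(n, sd = 0.5)

gammas <- c(0.1, 1.4, 10, 20)

# Refit at each gamma and collect one summary row per fit
runs <- do.call(rbind, lapply(gammas, function(g) {
  m <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat, gamma = g)
  data.frame(gamma        = g,
             dev_expl_pct = round(100 * summary(m)$dev.expl, 1),
             AIC          = round(AIC(m), 1),
             edf          = round(sum(m$edf), 2))
}))

print(runs)
```

The edf column is an addition to what the slides report. It shows how many effective degrees of freedom each fit used, which is the quantity gamma acts on.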

Best Model? Best 3 predictors, gamma=20 Data from FIA and BioClim

Blue Crab Distribution Model

Blue Crab vs. Salinity Jensen et al. 2005, Winter distribution of blue crab Callinectes sapidus in Chesapeake Bay: application and cross-validation of a two-stage generalized additive model

Response Curves (partial) GAMs BRTs

GAMs vs. BRTs
Martinez-Rincon 2012, Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery
–“Results indicate little difference between the performance of GAM and BRT models”

Gamma in GAMs

Anderson
We are not trying to model the data; instead, we are trying to model the information in the data. The goal is to recover the information that applies more generally to the process, not just to the particular data set. If we were merely trying to model the data well, we could fit high order Fourier series terms or polynomial terms until the fit is perfect. Data contain both information and noise; fitting the data perfectly would include modeling the noise and this is counter to our science objective.
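Anderson's point about polynomial terms can be checked with a small experiment. The sketch below uses base R only and simulated data, with an arbitrary sine curve as the underlying process. It fits polynomials of increasing degree and compares the error on the fitted data with the error on a fresh sample from the same process.

```r
set.seed(4)

# Simulated process: a sine curve (the information) plus noise
make_sample <- function(n) {
  x <- sort(runif(n))
  data.frame(x = x, y = sin(2 * pi * x) + rnorm(n, sd = 0.3))
}
train   <- make_sample(40)    # the particular data set
holdout <- make_sample(200)   # new data from the same process

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

degrees <- c(1, 3, 6, 12)

# One row per polynomial degree: error on the fitted data vs. on new data
errors <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree  = d,
    train   = rmse(train$y, fitted(fit)),
    holdout = rmse(holdout$y, predict(fit, newdata = holdout)))
}))

print(round(errors, 3))
```

The training error can only fall as the degree rises, because each model contains the previous one. The holdout error typically stops improving and then worsens once the extra terms start fitting noise.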

Additional Resources
Generalized Additive Models: an introduction with R
–Copyrighted book
–Includes:
  Linear models
  GLMs
  GAMs
  Examples in R
  Some matrix algebra

Additional Resources
Geospatial Analysis with GAMs:
– 1/handouts/C3-Guszcza.pdf
Disease mapping using GAMs (workshop):
– -mapWorkshop
Mapping population based studies:
– healthgeographics.com/content/5/1/26