Lectures 15, 16 – Additive Models, Trees, and Related Methods
Rice ECE697, Farinaz Koushanfar, Fall 2006

Summary
- Generalized Additive Models
- Tree-Based Methods
- PRIM – Bump Hunting
- Multivariate Adaptive Regression Splines (MARS)
- Missing Data

Additive Models
In real life, effects are often nonlinear.
Note: some slides are borrowed from Tibshirani.

Examples

The Price for Additivity
Data from a study of diabetic children; the task is predicting log C-peptide (a blood measurement).

Generalized Additive Models (GAM)
Two-Class Logistic Regression
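For reference, Chapter 9 of Hastie, Tibshirani, and Friedman writes the generalized additive model, and its two-class logistic variant, as

$$E[Y \mid X_1, \dots, X_p] = \alpha + f_1(X_1) + \cdots + f_p(X_p),$$

$$\log\frac{\mu(X)}{1 - \mu(X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p), \qquad \mu(X) = \Pr(Y = 1 \mid X),$$

where the $f_j$ are unspecified smooth functions estimated nonparametrically.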

Other Examples

Fitting Additive Models
Given observations $(x_i, y_i)$, a criterion such as the penalized sum of squares can be specified for this problem, where the $\lambda_j$'s are tuning parameters. The error term is assumed to have mean zero.
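The criterion in question is the penalized residual sum of squares (eq. 9.7 in Hastie, Tibshirani, and Friedman):

$$\mathrm{PRSS}(\alpha, f_1, \dots, f_p) = \sum_{i=1}^{N}\Bigl(y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij})\Bigr)^{2} + \sum_{j=1}^{p} \lambda_j \int \bigl(f_j''(t_j)\bigr)^{2}\, dt_j,$$

where larger $\lambda_j \ge 0$ force the corresponding $f_j$ to be smoother; the minimizer is an additive cubic spline model.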

Fitting Additive Models

The Backfitting Algorithm for Additive Models
Initialize: $\hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i$, and $\hat f_j \equiv 0$ for all $j$.
Cycle: for $j = 1, 2, \dots, p, 1, 2, \dots, p, 1, \dots$, set

$$\hat f_j \leftarrow S_j\Bigl[\bigl\{y_i - \hat\alpha - \sum_{k \neq j} \hat f_k(x_{ik})\bigr\}_{i=1}^{N}\Bigr], \qquad \hat f_j \leftarrow \hat f_j - \frac{1}{N}\sum_{i=1}^{N} \hat f_j(x_{ij}).$$

Until: the functions $\hat f_j$ change less than a prespecified threshold.
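Below is a minimal backfitting sketch in Python. It is illustrative only: the lecture uses cubic smoothing splines for the smoothers $S_j$, while this sketch substitutes a crude running-mean smoother; the function names, window size, and tolerance are assumptions, not from the slides.

```python
import numpy as np

def running_mean_smoother(x, r, window=11):
    """Crude 1-D smoother: average the working residuals r over a
    window of neighbors in x. A stand-in for the smoothing splines S_j."""
    order = np.argsort(x)
    r_sorted = r[order]
    half = window // 2
    n = len(x)
    fitted = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        fitted[i] = r_sorted[lo:hi].mean()
    out = np.empty(n)
    out[order] = fitted                 # undo the sort
    return out

def backfit(X, y, n_iter=50, tol=1e-6):
    """Backfitting for the additive model y ~ alpha + sum_j f_j(x_j)."""
    n, p = X.shape
    alpha = y.mean()                    # initialize: alpha = mean of y
    f = np.zeros((n, p))                # initialize: f_j = 0 for all j
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(p):              # cycle over j = 1, ..., p
            # partial residual: y with alpha and all other f_k removed
            r = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smoother(X[:, j], r)
            f[:, j] -= f[:, j].mean()   # center f_j to have mean zero
        if np.abs(f - f_old).max() < tol:
            break                       # functions changed less than tol
    return alpha, f
```

The centering step keeps the decomposition identifiable; without it, constants could shift freely between $\hat\alpha$ and the $\hat f_j$.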

Fitting Additive Models (Cont’d)

Example: Penalized Least Squares

Example: Fitting GAM for Logistic Regression (Newton-Raphson Algorithm)
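The Newton-Raphson iteration referred to here is the local scoring procedure (Algorithm 9.2 in Hastie, Tibshirani, and Friedman). Given current estimates $\hat\eta_i = \hat\alpha + \sum_j \hat f_j(x_{ij})$ and $\hat p_i = 1/(1 + e^{-\hat\eta_i})$, one forms the working response and weights

$$z_i = \hat\eta_i + \frac{y_i - \hat p_i}{\hat p_i(1 - \hat p_i)}, \qquad w_i = \hat p_i(1 - \hat p_i),$$

fits a weighted additive model of $z_i$ on $x_i$ by weighted backfitting, and iterates until convergence.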

Example: Predicting Spam
Data from 4601 email messages (spam = 1, email = 0); a filter can be trained for each user separately.
Goal: predict whether an email is spam (junk mail) or good email.
Input features: the relative frequencies, in each message, of 57 of the most commonly occurring words and punctuation marks across the training set.
Not all errors are equal: we want to avoid filtering out good email, while letting spam get through is undesirable but less serious in its consequences.

Predictors

Details

Some Important Features

Results
Test-data confusion matrix for the additive logistic regression model fit to the spam training data. The overall test error rate is 5.3%.

Summary of Additive Logistic Fit
Significant predictors from the additive model fit to the spam training data. The coefficients represent the linear part of $\hat f_j$, along with their standard errors and Z-scores. The nonlinear p-value is from a test of nonlinearity of $\hat f_j$.

Example: Plots for Spam Analysis
Figure 9.1. Spam analysis: estimated functions for significant predictors. The rug plot along the bottom of each frame indicates the observed values of the corresponding predictor. For many predictors, the nonlinearity picks up the discontinuity at zero.

In Summary
- Additive models are a useful extension of linear models, making them more flexible.
- The backfitting procedure is simple and modular.
- Limitations for large data-mining applications: backfitting fits all predictors, which is not desirable when a large number are available.

Tree-Based Methods
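As context for the slides that follow: a regression tree partitions the input space into rectangular regions $R_1, \dots, R_M$ and predicts a constant in each region; under squared-error loss the best constant is the within-region mean:

$$f(x) = \sum_{m=1}^{M} c_m\, I(x \in R_m), \qquad \hat c_m = \mathrm{ave}(y_i \mid x_i \in R_m).$$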

Node Impurity Measures
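For a classification tree node $m$ containing $N_m$ observations with class proportions $\hat p_{mk}$, the three impurity measures from Chapter 9 are

$$\begin{aligned}
\text{Misclassification error:}&\quad 1 - \hat p_{m,k(m)},\\
\text{Gini index:}&\quad \sum_{k=1}^{K} \hat p_{mk}(1 - \hat p_{mk}),\\
\text{Cross-entropy (deviance):}&\quad -\sum_{k=1}^{K} \hat p_{mk}\log \hat p_{mk},
\end{aligned}$$

where $k(m) = \arg\max_k \hat p_{mk}$ is the majority class in node $m$. The Gini index and cross-entropy are differentiable and more sensitive to changes in the node probabilities, so they are typically used for growing the tree.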

Results for Spam Example

Pruned tree for the Spam Example
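The pruning shown here is cost-complexity pruning: for a subtree $T$ with $|T|$ terminal nodes, node sizes $N_m$, and node impurities $Q_m(T)$, one minimizes

$$C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|,$$

where $\alpha \ge 0$ trades off tree size against goodness of fit and is typically chosen by cross-validation.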

Classification Rules Fit to the Spam Data

PRIM (Patient Rule Induction Method) – Bump Hunting
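A minimal sketch of PRIM's top-down peeling phase in Python, under stated assumptions: inputs are numeric, the objective is to maximize the box mean of $y$, and the peeling fraction `alpha` and minimum box support `min_support` are the usual PRIM tuning parameters. The function name and defaults are illustrative, and the real procedure also includes a bottom-up pasting phase, omitted here.

```python
import numpy as np

def prim_peel(X, y, alpha=0.10, min_support=10):
    """Top-down peeling: repeatedly remove the ~alpha fraction of the
    remaining points, sliced off the low or high end of one variable,
    whose removal leaves the highest mean of y inside the box."""
    inside = np.ones(len(y), dtype=bool)
    while inside.sum() > min_support:
        best_mean, best_mask = -np.inf, None
        for j in range(X.shape[1]):
            lo, hi = np.quantile(X[inside, j], [alpha, 1.0 - alpha])
            for keep in (X[:, j] >= lo, X[:, j] <= hi):
                candidate = inside & keep
                # candidate must shrink the box but keep enough support
                if min_support <= candidate.sum() < inside.sum():
                    m = y[candidate].mean()
                    if m > best_mean:
                        best_mean, best_mask = m, candidate
        if best_mask is None:          # no admissible peel remains
            break
        inside = best_mask
    return inside                      # boolean mask of the final box
```

Peeling only a small fraction per step is what makes the method "patient": it avoids the greedy, data-hungry splits of a tree and tends to find better boxes.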

Number of Observations in a Box

Basis Functions
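The basis functions used by MARS are reflected pairs of piecewise-linear hinges, with knots $t$ placed at the observed values $x_{ij}$:

$$(x - t)_+ = \begin{cases} x - t, & x > t,\\ 0, & \text{otherwise,}\end{cases} \qquad (t - x)_+ = \begin{cases} t - x, & x < t,\\ 0, & \text{otherwise.}\end{cases}$$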

MARS Forward Modeling Procedure
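The model built by the forward procedure has the form $f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X)$, where each $h_m$ is a hinge function or a product of hinge functions. A deliberately naive Python sketch of the forward pass follows. It is illustrative rather than the real algorithm: real MARS restricts each variable to appear at most once per product term and uses fast least-squares updating instead of refitting from scratch, and all names here are assumptions.

```python
import numpy as np

def hinge(x, t, sign):
    """Reflected hinge: (x - t)_+ for sign=+1, (t - x)_+ for sign=-1."""
    return np.maximum(sign * (x - t), 0.0)

def mars_forward(X, y, max_terms=9):
    """Naive MARS forward pass: at each step, add the reflected pair
    (existing basis) * (x_j - t)_+ and (existing basis) * (t - x_j)_+
    that most reduces the residual sum of squares."""
    n, p = X.shape
    B = [np.ones(n)]                           # constant basis h_0(X) = 1
    while len(B) < max_terms:
        best_rss, best_pair = np.inf, None
        for b in B:                            # candidate parent basis
            for j in range(p):                 # candidate variable
                for t in np.unique(X[:, j]):   # knots at observed values
                    pair = [b * hinge(X[:, j], t, +1),
                            b * hinge(X[:, j], t, -1)]
                    M = np.column_stack(B + pair)
                    beta = np.linalg.lstsq(M, y, rcond=None)[0]
                    rss = float(np.sum((y - M @ beta) ** 2))
                    if rss < best_rss:
                        best_rss, best_pair = rss, pair
        B += best_pair
    M = np.column_stack(B)
    beta = np.linalg.lstsq(M, y, rcond=None)[0]
    return B, beta                             # basis columns, coefficients
```

The forward pass deliberately overfits; a backward deletion pass, guided by generalized cross-validation, then prunes terms from the model.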

Multiplication of Basis Functions

MARS on Spam Example