Additive Models, Trees, and Related Models
Prof. Liqing Zhang
Dept. Computer Science & Engineering, Shanghai Jiao Tong University

Introduction
9.1: Generalized Additive Models
9.2: Tree-Based Methods
9.3: PRIM: Bump Hunting
9.4: MARS: Multivariate Adaptive Regression Splines
9.5: HME: Hierarchical Mixture of Experts

9.1 Generalized Additive Models
In the regression setting, a generalized additive model has the form:
E(Y | X_1, X_2, ..., X_p) = α + f_1(X_1) + f_2(X_2) + ... + f_p(X_p)
Here the f_j's are unspecified smooth ("nonparametric") functions. Instead of using a linear basis expansion (LBE) as in Chapter 5, we fit each function using a scatterplot smoother (e.g. a cubic smoothing spline).
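A minimal sketch of one such scatterplot smoother, using scipy's UnivariateSpline as a cubic smoothing spline; the data and smoothing parameter below are made up purely for illustration:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic one-dimensional example: y = sin(x) + noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Cubic smoothing spline (k=3); s controls the smoothness/fit trade-off,
# playing the role of the tuning parameter lambda in the penalized criterion.
smoother = UnivariateSpline(x, y, k=3, s=len(x) * 0.3**2)
f_hat = smoother(x)  # the fitted smooth function evaluated at the inputs
```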

GAM (cont.)
For two-class classification, the additive logistic regression model is:
log[ μ(X) / (1 − μ(X)) ] = α + f_1(X_1) + ... + f_p(X_p)
Here μ(X) = Pr(Y = 1 | X).

GAM (cont.)
In general, the conditional mean μ(X) of a response Y is related to an additive function of the predictors via a link function g:
g[μ(X)] = α + f_1(X_1) + ... + f_p(X_p)
Examples of classical link functions:
Identity: g(μ) = μ
Logit: g(μ) = log[μ/(1 − μ)]
Probit: g(μ) = Φ^{-1}(μ), where Φ is the standard normal CDF
Log: g(μ) = log(μ)
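The same link functions written out as code, as a small sketch (the probit link uses the standard normal quantile function from scipy):

```python
import numpy as np
from scipy.stats import norm

def identity_link(mu):   # g(mu) = mu, e.g. Gaussian responses
    return mu

def logit_link(mu):      # g(mu) = log[mu / (1 - mu)], binary responses
    return np.log(mu / (1.0 - mu))

def probit_link(mu):     # g(mu) = Phi^{-1}(mu), Phi the standard normal CDF
    return norm.ppf(mu)

def log_link(mu):        # g(mu) = log(mu), e.g. Poisson counts
    return np.log(mu)
```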

Fitting Additive Models
The additive model has the form:
Y = α + Σ_{j=1}^p f_j(X_j) + ε
Here the error term ε has mean zero. Given observations (x_i, y_i), a criterion like the penalized residual sum of squares can be specified for this problem:
PRSS(α, f_1, ..., f_p) = Σ_{i=1}^N ( y_i − α − Σ_{j=1}^p f_j(x_ij) )² + Σ_{j=1}^p λ_j ∫ [f_j''(t_j)]² dt_j   (9.7)
where the λ_j ≥ 0 are tuning parameters.

FAM (cont.)
Conclusions:
The minimizer of PRSS is an additive model whose components are cubic splines; however, without further restrictions the solution is not unique.
If the restriction Σ_{i=1}^N f_j(x_ij) = 0 for all j holds (each function averages to zero over the data), it is easy to see that α̂ = ave(y_i).
If, in addition to this restriction, the matrix of input values has full column rank, then (9.7) is a strictly convex criterion and has a unique solution. If the matrix is singular, then the linear part of the f_j cannot be uniquely determined (Buja et al., 1989).

Learning GAM: Backfitting
The backfitting algorithm:
1. Initialize: α = (1/N) Σ_i y_i; f_j ≡ 0 for all j.
2. Cycle: for j = 1, 2, ..., p, 1, 2, ..., p, ... (repeated cycles),
   f_j ← S_j[ {y_i − α − Σ_{k≠j} f_k(x_ik)}_{i=1}^N ]   (smooth the partial residuals against x_ij)
   f_j ← f_j − (1/N) Σ_i f_j(x_ij)                      (re-center so each f_j averages to zero)
3. Continue step 2 until the functions change less than a prespecified threshold.
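A minimal sketch of the backfitting loop, using the cubic smoothing spline from scipy as the generic smoother S_j; the function and variable names are illustrative, not from the slides:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def backfit(X, y, s=None, max_cycles=20, tol=1e-4):
    """Backfitting for an additive model y ~ alpha + sum_j f_j(X[:, j])."""
    n, p = X.shape
    alpha = y.mean()                        # step 1: initialize the intercept
    f = np.zeros((n, p))                    # fitted f_j evaluated at the data
    for _ in range(max_cycles):             # step 2: cycle over the predictors
        f_old = f.copy()
        for j in range(p):
            # partial residuals: remove the intercept and all other functions
            r = y - alpha - f.sum(axis=1) + f[:, j]
            order = np.argsort(X[:, j])
            spline = UnivariateSpline(X[order, j], r[order], k=3, s=s)
            f[:, j] = spline(X[:, j])
            f[:, j] -= f[:, j].mean()       # re-center so f_j averages to zero
        if np.abs(f - f_old).max() < tol:   # step 3: stop once functions stabilize
            break
    return alpha, f
```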

Backfitting: Points to Ponder
Computational advantage?
Convergence?
How to choose the fitting functions?

FAM (cont.)

Additive Logistic Regression

Logistic Regression
Model the class posteriors in terms of K − 1 log-odds against a reference class K:
log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = β_k0 + β_k^T x,   k = 1, ..., K − 1
The decision boundary between two classes is the set of points where their log-odds are equal, i.e. a hyperplane.
Linear discriminant function for class k: δ_k(x) = β_k0 + β_k^T x
Classify to the class with the largest value of its δ_k(x).
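A small sketch of this decision rule, assuming fitted coefficients `beta` of shape (K−1, p+1) with class K as the reference; the function names are illustrative:

```python
import numpy as np

def posterior_probs(x, beta):
    """Class posteriors recovered from the K-1 fitted log-odds against class K."""
    eta = beta[:, 0] + beta[:, 1:] @ x         # beta_k0 + beta_k^T x, k = 1..K-1
    expeta = np.exp(eta)
    denom = 1.0 + expeta.sum()
    return np.append(expeta, 1.0) / denom      # last entry is the reference class K

def classify(x, beta):
    # largest posterior, equivalently largest discriminant delta_k(x)
    return int(np.argmax(posterior_probs(x, beta)))
```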

Logistic Regression (cont.)
Parameter estimation
– Objective function: the conditional log-likelihood of the observed classes
– Parameter estimation: IRLS (iteratively reweighted least squares)
In particular, for the two-class case the objective function is
ℓ(β) = Σ_{i=1}^N [ y_i log p(x_i; β) + (1 − y_i) log(1 − p(x_i; β)) ],
and the Newton–Raphson algorithm is used to solve the score equations ∂ℓ/∂β = 0.
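A minimal Newton–Raphson/IRLS sketch for the two-class case in numpy; the data matrix X is assumed to already contain a leading column of ones for the intercept:

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Maximize the binomial log-likelihood by iteratively reweighted least squares."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities p(x_i; beta)
        W = p * (1.0 - p)                      # weights w_i
        z = X @ beta + (y - p) / W             # adjusted (working) response z_i
        # weighted least squares step: beta_new = (X^T W X)^{-1} X^T W z
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```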

Logistic Regression (cont.)

When it is used:
– Binary responses (two classes)
– As a data analysis and inference tool, to understand the role of the input variables in explaining the outcome
Feature selection:
– Find a subset of the variables that is sufficient for explaining their joint effect on the response.
– One way is to repeatedly drop the least significant coefficient and refit the model until no further terms can be dropped.
– Another strategy is to refit each model with one variable removed, and perform an analysis of deviance to decide which variable to exclude.
Regularization:
– Maximum penalized likelihood
– Shrinking the parameters via an L1 constraint, imposing a margin constraint in the separable case
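For the regularization point, a short scikit-learn illustration of an L1-penalized fit, which shrinks coefficients and zeroes some of them out; the dataset and penalty strength here are placeholders, not from the slides:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Placeholder data: a few informative predictors among many irrelevant ones
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# L1 penalty: maximum penalized likelihood with an L1 constraint on the coefficients
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

selected = np.flatnonzero(clf.coef_[0])   # predictors kept with nonzero coefficients
print("kept", selected.size, "of", X.shape[1], "predictors")
```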

Additive Logistic Regression

Additive Logistic Regression: Backfitting

Fitting logistic regression (Newton–Raphson as IRLS):
1. Initialize β = 0.
2. Iterate:
   a. Compute p_i = exp(x_i^T β) / (1 + exp(x_i^T β)).
   b. Compute the working response z_i = x_i^T β + (y_i − p_i)/[p_i(1 − p_i)] and weights w_i = p_i(1 − p_i).
   c. Use weighted least squares to fit a linear model to z_i with weights w_i, giving new estimates β.
3. Continue step 2 until convergence.

Fitting additive logistic regression (local scoring):
1. Initialize α = log[ȳ/(1 − ȳ)], where ȳ = ave(y_i); set f_j ≡ 0 for all j.
2. Iterate:
   a. Compute η_i = α + Σ_j f_j(x_ij) and p_i = exp(η_i)/(1 + exp(η_i)).
   b. Compute the working response z_i = η_i + (y_i − p_i)/[p_i(1 − p_i)] and weights w_i = p_i(1 − p_i).
   c. Use the weighted backfitting algorithm to fit an additive model to z_i with weights w_i, giving new estimates α and f_j.
3. Continue step 2 until the functions converge.
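A compact sketch of the additive version (local scoring). To keep it short, the inner weighted fit uses one sweep of per-coordinate weighted smoothing splines in place of a fully iterated weighted backfitting, and a fixed number of outer iterations stands in for the convergence check; all names are illustrative:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_additive_logistic(X, y, n_outer=10, s=None):
    """Local scoring: IRLS where the weighted linear fit is replaced by
    a weighted additive (per-coordinate smoothing spline) fit."""
    n, p = X.shape
    ybar = y.mean()
    alpha = np.log(ybar / (1.0 - ybar))       # 1. initialize alpha; f_j = 0
    f = np.zeros((n, p))
    for _ in range(n_outer):                  # 2. outer (scoring) iterations
        eta = alpha + f.sum(axis=1)
        prob = 1.0 / (1.0 + np.exp(-eta))     # a. current probabilities p_i
        prob = np.clip(prob, 1e-6, 1 - 1e-6)  #    clipped for numerical stability
        w = prob * (1.0 - prob)               # b. weights w_i
        z = eta + (y - prob) / w              #    working response z_i
        # c. one sweep of weighted backfitting of z on X with weights w
        alpha = np.average(z - f.sum(axis=1), weights=w)
        for j in range(p):
            r = z - alpha - f.sum(axis=1) + f[:, j]
            order = np.argsort(X[:, j])
            spl = UnivariateSpline(X[order, j], r[order], w=w[order], k=3, s=s)
            f[:, j] = spl(X[:, j])
            f[:, j] -= np.average(f[:, j], weights=w)
    return alpha, f
```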

SPAM Detection via Additive Logistic Regression
Input variables (predictors):
– 48 quantitative variables: the percentage of words in the e-mail that match a given word. Examples include business, address, internet, etc.
– 6 quantitative variables: the percentage of characters in the e-mail that match a given character, such as ';' and '('.
– The average length of uninterrupted sequences of capital letters
– The length of the longest uninterrupted sequence of capital letters
– The sum of the lengths of uninterrupted sequences of capital letters
Output variable: spam (1) or e-mail (0)
The f_j's are taken as cubic smoothing splines.

SPAM Detection: Results
Confusion matrix (rows: true class, columns: predicted class):

True class    Predicted e-mail (0)    Predicted spam (1)
e-mail (0)    58.5%                   2.5%
spam (1)      2.7%                    36.2%

Sensitivity: probability of predicting spam given the true state is spam = 36.2 / (36.2 + 2.7) ≈ 93.1%
Specificity: probability of predicting e-mail given the true state is e-mail = 58.5 / (58.5 + 2.5) ≈ 95.9%
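The two rates follow directly from the table above; a one-line check of the arithmetic, using the percentages as given on the slide:

```python
# Rows: true class; columns: predicted e-mail (0), predicted spam (1)
email_as_email, email_as_spam = 58.5, 2.5
spam_as_email, spam_as_spam = 2.7, 36.2

sensitivity = spam_as_spam / (spam_as_spam + spam_as_email)      # ~0.931
specificity = email_as_email / (email_as_email + email_as_spam)  # ~0.959
```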

GAM: Summary
A useful, flexible extension of linear models
The backfitting algorithm is simple and modular
The interpretability of the predictors (input variables) is not obscured
Not suitable for very large data-mining applications (why?)