1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 12a, April 21, 2015 Revisiting Regression – local models, and non-parametric…

Why local? 2

Sparse? 3

Remember this one? How would you apply local methods here? 4

SVM types
–one-classification: this model tries to find the support of a distribution and thus allows for outlier/novelty detection;
–epsilon-regression: here, the data points lie in between the two borders of the margin, which is maximized under suitable conditions to avoid outlier inclusion;
–nu-regression: with analogous modifications of the regression model, as in the classification case. 5
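For concreteness, these three model types map onto the type argument of svm() in the e1071 package. A minimal sketch with simulated data (not from the slides):

library(e1071)

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x[, 1] + rnorm(100, sd = 0.5)

# one-classification: estimate the support of the distribution (novelty detection)
m.occ <- svm(x, type = "one-classification", nu = 0.1)

# eps-regression: regression with an epsilon-insensitive tube (the default for numeric y)
m.eps <- svm(x, y, type = "eps-regression", epsilon = 0.1)

# nu-regression: nu controls the fraction of support vectors / margin errors
m.nu <- svm(x, y, type = "nu-regression", nu = 0.5)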

Reminder SVM and margin 6

Loss functions… (figure: loss functions for classification, outlier detection, and regression) 7

Regression By using a different loss function, the ε-insensitive loss ||y − f(x)||_ε = max{0, ||y − f(x)|| − ε}, SVMs can also perform regression. This loss function ignores errors that are smaller than a certain threshold ε > 0, thus creating a tube around the true output. 8
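The tube is easy to see in code. A quick sketch (the helper name eps_loss is made up for illustration):

# epsilon-insensitive loss: errors inside the eps-tube cost nothing
eps_loss <- function(y, fx, eps) pmax(0, abs(y - fx) - eps)

eps_loss(y = c(1.0, 1.3, 2.0), fx = c(1.1, 1.1, 1.1), eps = 0.2)
# 0.0 0.0 0.7  -- only the third point falls outside the tube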

Example lm v. svm 9

(figure: lm vs. svm example, continued) 10
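A minimal sketch of that kind of comparison (simulated data, assuming the e1071 package; not the course's actual example):

library(e1071)

set.seed(42)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(length(x), sd = 0.2)
d <- data.frame(x, y)

fit.lm <- lm(y ~ x, data = d)    # global straight-line fit
fit.svm <- svm(y ~ x, data = d)  # defaults to eps-regression with a radial kernel

plot(d$x, d$y, pch = 20, col = "grey")
lines(d$x, predict(fit.lm, d), col = "red")
lines(d$x, predict(fit.svm, d), col = "blue")  # bends with the local structure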

Again SVM in R
–The svm() function in e1071 provides a rigid interface to libsvm, along with visualization and parameter tuning methods.
–kernlab features a variety of kernel-based methods and includes an SVM method based on the optimizers used in libsvm and bsvm.
–Package klaR includes an interface to SVMlight, a popular SVM implementation that additionally offers classification tools such as Regularized Discriminant Analysis.
–svmpath – you get the idea… 11

Knn is local – right? k-nearest neighbors is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., distance functions). KNN has been used as a non-parametric technique in statistical estimation and pattern recognition since the early 1970s. 12

Distance… A simple implementation of KNN regression is to calculate the average of the numerical target of the K nearest neighbors. Another approach uses an inverse distance weighted average of the K nearest neighbors. Choosing K! KNN regression uses the same distance functions as KNN classification. knn.reg, and also the kknn package: http://cran.r-project.org/web/packages/kknn/kknn.pdf 13
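A sketch of both flavors with the kknn package linked above (simulated data; kernel = "rectangular" gives the plain average, kernel = "inv" an inverse-distance weighting):

library(kknn)

set.seed(7)
train <- data.frame(x = runif(100, 0, 10))
train$y <- sin(train$x) + rnorm(100, sd = 0.2)
test <- data.frame(x = seq(0, 10, by = 0.1))

# plain average of the K nearest neighbors
fit.avg <- kknn(y ~ x, train, test, k = 7, kernel = "rectangular")

# inverse-distance weighted average
fit.inv <- kknn(y ~ x, train, test, k = 7, kernel = "inv")

head(fitted(fit.avg))
head(fitted(fit.inv))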

Classes of local regression
Locally (weighted) scatterplot smoothing
–LOESS
–LOWESS
Fitting is done locally – the fit at a point x is made using points in a neighborhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance) 14

(figure slide) 15

Classes of local regression The size of the neighborhood is controlled by α (set by span). For α < 1, the neighborhood includes a proportion α of the points, and these have tricubic weighting; for α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables. 16

Classes of local regression For the default family, fitting is by (weighted) least squares. For family="symmetric" a few iterations of an M-estimation procedure with Tukey's biweight are used. Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit. It can be important to tune the control list to achieve acceptable speed. 17
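A sketch of loess() with the two knobs just described, span (α) and family (simulated data):

set.seed(3)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit1 <- loess(y ~ x, span = 0.75)                        # default span, weighted least squares
fit2 <- loess(y ~ x, span = 0.3)                         # smaller neighborhood, wigglier fit
fit3 <- loess(y ~ x, span = 0.75, family = "symmetric")  # M-estimation, more outlier-resistant

xg <- seq(0, 10, by = 0.1)
plot(x, y, pch = 20, col = "grey")
lines(xg, predict(fit1, data.frame(x = xg)), col = "red")
lines(xg, predict(fit2, data.frame(x = xg)), col = "blue")
lines(xg, predict(fit3, data.frame(x = xg)), col = "darkgreen")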

Friedman (supsmu in modreg) is a running lines smoother which chooses between three spans for the lines. The running lines smoothers are symmetric, with k/2 data points each side of the predicted point, and values of k as 0.5 * n, 0.2 * n and 0.05 * n, where n is the number of data points. If span is specified, a single smoother with span span * n is used. 18

Friedman The best of the three smoothers is chosen by cross-validation for each prediction. The best spans are then smoothed by a running lines smoother and the final prediction chosen by linear interpolation. “For small samples (n < 40) or if there are substantial serial correlations between observations close in x-value, then a prespecified fixed span smoother (span > 0) should be used. Reasonable span values are 0.2 to 0.4.” 19
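A sketch of supsmu() (in current R it ships in the stats package; simulated data):

set.seed(5)
x <- sort(runif(150, 0, 10))
y <- sin(x) + rnorm(150, sd = 0.3)

fit.cv <- supsmu(x, y)                 # span chosen by cross-validation (the default)
fit.fixed <- supsmu(x, y, span = 0.3)  # prespecified fixed span, as advised for small n

plot(x, y, pch = 20, col = "grey")
lines(fit.cv, col = "red")
lines(fit.fixed, col = "blue")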

Local non-param lplm (in Rearrangement) – a local nonparametric method: a local linear regression estimator with a box kernel (the default), for conditional mean functions 20

Ridge regression
–Addresses ill-posed regression problems using filtering approaches (shrinking unstable, low-variance directions in the predictors)
–Often called “regularization”
–lm.ridge (in MASS) 21
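A sketch with lm.ridge, scanning a grid of ridge constants on R's built-in longley data (a classic collinear example; not from the slides):

library(MASS)

# longley: small, highly collinear macroeconomic data shipped with R
fit <- lm.ridge(Employed ~ ., data = longley, lambda = seq(0, 0.1, by = 0.001))

plot(fit)    # coefficient traces as the ridge constant grows
select(fit)  # HKB, L-W and GCV suggestions for lambda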

Quantile regression
–is desired if conditional quantile functions are of interest. One advantage of quantile regression, relative to ordinary least squares regression, is that the quantile regression estimates are more robust against outliers in the response measurements.
–In practice we often prefer using different measures of central tendency and statistical dispersion to obtain a more comprehensive analysis of the relationship between variables. 22
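The slide names no package; the standard R implementation is rq() in quantreg. A sketch on the engel data shipped with that package:

library(quantreg)

data(engel)  # Engel food-expenditure data shipped with quantreg

fit.med <- rq(foodexp ~ income, tau = 0.5, data = engel)          # median regression
fit.q <- rq(foodexp ~ income, tau = c(0.25, 0.75), data = engel)  # quartile regressions

coef(fit.med)
coef(fit.q)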

More…
–Partial Least Squares Regression (PLSR) – mvr (in pls)
–Principal Component Regression (PCR)
–Canonical Powered Partial Least Squares (CPPLS) 23

PCR creates components to explain the observed variability in the predictor variables, without considering the response variable at all. On the other hand, PLSR does take the response variable into account, and therefore often leads to models that are able to fit the response variable with fewer components. Whether or not that ultimately translates into a better model, in terms of its practical use, depends on the context. 24
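A sketch of both, via the pcr() and plsr() wrappers around mvr in the pls package (using the yarn data shipped with pls):

library(pls)

data(yarn)  # NIR spectra (matrix predictor) and density, shipped with pls

fit.pcr <- pcr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
fit.pls <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")

# compare cross-validated error per number of components;
# PLSR usually reaches a given accuracy with fewer components
summary(fit.pcr)
summary(fit.pls)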

Splines smooth.spline, splinefun (in stats/modreg) and ns (in splines) – a spline is a numeric function that is piecewise-defined by polynomial functions, and which possesses a sufficiently high degree of smoothness at the places where the polynomial pieces connect (known as knots) 25

Splines For interpolation, splines are often preferred to polynomial interpolation – they yield similar results to interpolating with higher-degree polynomials while avoiding instability due to overfitting. Features: simplicity of construction, ease and accuracy of evaluation, and capacity to approximate complex shapes. Most common: the cubic spline, i.e., of order 3 – in particular, the cubic B-spline. 26

cars 27
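Presumably the slide shows smoothers fitted to R's built-in cars data; a minimal sketch tying the last few slides together:

plot(cars, pch = 20)  # speed vs. stopping distance

lines(lowess(cars), col = "red")                           # LOWESS
lines(smooth.spline(cars$speed, cars$dist), col = "blue")  # smoothing spline

# natural cubic spline basis inside lm(), via splines::ns
library(splines)
fit.ns <- lm(dist ~ ns(speed, df = 4), data = cars)
xg <- seq(min(cars$speed), max(cars$speed), length.out = 100)
lines(xg, predict(fit.ns, data.frame(speed = xg)), col = "darkgreen")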

Smoothing/local… https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/00Index.html http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf 28