6. Kernel Regression.


Framework

- Model: Phenotype = Genetic Value + Model Residual
- Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO, …
- Semi-parametric models: Reproducing Kernel Hilbert Spaces (RKHS) regression; Neural Networks, …

RKHS Regressions (Background)

Uses:
- Scatter-plot smoothing (smoothing splines) [1]
- Spatial smoothing ('kriging') [2]
- Classification problems (support vector machines) [3]
- Animal model, …

Regression setting: y_i = g(x_i) + e_i, where the input x_i can be of any nature and g is an unknown function.

[1] Wahba, G. (1990). Spline Models for Observational Data.
[2] Cressie, N. (1993). Statistics for Spatial Data.
[3] Vapnik, V. (1998). Statistical Learning Theory.

RKHS Regressions (Background)

Non-parametric representation of functions. A reproducing kernel K(x_i, x_j):
- Must be positive (semi-)definite: Σ_i Σ_j α_i α_j K(x_i, x_j) ≥ 0 for any coefficients {α_i};
- Defines a correlation function;
- Defines an RKHS of real-valued functions [1].

[1] Aronszajn, N. (1950). Theory of Reproducing Kernels.
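The positive semi-definiteness requirement can be checked numerically. Below is a minimal sketch (Python/NumPy; the Gaussian kernel and the bandwidth h = 5.0 are illustrative choices, not part of the slides): it builds a kernel matrix over random inputs and verifies that its eigenvalues are non-negative up to round-off.

```python
import numpy as np

def gaussian_kernel(X, h):
    """Gaussian (RBF) kernel: K[i, j] = exp(-||x_i - x_j||^2 / h)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    return np.exp(-np.maximum(d2, 0.0) / h)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))        # 20 individuals, 5 covariates
K = gaussian_kernel(X, h=5.0)

# Positive semi-definiteness: all eigenvalues >= 0 (up to round-off)
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-8
```

An indefinite candidate kernel would fail this eigenvalue check, which is the practical meaning of the condition above.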

Functions as Gaussian processes: the unknown function is treated as a Gaussian process, g ~ N(0, σ_g² K), with K evaluated at the observed inputs. Setting K = A (the additive relationship matrix) yields the Animal Model [1].

[1] de los Campos, Gianola and Rosa (2008). Journal of Animal Science.
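Under this Gaussian-process view, the point estimate of g at the observed inputs takes the familiar BLUP/ridge form ĝ = K(K + λI)⁻¹y with λ = σ_e²/σ_g². A minimal sketch (Python/NumPy; a Gaussian kernel on a scalar input and a fixed, assumed variance ratio λ stand in for quantities a full analysis would estimate):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)    # noisy signal

# Gaussian kernel on the scalar covariate (bandwidth h assumed)
h = 2.0
K = np.exp(-(x[:, None] - x[None, :])**2 / h)

# Posterior mean of g under g ~ N(0, sigma_g^2 K), e ~ N(0, sigma_e^2 I):
# g_hat = K (K + lambda*I)^{-1} y, with lambda = sigma_e^2 / sigma_g^2 (assumed)
lam = 0.1
g_hat = K @ np.linalg.solve(K + lam * np.eye(n), y)
```

With K replaced by a pedigree matrix A, the same algebra gives animal-model predictions.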

RKHS Regression in BGLR [1]

ETA <- list(list(K=K, model='RKHS'))
fm <- BGLR(y=y, ETA=ETA, nIter=...)

[1] The algorithm is described in de los Campos et al. (2010), Genetics Research.

Choosing the RK based on predictive ability. Strategies:
- Grid of values of θ + cross-validation
- Fully Bayesian: assign a prior to θ (computationally demanding)
- Kernel Averaging [1]

[1] de los Campos et al. (2010). WCGALP & Genetics Research (in press).
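The grid-plus-cross-validation strategy can be sketched as follows (Python/NumPy; the Gaussian kernel on a scalar input, the fixed regularization λ, and the bandwidth grid are all illustrative assumptions — BGLR itself estimates the variance components instead):

```python
import numpy as np

def cv_mse(x, y, h, lam=0.1, folds=5):
    """K-fold cross-validation error of an RKHS fit for one bandwidth h."""
    n = len(y)
    fold_id = np.arange(n) % folds
    err = 0.0
    for f in range(folds):
        tr, te = fold_id != f, fold_id == f
        K_tr = np.exp(-(x[tr][:, None] - x[tr][None, :])**2 / h)
        K_te = np.exp(-(x[te][:, None] - x[tr][None, :])**2 / h)
        alpha = np.linalg.solve(K_tr + lam * np.eye(tr.sum()), y[tr])
        err += np.sum((y[te] - K_te @ alpha)**2)
    return err / n

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=80)
y = np.sin(x) + rng.normal(scale=0.3, size=80)

grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]   # candidate bandwidths (theta)
best_h = min(grid, key=lambda h: cv_mse(x, y, h))
```

The fully Bayesian alternative replaces the grid search with a prior on θ, at the cost of sampling over kernel hyperparameters.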

Histograms of the off-diagonal entries of each of the three kernels used (K1, K2, K3) in the RKHS models for the wheat dataset.

How to Choose the Reproducing Kernel? [1]
- Pedigree models: K = A
- Genomic models: marker-based kinship, or a model-derived kernel
- Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods

[1] Shawe-Taylor and Cristianini (2004).
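A marker-based kinship can be computed directly from a genotype matrix. The sketch below (Python/NumPy, simulated 0/1/2 genotypes) uses the common VanRaden-style genomic relationship matrix as one example of such a kernel:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 200
# Simulated marker genotypes coded 0/1/2, with random allele frequencies
freqs = rng.uniform(0.1, 0.9, size=p)
M = rng.binomial(2, freqs, size=(n, p)).astype(float)

# VanRaden-style genomic relationship matrix:
# center genotypes at their expected value, then scale by sum of 2*p*(1-p)
Z = M - 2.0 * freqs
G = Z @ Z.T / (2.0 * np.sum(freqs * (1.0 - freqs)))
```

G (or any other positive semi-definite kernel built from markers) can then be passed to BGLR as the K matrix of an RKHS term.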

Example 2


Example 3: Kernel Averaging

Kernel Averaging. Strategies:
- Grid of values of θ + cross-validation
- Fully Bayesian: assign a prior to θ (computationally demanding)
- Kernel Averaging [1]

[1] de los Campos et al. (2010). Genetics Research.

Kernel Averaging
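The kernel-averaging model y = Σ_k g_k + e, with g_k ~ N(0, σ_k² K_k), implicitly averages the candidate kernels through their estimated variances. The sketch below (Python/NumPy) fixes the variance components at assumed values rather than estimating them as BGLR would, just to show how the posterior means of the g_k combine:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)

# Candidate Gaussian kernels over a grid of bandwidths (illustrative)
D2 = (x[:, None] - x[None, :])**2
kernels = [np.exp(-D2 / h) for h in (0.5, 2.0, 8.0)]

# Kernel averaging: y = sum_k g_k + e, g_k ~ N(0, s2[k] * K_k), so the
# implied covariance of y is a weighted sum of the kernels plus noise.
s2 = [0.5, 0.5, 0.5]      # assumed kernel variances (BGLR estimates these)
s2_e = 0.1                # assumed residual variance
V = sum(s * K for s, K in zip(s2, kernels)) + s2_e * np.eye(n)

# Posterior mean of each g_k, and of the total genetic value g = sum_k g_k
Vinv_y = np.linalg.solve(V, y)
g_hats = [s * K @ Vinv_y for s, K in zip(s2, kernels)]
g_hat = sum(g_hats)
```

In the fully Bayesian fit, kernels whose estimated variance shrinks toward zero drop out of the average, so the data select the effective bandwidth.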

Example 4 (100th basis function)

Example 4 (100th basis function, h = …)

Example 4 (KA: trace plot residual variance)

Example 4 (KA: trace plot kernel-variances)


Example 4 (KA: prediction accuracy)