6. Kernel Regression.


Framework

- Model: Phenotype = Genetic Value + Model Residual
- Linear models: Ridge Regression / LASSO; Bayes A, Bayes B, Bayesian LASSO, …
- Semi-parametric models: Reproducing Kernel Hilbert Spaces (RKHS) regression; Neural Networks, …

RKHS Regressions (Background)

Uses:
- Scatter-plot smoothing (smoothing splines) [1]
- Spatial smoothing ('kriging') [2]
- Classification problems (support vector machines) [3]
- Animal model, …

Regression setting: y_i = g(x_i) + e_i, where the input x_i can be of any nature and g is an unknown function.

[1] Wahba, G. (1990). Spline Models for Observational Data.
[2] Cressie, N. (1993). Statistics for Spatial Data.
[3] Vapnik, V. (1998). Statistical Learning Theory.

RKHS Regressions (Background)

Non-parametric representation of functions. A reproducing kernel K(x_i, x_j):
- Must be positive (semi-)definite: Σ_i Σ_j α_i α_j K(x_i, x_j) ≥ 0 for any coefficients {α_i};
- Defines a correlation function;
- Defines an RKHS of real-valued functions [1].

[1] Aronszajn, N. (1950). Theory of Reproducing Kernels.
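The positive semi-definiteness requirement can be checked numerically. Below is a minimal sketch (Python/NumPy; the Gaussian kernel and the bandwidth h = 5.0 are illustrative choices, not part of the slides): it builds a kernel matrix over random inputs and verifies that its eigenvalues are non-negative up to round-off.

```python
import numpy as np

def gaussian_kernel(X, h):
    """Gaussian (RBF) kernel: K[i, j] = exp(-||x_i - x_j||^2 / h)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    return np.exp(-np.maximum(d2, 0.0) / h)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))        # 20 individuals, 5 covariates
K = gaussian_kernel(X, h=5.0)

# Positive semi-definiteness: all eigenvalues >= 0 (up to round-off)
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-8
```

An indefinite candidate kernel would fail this eigenvalue check, which is the practical meaning of the condition above.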

Functions as Gaussian processes: the unknown function is treated as a Gaussian process, g ~ N(0, σ_g² K), with K evaluated at the observed inputs. Setting K = A (the additive relationship matrix) yields the Animal Model [1].

[1] de los Campos, Gianola and Rosa (2008). Journal of Animal Science.
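Under this Gaussian-process view, the point estimate of g at the observed inputs takes the familiar BLUP/ridge form ĝ = K(K + λI)⁻¹y with λ = σ_e²/σ_g². A minimal sketch (Python/NumPy; a Gaussian kernel on a scalar input and a fixed, assumed variance ratio λ stand in for quantities a full analysis would estimate):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)    # noisy signal

# Gaussian kernel on the scalar covariate (bandwidth h assumed)
h = 2.0
K = np.exp(-(x[:, None] - x[None, :])**2 / h)

# Posterior mean of g under g ~ N(0, sigma_g^2 K), e ~ N(0, sigma_e^2 I):
# g_hat = K (K + lambda*I)^{-1} y, with lambda = sigma_e^2 / sigma_g^2 (assumed)
lam = 0.1
g_hat = K @ np.linalg.solve(K + lam * np.eye(n), y)
```

With K replaced by a pedigree matrix A, the same algebra gives animal-model predictions.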

RKHS Regression in BGLR [1]

ETA <- list(list(K=K, model='RKHS'))
fm <- BGLR(y=y, ETA=ETA, nIter=...)

[1] The algorithm is described in de los Campos et al. (2010), Genetics Research.

Choosing the RK based on predictive ability. Strategies:
- Grid of values of θ + cross-validation
- Fully Bayesian: assign a prior to θ (computationally demanding)
- Kernel Averaging [1]

[1] de los Campos et al. (2010). WCGALP & Genetics Research (in press).
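The grid-plus-cross-validation strategy can be sketched as follows (Python/NumPy; the Gaussian kernel on a scalar input, the fixed regularization λ, and the bandwidth grid are all illustrative assumptions — BGLR itself estimates the variance components instead):

```python
import numpy as np

def cv_mse(x, y, h, lam=0.1, folds=5):
    """K-fold cross-validation error of an RKHS fit for one bandwidth h."""
    n = len(y)
    fold_id = np.arange(n) % folds
    err = 0.0
    for f in range(folds):
        tr, te = fold_id != f, fold_id == f
        K_tr = np.exp(-(x[tr][:, None] - x[tr][None, :])**2 / h)
        K_te = np.exp(-(x[te][:, None] - x[tr][None, :])**2 / h)
        alpha = np.linalg.solve(K_tr + lam * np.eye(tr.sum()), y[tr])
        err += np.sum((y[te] - K_te @ alpha)**2)
    return err / n

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=80)
y = np.sin(x) + rng.normal(scale=0.3, size=80)

grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]   # candidate bandwidths (theta)
best_h = min(grid, key=lambda h: cv_mse(x, y, h))
```

The fully Bayesian alternative replaces the grid search with a prior on θ, at the cost of sampling over kernel hyperparameters.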

Histograms of the off-diagonal entries of each of the three kernels used (K1, K2, K3) in the RKHS models for the wheat dataset.

How to Choose the Reproducing Kernel? [1]
- Pedigree models: K = A
- Genomic models: marker-based kinship, or a model-derived kernel
- Predictive approach: explore a wide variety of kernels => cross-validation => Bayesian methods

[1] Shawe-Taylor and Cristianini (2004).
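A marker-based kinship can be computed directly from a genotype matrix. The sketch below (Python/NumPy, simulated 0/1/2 genotypes) uses the common VanRaden-style genomic relationship matrix as one example of such a kernel:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 200
# Simulated marker genotypes coded 0/1/2, with random allele frequencies
freqs = rng.uniform(0.1, 0.9, size=p)
M = rng.binomial(2, freqs, size=(n, p)).astype(float)

# VanRaden-style genomic relationship matrix:
# center genotypes at their expected value, then scale by sum of 2*p*(1-p)
Z = M - 2.0 * freqs
G = Z @ Z.T / (2.0 * np.sum(freqs * (1.0 - freqs)))
```

G (or any other positive semi-definite kernel built from markers) can then be passed to BGLR as the K matrix of an RKHS term.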

Example 2


Example 3: Kernel Averaging

Kernel Averaging. Strategies:
- Grid of values of θ + cross-validation
- Fully Bayesian: assign a prior to θ (computationally demanding)
- Kernel Averaging [1]

[1] de los Campos et al. (2010). Genetics Research.

Kernel Averaging
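The kernel-averaging model y = Σ_k g_k + e, with g_k ~ N(0, σ_k² K_k), implicitly averages the candidate kernels through their estimated variances. The sketch below (Python/NumPy) fixes the variance components at assumed values rather than estimating them as BGLR would, just to show how the posterior means of the g_k combine:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.3, size=n)

# Candidate Gaussian kernels over a grid of bandwidths (illustrative)
D2 = (x[:, None] - x[None, :])**2
kernels = [np.exp(-D2 / h) for h in (0.5, 2.0, 8.0)]

# Kernel averaging: y = sum_k g_k + e, g_k ~ N(0, s2[k] * K_k), so the
# implied covariance of y is a weighted sum of the kernels plus noise.
s2 = [0.5, 0.5, 0.5]      # assumed kernel variances (BGLR estimates these)
s2_e = 0.1                # assumed residual variance
V = sum(s * K for s, K in zip(s2, kernels)) + s2_e * np.eye(n)

# Posterior mean of each g_k, and of the total genetic value g = sum_k g_k
Vinv_y = np.linalg.solve(V, y)
g_hats = [s * K @ Vinv_y for s, K in zip(s2, kernels)]
g_hat = sum(g_hats)
```

In the fully Bayesian fit, kernels whose estimated variance shrinks toward zero drop out of the average, so the data select the effective bandwidth.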

Example 4 (100th basis function)

Example 4 (100th basis function, h = …)

Example 4 (KA: trace plot residual variance)

Example 4 (KA: trace plot kernel-variances)


Example 4 (KA: prediction accuracy)