Regression Usman Roshan CS 675 Machine Learning

Regression
Same problem as classification except that the target variable y_i is continuous.
Popular solutions:
– Linear regression (perceptron)
– Support vector regression
– Logistic regression (for regression)

Linear regression
Suppose target values are generated by a function y_i = f(x_i) + e_i. We will estimate f(x_i) by g(x_i, θ). Suppose each e_i is generated by a Gaussian distribution with mean 0 and variance σ^2 (the same variance for all e_i). This implies that the probability of y_i given the input x_i and parameters θ, denoted p(y_i | x_i, θ), is normally distributed with mean g(x_i, θ) and variance σ^2.

Linear regression
Apply maximum likelihood to estimate g(x, θ). Assume the pairs (x_i, y_i) are i.i.d. Then the probability of the data given the model (the likelihood) is P(X | θ) = p(x_1, y_1) p(x_2, y_2) … p(x_n, y_n). Each p(x_i, y_i) = p(y_i | x_i) p(x_i), and p(y_i | x_i) is normally distributed with mean g(x_i, θ) and variance σ^2.

Linear regression
Maximizing the log likelihood (as in classification) gives us least squares (linear regression). From Alpaydın 2010.
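A minimal sketch of this step, assuming the Gaussian noise model from the previous slides (this is a reconstruction; Alpaydın's notation may differ):

```latex
\log P(X \mid \theta)
  = \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) + \sum_{i=1}^{n} \log p(x_i)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - g(x_i,\theta)\bigr)^2
    + \sum_{i=1}^{n} \log p(x_i)
```

Only the middle term depends on θ, so maximizing the log likelihood is equivalent to minimizing the least-squares error E(θ) = Σ_i (y_i − g(x_i, θ))^2; with a linear g(x, θ) = w^T x + w_0 this is exactly linear regression.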

Ridge regression
Regularized linear regression is also known as ridge regression. Minimize the least-squares error plus an L2 penalty on w: Σ_i (y_i − w^T x_i)^2 + λ‖w‖^2. It has been used in statistics for a long time to address singularity problems in linear regression. The linear regression (least squares) solution is given by w = (X^T X)^{-1} X^T y. The ridge regression solution is w = (X^T X + λI)^{-1} X^T y.
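A small sketch of both closed-form solutions, assuming a data matrix X of shape (n, d) and targets y (variable names are illustrative, not from the slides):

```python
import numpy as np

def least_squares(X, y):
    # Ordinary least squares: w = (X^T X)^(-1) X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam=1.0):
    # Ridge regression: w = (X^T X + lambda*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Ridge stays well-defined even when X^T X is singular, e.g. with more
# features than samples -- the singularity problem mentioned above.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))   # 10 samples, 50 features
y = rng.standard_normal(10)
w = ridge(X, y, lam=0.1)
print(w.shape)                       # (50,)
```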

Logistic regression
Similar derivation to linear regression, except that here we predict with the sigmoid function instead of a linear function. We still minimize the sum of squares between predicted and actual values. The output y_i is constrained to the range [0, 1].
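A minimal gradient-descent sketch of this squared-error view (the slide's formulation, which differs from the usual cross-entropy fit of logistic regression; function and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sigmoid_least_squares(X, y, lr=0.5, epochs=2000):
    """Gradient descent on the mean squared error between sigmoid(w.x)
    and targets y in [0, 1], following the slide's squared-error view."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = sigmoid(X @ w)                                 # predictions in (0, 1)
        grad = (2.0 / n) * X.T @ ((p - y) * p * (1 - p))   # chain rule through sigmoid
        w -= lr * grad
    return w

# Toy usage with targets in [0, 1]
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = (X[:, 0] > 0).astype(float)
w = fit_sigmoid_least_squares(X, y)
```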

Support vector regression
Makes no assumptions about the probability distribution of the data and output (like the support vector machine). Change the loss function in the support vector machine problem to the ε-insensitive loss to obtain support vector regression.

Support vector regression
Solved by applying Lagrange multipliers as in the SVM. The solution w is given by a linear combination of support vectors (as in the SVM), and w can also be used for ranking features. The loss can be written from a risk minimization perspective or from a regularized perspective, as sketched below.
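A standard way to write the two objectives, assuming the usual ε-insensitive loss (a sketch; the slide's exact constants and notation may differ):

```latex
% epsilon-insensitive loss on one example
L_\varepsilon\bigl(y_i, w^{T}x_i + w_0\bigr)
  = \max\bigl(0,\; |y_i - w^{T}x_i - w_0| - \varepsilon\bigr)

% Risk minimization view: minimize the average loss on the data
\min_{w,\,w_0}\; \frac{1}{n}\sum_{i=1}^{n}
  \max\bigl(0,\; |y_i - w^{T}x_i - w_0| - \varepsilon\bigr)

% Regularized view: add an L2 penalty on w, with C trading off the two terms
\min_{w,\,w_0}\; \frac{1}{2}\|w\|^{2}
  + C\sum_{i=1}^{n}\max\bigl(0,\; |y_i - w^{T}x_i - w_0| - \varepsilon\bigr)
```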

Applications
Prediction of continuous phenotypes in mice from genotype (Predicting unobserved phen…). Data are vectors x_i where each feature takes on the values 0, 1, or 2 to denote the number of alleles of a particular single nucleotide polymorphism (SNP). The data has about 1500 samples and 12,000 SNPs. The output y_i is a phenotype value, for example coat color (represented by integers) or chemical levels in blood.

Mouse phenotype prediction from genotype
Rank SNPs by the Wald test:
– First perform linear regression y = w x + w_0 for each SNP
– Compute a p-value on w with a t-test: t = (w − w_null)/stderr(w); with w_null = 0 this is t = w/stderr(w), where stderr(w)^2 = Σ_i (y_i − w x_i − w_0)^2 / ((n − 2) Σ_i (x_i − mean(x))^2)
– Rank SNPs by p-value, OR by the residual sum of squares Σ_i (y_i − w x_i − w_0)^2
Rank SNPs by the Pearson correlation coefficient
Rank SNPs by support vector regression (the w vector in SVR)
Rank SNPs by ridge regression (the w vector)
Run SVR and ridge regression on the top k ranked SNPs under cross-validation, as sketched below.
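A rough sketch of the rank-then-regress pipeline, assuming a genotype matrix coded 0/1/2 and a continuous phenotype; the use of scipy and scikit-learn (Ridge, LinearSVR) is my choice for illustration, not from the slides, and the data here is random:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVR
from sklearn.model_selection import cross_val_score

def wald_pvalues(X, y):
    """Per-SNP simple regression y = w*x + w0; p-value of the t-test on w."""
    pvals = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        pvals[j] = stats.linregress(X[:, j], y).pvalue
    return pvals

# Illustrative sizes only; the mouse data above is roughly 1500 x 12000
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 1000)).astype(float)  # genotypes coded 0/1/2
y = rng.standard_normal(200)                            # continuous phenotype

# Rank SNPs by p-value and keep the top k.  (In practice the ranking should
# be redone inside each training fold to avoid selection bias.)
k = 50
top = np.argsort(wald_pvalues(X, y))[:k]

for model in (Ridge(alpha=1.0), LinearSVR(C=1.0, max_iter=10000)):
    scores = cross_val_score(model, X[:, top], y, cv=5, scoring="r2")
    print(type(model).__name__, round(scores.mean(), 3))
```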

CD8 phenotype in mice From Al-jouie and Roshan, ICMLA workshop, 2015

MCH phenotype in mice

Fly startle response time prediction from genotype
Same experimental setup as in the previous study. Data from "Using whole genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster". The data has 155 samples and 2 million SNPs (features).

Fly startle response time

Rice phenotype prediction from genotype
Same experimental setup as in the previous studies. Data from "Improving the Accuracy of Whole Genome Prediction for Complex Traits Using the Results of Genome Wide Association Studies". The data has 413 samples and SNP features. The best linear unbiased prediction (BLUP) method is improved by prior SNP knowledge (given by genome-wide association studies).

Different rice phenotypes