Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarker Detection of LGL Leukemia. By: David Garcia.

Table of Contents: What is LASSO? · How Does LASSO Work? · LASSO and Feature Selection · LGL Leukemia · Statistical Biomarker Discovery · Methods and Results · Questions

What is LASSO? LASSO = Least Absolute Shrinkage and Selection Operator. Developed by Robert Tibshirani in 1996. LASSO is a method of feature selection.

What is LASSO? Estimates a regression coefficient βᵢ for each feature xᵢ. Uses a penalty function controlled by a tuning parameter λ. Sets the coefficients of less relevant features to zero.

How Does LASSO Work? Regression equation: ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ, where x₁, x₂, …, xₙ are the variables/features and ŷ is the predicted outcome.

How Does LASSO Work? For m observations: y₁ = β₀ + β₁x₁₁ + β₂x₁₂ + … + βₙx₁ₙ; y₂ = β₀ + β₁x₂₁ + β₂x₂₂ + … + βₙx₂ₙ; … ; yₘ = β₀ + β₁xₘ₁ + β₂xₘ₂ + … + βₙxₘₙ

How Does LASSO Work?

With error terms: y₁ = β₀ + β₁x₁₁ + β₂x₁₂ + … + βₙx₁ₙ + ε₁; y₂ = β₀ + β₁x₂₁ + β₂x₂₂ + … + βₙx₂ₙ + ε₂; … ; yₘ = β₀ + β₁xₘ₁ + β₂xₘ₂ + … + βₙxₘₙ + εₘ

HOW DOES LASSO WORK? GOAL: find β₀, β₁, …, βₙ that minimize the sum of the squared prediction errors (ε₁² + ε₂² + … + εₘ²)
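Written out in full (a standard restatement of the objective described above, given here in LaTeX notation rather than the slide's notation):

    \min_{\beta_0,\dots,\beta_n} \sum_{i=1}^{m} \varepsilon_i^2 \;=\; \sum_{i=1}^{m} \left( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \right)^2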

How Does LASSO Work? The presence of mutually dependent (highly correlated) variables xᵢ leads to regression coefficients βᵢ with very large variances. A tuning parameter c is used to restrict the regression coefficients: |β₁| + |β₂| + … + |βₙ| ≤ c
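For reference, combining this constraint with the least-squares goal above gives the full LASSO problem as formulated by Tibshirani (1996) (LaTeX notation; the slide itself shows only the constraint):

    \min_{\beta} \sum_{i=1}^{m} \left( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{n} |\beta_j| \le c

or, equivalently, in the penalized form with tuning parameter \lambda:

    \min_{\beta} \sum_{i=1}^{m} \left( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{n} |\beta_j|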

How Does LASSO Work? [Figure: geometric view of the constraint region, with the coefficient axes marked at −c and c]

LASSO and Feature Selection Use of λ drives less relevant βᵢ to zero. LASSO can therefore be used to filter out features that contribute less to the expected result. Example: ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄

LASSO and Feature Selection Use of λ drives less relevant βᵢ to zero. With λ = 0.5 the example becomes ŷ = β₀ + β₂x₂ + β₄x₄, where β₁ and β₃ have been shrunk to zero.

LASSO and Feature Selection LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease. With λ = 0.5: ŷ = β₀ + β₂x₂ + β₄x₄, where xᵢ is the transcription level of gene i and ŷ is the presence or absence of disease.
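A minimal sketch of this selection step (not the authors' code; it assumes scikit-learn and synthetic data in which only features 2 and 4 carry signal, mirroring the slide's example; sklearn's alpha plays the role of λ, although its scaling differs):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_genes = 45, 200
X = rng.standard_normal((n_samples, n_genes))       # stand-in for standardized transcription levels
beta_true = np.zeros(n_genes)
beta_true[[2, 4]] = [1.5, -2.0]                     # only "genes" 2 and 4 actually matter
y = X @ beta_true + 0.1 * rng.standard_normal(n_samples)

model = Lasso(alpha=0.5)                            # L1 penalty; larger alpha drives more coefficients to zero
model.fit(X, y)

selected = np.flatnonzero(model.coef_)              # features whose coefficients were not shrunk to zero
print("selected feature indices:", selected)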

LGL Leukemia LGL = large granular lymphocytic. Results from a lack of programmed cell death. No current standard treatment.

Statistical Biomarker Discovery Other methods of biomarker detection select genes based on biomedical perspectives. The proposed method uses a purely statistical approach. Results need to be verified by further biomedical studies.

Methods and Results Sample of 45 subjects with gene-expression attributes: 37 infected / 8 normal. y = 0 for normal, 1 for infected. Sample data standardized based on z-score. Combination of heat map visualization and LASSO.
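A minimal sketch of the standardization and heat-map step (not the study's code; it assumes numpy and matplotlib and uses a random placeholder matrix instead of the real expression data):

import numpy as np
import matplotlib.pyplot as plt

# placeholder for the 45-subject expression matrix (rows = subjects, columns = genes)
X = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=(45, 50))

# z-score standardization: each gene (column) rescaled to mean 0 and standard deviation 1
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

# heat map of the standardized matrix
plt.imshow(X_z, aspect="auto", cmap="bwr", vmin=-3, vmax=3)
plt.xlabel("genes")
plt.ylabel("subjects")
plt.colorbar(label="z-score")
plt.show()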

Methods and Results

Testing set contains one sample. Leave-one-out cross-validation is used to choose the optimal λ. The authors choose the λ that results in the most shrinkage while the mean squared error stays within one standard error of the minimum.
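A sketch of this selection rule (not the authors' code; it assumes scikit-learn, continues the placeholder matrix X_z from the previous sketch, and uses hypothetical 0/1 labels; sklearn's alpha again stands in for λ):

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneOut

# hypothetical labels matching the slide's coding: 1 = infected (37 subjects), 0 = normal (8 subjects)
y = (np.arange(45) < 37).astype(float)

# leave-one-out cross-validation over a grid of candidate alphas
cv_model = LassoCV(alphas=np.logspace(-3, 0, 50), cv=LeaveOneOut()).fit(X_z, y)

mse = cv_model.mse_path_                      # per-alpha, per-fold squared errors: shape (n_alphas, n_folds)
mean_mse = mse.mean(axis=1)
se_mse = mse.std(axis=1) / np.sqrt(mse.shape[1])

best = mean_mse.argmin()
threshold = mean_mse[best] + se_mse[best]
# "most shrinkage within one standard error": largest alpha whose mean CV error is under the threshold
alpha_1se = cv_model.alphas_[mean_mse <= threshold].max()
print("alpha chosen by the one-standard-error rule:", alpha_1se)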

Methods and Results

21 genes selected by the LASSO method: "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"

Methods and Results The Database for Annotation, Visualization and Integrated Discovery (DAVID) tool was used to classify the genes. One gene shows potential as an LGL leukemia biomarker.

Questions