Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarkers Detection of LGL Leukemia By: David Garcia
Table of Contents What is LASSO? How does LASSO Work? LASSO and Feature Selection LGL Leukemia Statistical Biomarker Discovery Methods and Results Questions
What is LASSO? LASSO = Least Absolute Shrinkage and Selection Operator Developed by Robert Tibshirani in 1996 LASSO is a method of feature selection
What is LASSO? Estimates regression coefficients $\beta_i$ for each feature $x_i$ Uses a penalty function via a tuning parameter $\lambda$ Sets coefficients of less relevant features to zero
How Does LASSO Work? Regression Equation: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$, where $x_1, x_2, \dots, x_n$ are the variables/features and $\hat{y}$ is the predicted outcome
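The regression equation can be sketched in a few lines of Python. The function name `predict` and the example numbers are illustrative assumptions, not values from the slides.

```python
def predict(beta0, betas, x):
    """Return y_hat = beta0 + beta_1 * x_1 + ... + beta_n * x_n."""
    if len(betas) != len(x):
        raise ValueError("need exactly one coefficient per feature")
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# Hypothetical coefficients and feature values, for illustration only.
y_hat = predict(1.0, [2.0, -1.0], [3.0, 4.0])
print(y_hat)
```

A coefficient of zero removes its feature from the prediction entirely, which is what makes coefficient shrinkage usable for feature selection.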
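How Does LASSO Work?
$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_n x_{1n}$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_n x_{2n}$
$\vdots$
$y_m = \beta_0 + \beta_1 x_{m1} + \beta_2 x_{m2} + \dots + \beta_n x_{mn}$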
How Does LASSO Work?
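$y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_n x_{1n} + \varepsilon_1$
$y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_n x_{2n} + \varepsilon_2$
$\vdots$
$y_m = \beta_0 + \beta_1 x_{m1} + \beta_2 x_{m2} + \dots + \beta_n x_{mn} + \varepsilon_m$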
How Does LASSO Work? GOAL: find $\beta_0, \beta_1, \dots, \beta_n$ that minimize the total squared prediction error $\sum_{j=1}^{m} \varepsilon_j^2$
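Written out, the goal is the ordinary least-squares objective. The block below is a sketch that writes the coefficients as $\beta_i$ and the per-sample errors as $\varepsilon_j$ (symbol names assumed here), with $x_{ji}$ the value of feature $i$ in sample $j$.

```latex
\min_{\beta_0, \beta_1, \dots, \beta_n} \; \sum_{j=1}^{m} \varepsilon_j^2
\;=\;
\min_{\beta_0, \beta_1, \dots, \beta_n} \; \sum_{j=1}^{m}
\Bigl( y_j - \beta_0 - \sum_{i=1}^{n} \beta_i x_{ji} \Bigr)^2
```

Each error is squared before summing, so positive and negative errors cannot cancel each other out.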
How Does LASSO Work? Correlated (mutually dependent) features $x_i$ lead to regression coefficients $\beta_i$ with very large variances Tuning parameter used to restrict the regression coefficients: $|\beta_0| + |\beta_1| + \dots + |\beta_n| \le c$
How Does LASSO Work? [Figure: region of coefficient values allowed by the constraint, with axis marks at $-c$ and $c$]
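The constrained problem can be written in two equivalent ways. The sketch below reads the constraint as a bound on the sum of absolute coefficient values, which is the standard LASSO form, and uses $\lambda$ for the penalty weight (an assumed symbol). It keeps the index range $0, \dots, n$ used on the slide, although many formulations leave the intercept $\beta_0$ out of the penalty.

```latex
% Constrained form: least squares with a bound c on the coefficients
\min_{\beta_0, \dots, \beta_n} \;
\sum_{j=1}^{m} \Bigl( y_j - \beta_0 - \sum_{i=1}^{n} \beta_i x_{ji} \Bigr)^2
\quad \text{subject to} \quad
\sum_{i=0}^{n} |\beta_i| \le c

% Penalized form: the bound is replaced by a penalty weight \lambda \ge 0
\min_{\beta_0, \dots, \beta_n} \;
\sum_{j=1}^{m} \Bigl( y_j - \beta_0 - \sum_{i=1}^{n} \beta_i x_{ji} \Bigr)^2
+ \lambda \sum_{i=0}^{n} |\beta_i|
```

A smaller bound $c$ corresponds to a larger $\lambda$. Both settings shrink the coefficients more strongly.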
LASSO and Feature Selection Use of $\lambda$ drives less relevant $\beta_i$ to zero LASSO can be used to filter features that contribute less to the expected result $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
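LASSO and Feature Selection Use of $\lambda$ drives less relevant $\beta_i$ to zero LASSO can be used to filter features that contribute less to the expected result With $\lambda = 0.5$, $\beta_1$ and $\beta_3$ are driven to zero: $\hat{y} = \beta_0 + 0 \cdot x_1 + \beta_2 x_2 + 0 \cdot x_3 + \beta_4 x_4$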
LASSO and Feature Selection LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease With $\lambda = 0.5$: $\hat{y} = \beta_0 + 0 \cdot x_1 + \beta_2 x_2 + 0 \cdot x_3 + \beta_4 x_4$, where $x_i$ is the transcription level of gene $i$ and $\hat{y}$ is the presence or absence of disease
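The way a penalty sets some coefficients to exactly zero can be illustrated with a minimal coordinate-descent sketch in plain Python. This is not the implementation used in the study. The data from `make_example` are hypothetical, the objective is scaled by 1/(2m), and the intercept is left unpenalized; all of these are assumptions of the sketch.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=50):
    """Coordinate descent for
    (1 / (2m)) * sum_j (y_j - b0 - sum_i b_i x_ji)^2 + lam * sum_i |b_i|.
    The intercept b0 is not penalized (an assumption of this sketch).
    Every feature column must contain at least one nonzero value."""
    m, n = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * n
    for _ in range(n_sweeps):
        # Intercept: mean of the residuals computed without the intercept.
        beta0 = sum(y[j] - sum(beta[i] * X[j][i] for i in range(n))
                    for j in range(m)) / m
        for k in range(n):
            rho = 0.0   # inner product of feature k with the partial residual
            norm = 0.0  # squared length of feature k
            for j in range(m):
                partial = y[j] - beta0 - sum(
                    beta[i] * X[j][i] for i in range(n) if i != k)
                rho += X[j][k] * partial
                norm += X[j][k] ** 2
            beta[k] = soft_threshold(rho / m, lam) / (norm / m)
    return beta0, beta


def make_example():
    """Hypothetical data: 8 samples, 4 balanced +/-1 features.
    x2 and x4 have large true effects; x1 and x3 have small ones."""
    X = [[(-1) ** bin(j & k).count("1") for k in (1, 2, 3, 4)]
         for j in range(8)]
    y = [1.0 + 0.2 * r[0] + 3.0 * r[1] + 0.1 * r[2] - 2.0 * r[3] for r in X]
    return X, y


X, y = make_example()
beta0, beta = lasso_fit(X, y, lam=0.5)
selected = [i + 1 for i, b in enumerate(beta) if b != 0.0]
print("intercept:", round(beta0, 3))
print("coefficients:", [round(b, 3) for b in beta])
print("selected features:", selected)
```

In this toy setup the small effects of x1 and x3 fall below the threshold 0.5 and are set exactly to zero, while the coefficients of x2 and x4 survive, each shrunk toward zero by 0.5. The soft-threshold step is what turns shrinkage into selection.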
LGL Leukemia LGL = large granular lymphocytic Results from lack of programmed cell death No current standard treatment
Statistical Biomarker Discovery Other methods of biomarker detection select genes based on biomedical perspectives Proposed method uses a purely statistical approach Results need to be verified via further biomedical studies
Methods and Results Sample of 45 subjects with attributes; 37 infected / 8 normal; $y = 0$ for normal / 1 for infected; sample data standardized based on z-score; combination of heat map visualization and LASSO
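Z-score standardization can be sketched as follows. The slides do not say whether the sample or the population standard deviation was used; this sketch assumes the sample version and standardizes each attribute (column) separately. The function names and the toy matrix are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # n - 1 denominator; an assumption of this sketch
    return [(v - mu) / sigma for v in values]


def standardize_columns(rows):
    """Apply z_scores to every column of a samples-by-attributes table."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [z_scores(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Hypothetical 3-sample, 2-attribute table, for illustration only.
table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standardize_columns(table))
```

Standardizing puts all attributes on a common scale, so the LASSO penalty treats their coefficients comparably regardless of each attribute's original units.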
Methods and Results
Testing set contains one sample Leave-one-out cross-validation used to choose optimal $\lambda$ Authors choose the $\lambda$ that results in the most shrinkage with a mean squared error within one standard error of the minimum $\lambda$ =
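The selection rule described here can be sketched independently of any particular LASSO solver. The code below assumes the per-fold squared errors for each candidate tuning value have already been computed. The function names and the error values are hypothetical, and "most shrinkage" is taken to mean the largest tuning value.

```python
from math import sqrt
from statistics import mean, stdev


def leave_one_out_splits(m):
    """Yield (train_indices, test_index): each fold holds out one sample."""
    for held_out in range(m):
        yield [j for j in range(m) if j != held_out], held_out


def one_standard_error_choice(errors_by_lambda):
    """errors_by_lambda maps each candidate value to its per-fold squared errors.
    Returns the largest value whose mean error is within one standard
    error of the smallest mean error."""
    summary = {}
    for lam, errs in errors_by_lambda.items():
        summary[lam] = (mean(errs), stdev(errs) / sqrt(len(errs)))
    best = min(summary, key=lambda lam: summary[lam][0])
    threshold = summary[best][0] + summary[best][1]
    eligible = [lam for lam, (mse, _) in summary.items() if mse <= threshold]
    return max(eligible)


# Hypothetical cross-validation errors, for illustration only.
errors = {
    0.1: [1.0, 3.0, 2.0, 2.0],
    0.5: [2.0, 3.0, 2.0, 2.2],
    1.0: [4.0, 5.0, 4.0, 5.0],
}
print(one_standard_error_choice(errors))
```

Preferring the largest eligible value trades a small, statistically insignificant increase in error for a sparser model, which suits biomarker discovery where a short gene list is the aim.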
Methods and Results
21 genes selected from LASSO method: "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"
Methods and Results Database for Annotation, Visualization and Integrated Discovery (DAVID) tool used to classify genes One gene shows potential as LGL leukemia biomarker
Questions