Kernel Methods for Large-Scale Genomics Data Analysis

Presentation transcript:

Kernel Methods for Large-Scale Genomics Data Analysis (Wang et al.)

Background Machine learning (ML) has been shown to be a promising tool for addressing the challenges posed by data growth in genomics. ML methods can be used to learn how a very large number of genetic variants (SNPs) are associated with complex phenotypes (diseases, disorders, etc.). This study highlights the potential roles that ML, and kernel methods in particular, will play in modern genomics.

Kernels for Genomic Data Kernel methods are based on mathematical functions that smooth data. They allow us to use linear classifiers to solve non-linear problems by transforming non-linearly separable data.

Kernels for Genomic Data contd. Some advantages of kernel methods over traditional regression methods are the following: they accommodate high-dimensional genomic data; they allow nonlinear relations between outcomes and the genomic data; and they offer the flexibility to include structural information.

Kernels for Genomic Data contd. A key component of a kernel method is the kernel function. This function converts the information for a pair of subjects into a quantitative measure representing their similarity with respect to the genetic data. For GWAS, the weighted linear kernel is a popular choice.

Kernels for Genomic Data contd. For the weighted linear kernel, SNPs are coded as genotype values G taking the value 0, 1, or 2 according to the number of copies of the minor allele (i.e., encoding whether a subject is homozygous for the major allele, heterozygous, or homozygous for the minor allele). For q SNPs, the weighted kernel for subjects i and j can be expressed as: $K_{ij} = \sum_{k=1}^{q} w_k G_{ik} G_{jk}$

Kernels for Genomic Data contd. The weight $w_k$ weights each SNP and is based on the estimated minor allele frequency (MAF) $p_k$, taken here as the inverse of its standard deviation: $w_k = 1/\sqrt{p_k(1-p_k)}$. Since common alleles can be carried by many subjects by chance alone, giving greater weight to the sharing of rare variants can strengthen the relationship between the kernel matrix and the phenotype. Other types of weights can be used, and higher-order polynomial kernel functions can capture higher-order genetic interactions (interactions involving three or more markers contributing to complex traits, i.e., diseases). Ultimately, kernels other than the weighted linear kernel can also be used: define the kernel in whatever way best reflects "similarity" for your application.
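To make this concrete, here is a minimal NumPy sketch (my illustration, not code from the paper) that computes the weighted linear kernel from a genotype matrix; the MAF estimate and the clipping guard against monomorphic SNPs are assumptions made for the example.

```python
# Minimal sketch of the weighted linear kernel described above, assuming a
# genotype matrix G with one row per subject and one column per SNP,
# coded 0/1/2 by minor-allele count.
import numpy as np

def weighted_linear_kernel(G):
    """Compute K_ij = sum_k w_k * G_ik * G_jk with MAF-based weights."""
    G = np.asarray(G, dtype=float)
    maf = np.clip(G.mean(axis=0) / 2.0, 1e-6, 1 - 1e-6)  # estimated MAF per SNP
    w = 1.0 / np.sqrt(maf * (1.0 - maf))                  # up-weight rarer variants
    return (G * w) @ G.T                                  # n x n similarity matrix

# Toy usage: 4 subjects, 3 SNPs
G = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1],
              [0, 2, 2]])
K = weighted_linear_kernel(G)
print(K.shape)  # (4, 4)
```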

Building Predictive Models The goal is to be able to predict phenotypes for new individuals based on known genomic data (supervised ML). A common practice is to build the prediction model on top-ranked markers from GWAS together with a few experimentally known susceptibility markers ("cherry picking"). This so-called cherry-picking strategy shows poor performance in most cases, because the top genetic variants explain only a small amount of the phenotypic variation and genetic studies suffer from low replicability. BUT this is how GWAS practitioners have typically worked around the scalability issues.

Building Predictive Models contd. Another strategy is to train the model using all available markers as well as all other available information, such as epigenetic markers (markers that characterize phenotypes not dictated by the DNA sequence itself). In this paper, an efficient kernel-method solution is presented that follows this strategy, handling feature selection/weighting as well as prediction in a unified framework.

Building Predictive Models contd. For disease risk prediction, support vector machines (SVMs) may be used. The SVM is a well-developed method that seeks the optimal hyperplane separating the data into two classes while maximizing the margin. How do we make use of an SVM with non-linearly separable data? (next slide)

Kernel Trick Nonlinear classification is attained by using the kernel trick: mapping the non-linearly separable data set into a higher-dimensional space in which a hyperplane separating the samples can be found.

Building Predictive Models contd. SVMs are advantageous for high-dimensional genomic data: they can deal with all markers without any pre-pruning or selection, and they account for complex relationships among markers. However, SVMs are black-box approaches that provide only a classification, and it is difficult to extract further information from them. Despite promising results from various studies, their application in genomic prediction remains very limited.
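As an illustration (not taken from the paper), the weighted linear kernel from the earlier sketch can be plugged directly into an off-the-shelf SVM through scikit-learn's precomputed-kernel interface; the toy genotype data, phenotype labels, and train/test split below are all assumptions made for the example.

```python
# Hedged sketch: training an SVM classifier on a precomputed genomic kernel.
# Assumes weighted_linear_kernel() from the earlier sketch and a binary
# phenotype y (0 = control, 1 = case).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(100, 500))      # 100 subjects x 500 SNPs (toy data)
y = rng.integers(0, 2, size=100)             # toy binary phenotype

train, test = np.arange(80), np.arange(80, 100)
K = weighted_linear_kernel(G)                # full n x n kernel matrix

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K[np.ix_(train, train)], y[train])   # train block of the kernel matrix
pred = clf.predict(K[np.ix_(test, train)])   # test rows vs. training columns
print((pred == y[test]).mean())              # toy accuracy
```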

Building Predictive Models contd. Another potential classifier is kernel logistic regression (KLR). KLR offers a natural estimate of class probability and adapts readily to other probabilistic approaches. The logistic loss of KLR and the hinge loss of the SVM are actually very similar, so the two methods have similar expected performance; the significant differences lie in their applications. KLR is, naturally, a "kernelized" version of logistic regression that offers some desirable features the SVM lacks, and it extends easily to multiclass prediction. However, the original KLR does not scale well to large data sets; fast, sparsity-driven versions have been developed but have not yet been applied to genomic prediction.
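Scikit-learn has no built-in KLR, but a rough sketch of the idea (my illustration, not the paper's method) is to use the representer-theorem form f(x) = Σ_i α_i K(x, x_i) and fit a penalized logistic regression with the training kernel matrix columns as features; the ridge penalty is then on the α coefficients rather than the true RKHS norm, so this is only an approximation.

```python
# Hedged sketch of approximate kernel logistic regression: treat the columns of
# the training kernel matrix as features for a ridge-penalized logistic
# regression. Assumes K, y, train, test from the previous sketch.
from sklearn.linear_model import LogisticRegression

klr = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
klr.fit(K[np.ix_(train, train)], y[train])          # learn alpha-like coefficients

proba = klr.predict_proba(K[np.ix_(test, train)])   # natural probability estimates
print(proba[:5, 1])                                 # predicted risk, first 5 test subjects
```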

Building Predictive Models contd. Many strategies can be adopted to improve whole-genome risk prediction; one is exploiting the block structure underlying genomic data. Using kernel-based methods does not necessarily drastically improve prediction, because performance in real data analysis is always limited by the sample size and by the information content embedded in the data.

The genomic data space can be seen as three layers: the original data space (X), the transformed feature space (H) induced by the kernel functions, and the reduced kernel space produced by kernel approximation. Many of the methods in the paper efficiently explore and fit models in the H space (kernel counterparts of many well-known ML methods). The figure shows the types of ML methods applicable in each data space, and it also shows that both the genomic data and its underlying structure are taken as input.
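The reduced kernel space corresponds to kernel approximation; as a hedged illustration (the kernel choice, gamma, and component count are my assumptions, not values from the paper), scikit-learn's Nystroem transformer builds a low-rank feature map on which a fast linear classifier can be trained, avoiding the full n x n kernel.

```python
# Hedged sketch: working in a reduced kernel space via the Nystroem
# approximation, then fitting a fast linear classifier on those features.
# Assumes the toy G, y, train, test arrays from the earlier sketches.
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

approx_model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.01, n_components=50, random_state=0),
    SGDClassifier(max_iter=1000, random_state=0),   # linear SVM on approximate features
)
approx_model.fit(G[train], y[train])
print(approx_model.score(G[test], y[test]))
```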

Multiple Kernel Learning (MKL) Instead of selecting a single fixed kernel, multiple kernel learning (MKL) uses multiple candidate kernels to map the data into the feature space. MKL achieves better performance by finding optimal weights for each base kernel.

Multiple Kernel Learning (MKL) contd. MKL seeks a composite kernel formed as a linear combination of different kernels, with model complexity controlled by regularization. Applications of MKL to genomic data are currently limited, but they will increase alongside the growth of large-scale genomics data.
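Full MKL solvers learn the kernel weights jointly with the classifier; as a much simpler stand-in (my illustration only, not a method from the paper), one can cross-validate over the mixing weight of a convex combination of two base kernels.

```python
# Hedged sketch: a crude stand-in for MKL that searches over the mixing weight
# of a convex combination of two base kernels, scoring each composite kernel by
# cross-validation. Assumes G, y, and weighted_linear_kernel() from earlier.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

K_lin = weighted_linear_kernel(G)            # base kernel 1
K_rbf = rbf_kernel(G, gamma=0.01)            # base kernel 2
K_lin /= np.mean(np.diag(K_lin))             # crude scale normalization before mixing

best = None
for beta in np.linspace(0.0, 1.0, 11):
    K_mix = beta * K_lin + (1.0 - beta) * K_rbf                       # composite kernel
    score = cross_val_score(SVC(kernel="precomputed"), K_mix, y, cv=5).mean()
    if best is None or score > best[1]:
        best = (beta, score)
print("best mixing weight and CV accuracy:", best)
```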

Genomic Data Fusion A closely related concept to MKL is kernel-based data fusion. Both data fusion and MKL are facilitated by the closure property of kernels (a sum or weighted sum of kernels is another valid kernel). Kernel fusion methods allow the integration of data of different types (gene expression, DNA methylation, CNV, etc.) and structures, for example in gene function prediction. These methods, along with other ML strategies, provide novel tools for gene function prediction and annotation. Kernel matrices generated from the different data sources are combined into one global kernel. Data fusion is defined here in a broad sense; other studies have outlined its applications in bioinformatics settings, covering unsupervised as well as supervised methodology.
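As a small hedged sketch of that closure property (the data matrices and kernel choices are invented for illustration), kernels computed from different omics layers on the same subjects can simply be combined by a weighted sum into one global kernel.

```python
# Hedged sketch: kernel-based data fusion across omics layers. Each data type
# gets its own kernel over the same subjects; their weighted sum is itself a
# valid kernel and can be fed to any kernel method.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(1)
expr = rng.normal(size=(100, 2000))       # toy gene expression matrix
meth = rng.uniform(size=(100, 5000))      # toy DNA methylation (beta values)

K_expr = rbf_kernel(expr, gamma=1e-3)
K_meth = linear_kernel(meth)
K_meth /= np.mean(np.diag(K_meth))        # crude scale normalization

K_global = 0.5 * K_expr + 0.5 * K_meth    # closure: weighted sum is still a kernel
```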

Structured Association Mapping Unlike traditional test-statistic-based or PCA-based methods, structured association mapping (e.g., S²M²R) leverages structural information. It uses the various kinds of structural information present in the genome (as well as the phenome and transcriptome) to improve the accuracy of identifying causal variants. (Refer back to the kernel-space diagram.)

Structured Association Mapping contd. An important source of genomic structural information is genome annotation (known binding sites, exon regions, etc.). These data can be treated as prior knowledge about SNPs when searching for disease susceptibility markers. For example, SNPs in highly conserved regions are more likely to show true associations, since conserved regions are functionally important. How best to use this prior knowledge will become increasingly important as genomic annotations improve in quality and quantity.
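One simple way to picture this (my illustration; the paper discusses richer structured approaches) is to fold an annotation-derived score into the per-SNP weights of the weighted linear kernel defined earlier; the conservation score and the boosting scheme below are assumptions for the sketch.

```python
# Hedged sketch: folding annotation-based prior knowledge into the SNP weights.
# Assumes a per-SNP conservation score in [0, 1]; SNPs in conserved regions get
# larger weights on top of the MAF-based weighting used before.
import numpy as np

def annotated_kernel(G, conservation):
    G = np.asarray(G, dtype=float)
    maf = np.clip(G.mean(axis=0) / 2.0, 1e-6, 1 - 1e-6)
    w_maf = 1.0 / np.sqrt(maf * (1.0 - maf))
    w = w_maf * (1.0 + conservation)          # illustrative prior: boost conserved SNPs
    return (G * w) @ G.T
```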

Discussion Kernel machine learning methods offer promising tools for large-scale, high-dimensional data analysis, in genomics in particular, and they can be integrated with classical ML techniques. Future work involves improving scalability with respect to sample size, dimensionality, and data heterogeneity. Although the kernel trick allows an efficient search in the higher-dimensional space, a main limitation of kernel methods is the high cost of learning, which is at least quadratic in the number of samples; with genomic data continuing to grow, this calls for additional research on approximation methods. Regarding data heterogeneity: it is difficult to predefine an optimal kernel function for a specific application given the complexity of the data structures and types in genomics, which is why MKL and ensemble-type learning schemes deserve more consideration.

Reference Wang, Xuefeng, Eric P. Xing, and Daniel J. Schaid. “Kernel Methods for Large-Scale Genomic Data Analysis.” Briefings in Bioinformatics 16, no. 2 (March 2015): 183–92. https://doi.org/10.1093/bib/bbu024.