Integration II: Prediction
Kernel-based data integration
SVMs and the kernel “trick”
Multiple-kernel learning
Applications
– Protein function prediction
– Clinical prognosis
SVMs
The data are expression measurements of two genes for samples from two populations (cancer types).
The goal is to define a cancer type classifier.
One type of classifier is a hyperplane that separates the measurements of the two cancer types.
E.g.: a one-dimensional hyperplane (a line, when two genes are measured)
E.g.: a two-dimensional hyperplane (a plane, when three genes are measured)
[Noble, Nat. Biotechnology, 2006]
SVMs
Suppose that the measurements are separable: there exists a hyperplane that separates the two types.
Then there are infinitely many separating hyperplanes. Which one to use?
The maximum-margin hyperplane: the one farthest from the closest samples of either type.
Equivalently: the minimizer of the norm of the hyperplane's weight vector, subject to every sample being classified correctly with a margin of at least one.
[Noble, Nat. Biotechnology, 2006]
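In standard textbook notation (assumed here; the slide's own symbols may differ), with samples x_i, labels y_i in {-1, +1}, weight vector w and offset b, the maximum-margin problem for separable data can be written as:

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The margin width is 2 / ||w||, so minimizing ||w|| is the same as maximizing the margin.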
SVMs
Which hyperplane to use?
In reality the data are rarely perfectly separable, so the hyperplane is the minimizer of a trade-off between
1. classification error (the loss), and
2. margin size (the penalty).
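A minimal sketch of this trade-off, assuming the common hinge-loss form of the objective: C times the total classification loss, plus half the squared norm of the weight vector as the margin penalty. The names `hinge_loss`, `soft_margin_objective`, the constant `C`, and the toy data are illustrative assumptions, not taken from the slides.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss for one sample: zero once the sample is beyond the margin."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, X, y, C=1.0):
    """C * (total hinge loss) + 0.5 * ||w||^2 (the margin penalty)."""
    loss = sum(hinge_loss(w, b, xi, yi) for xi, yi in zip(X, y))
    penalty = 0.5 * sum(wj * wj for wj in w)
    return C * loss + penalty


# Hypothetical toy data: two "genes" per sample, labels +1 / -1.
X = [[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [0.5, 0.5]]
y = [1, 1, -1, -1]  # the last sample lies on the wrong side of the hyperplane

# A fixed candidate hyperplane (not an optimized one).
w, b = [0.5, 0.5], 0.0

# Larger C puts more weight on classification error relative to margin size.
for C in (0.1, 1.0, 10.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

Only the misclassified sample contributes to the loss here; an SVM solver would search over w and b to minimize this quantity rather than evaluate one candidate.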
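SVMs
The optimization can be written in two equivalent forms:
the primal problem, posed over the hyperplane parameters, and
the dual problem, posed over one coefficient per sample.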
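For reference, the usual textbook statement of the two problems is given here. The notation is an assumption and may not match the slide's: slack variables ξ_i, trade-off constant C, dual coefficients α_i, and kernel matrix entries K_ij = x_i^T x_j.

```latex
% Primal (soft margin)
\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0

% Dual
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i
- \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K_{ij}
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

In the dual, the samples enter only through the entries K_ij.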
SVMs
What is K? The kernel matrix:
each entry is the inner product of two samples
one interpretation: each entry measures the similarity of two samples
for the purposes of the dual problem, the measurements are completely described by K
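A small sketch of what K looks like for a linear kernel, and of how a classifier score can be computed from kernel values alone. The function names, the toy samples, and the coefficients `alpha` and `b` are hypothetical; in particular `alpha` is hand-picked for illustration, not the output of a trained SVM.

```python
def linear_kernel(x, z):
    """Inner product of two measurement vectors."""
    return sum(a * b for a, b in zip(x, z))


def kernel_matrix(X, kernel):
    """K[i][j] = kernel(sample i, sample j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def decision_value(alpha, y, b, k_row):
    """SVM score of a sample, given only its kernel values against the training samples."""
    return sum(a * yi * k for a, yi, k in zip(alpha, y, k_row)) + b


# Hypothetical measurements: three samples, two genes each.
X = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
y = [1, 1, -1]
K = kernel_matrix(X, linear_kernel)

# Hand-picked dual coefficients (illustrative only, not a trained solution).
alpha, b = [0.5, 0.0, 0.5], 0.0
score_first_sample = decision_value(alpha, y, b, K[0])
print(K)
print(score_first_sample)
```

Note that `decision_value` never touches the measurement vectors themselves, only a row of kernel values.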
SVMs
Implication: a non-linear classifier is obtained by defining the kernel matrix K appropriately, e.g. with a quadratic kernel.
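As an illustration, assuming the homogeneous form of the quadratic kernel, k(x, z) = (x·z)^2 (the slide's exact definition is not shown, and it may include a constant term), the sketch below checks that the kernel equals an ordinary inner product in an expanded feature space. The function names are illustrative.

```python
import math


def quadratic_kernel(x, z):
    """Homogeneous quadratic kernel: the squared inner product."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def quadratic_features(x):
    """Explicit feature map for 2-D input whose inner product equals the kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, -1.0]

k_value = quadratic_kernel(x, z)
explicit = sum(a * b for a, b in zip(quadratic_features(x), quadratic_features(z)))

print(k_value, explicit)
```

A hyperplane in the expanded space corresponds to a quadratic decision boundary in the original measurements, yet only `quadratic_kernel` ever needs to be evaluated.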
SVMs
Another implication: measurement vectors are not needed; all that is required is a similarity between samples.
E.g.: string kernels, which compare sequences directly.
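One simple kind of string kernel is the k-mer spectrum kernel, which counts the substrings of length k that two sequences share. The slides do not say which string kernel is meant, so this is only a sketch of the general idea; the function names and the example sequences are made up.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count profiles of two sequences."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items())


# Hypothetical short amino-acid sequences.
seq_a = "MKVLA"
seq_b = "MKVIA"

print(spectrum_kernel(seq_a, seq_b))  # shared 2-mers: MK and KV
print(spectrum_kernel(seq_a, seq_a))
```

The sequences are never turned into fixed-length measurement vectors by the caller; the kernel value is the only thing passed on to the SVM.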
Protein Structure Prediction
[Figure: protein structure, protein sequence, sequence similarity]
Kernel-based data fusion
Core idea: use a different kernel for each genomic data source.
A linear combination of kernel matrices is itself a kernel (under certain conditions, e.g. non-negative coefficients).
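One way to see why a non-negative combination of kernels is again a kernel: for linear kernels, the weighted sum of the kernel matrices equals the kernel matrix of the concatenated features, with each source scaled by the square root of its weight. The sketch below checks this numerically; the two data sources, the weights, and the function names are hypothetical.

```python
import math


def gram(X):
    """Linear kernel matrix: inner products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def combine(kernels, weights):
    """Entry-wise weighted sum of kernel matrices of the same size."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Two hypothetical data sources measured on the same three samples.
source_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
source_b = [[2.0], [1.0], [0.0]]
weights = [0.25, 4.0]  # non-negative combination coefficients

K_combined = combine([gram(source_a), gram(source_b)], weights)

# The same matrix, obtained from explicitly concatenated and rescaled features.
stacked = [
    [math.sqrt(weights[0]) * v for v in a] + [math.sqrt(weights[1]) * v for v in b]
    for a, b in zip(source_a, source_b)
]
K_stacked = gram(stacked)

print(K_combined)
print(K_stacked)
```

With a negative weight the square root is undefined and the combined matrix need not be positive semi-definite, which is one reading of the slide's "under certain conditions".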
Kernel-based data fusion
Kernel to use in prediction: a weighted linear combination of the kernel matrices computed from the individual data sources.
Kernel-based data fusion
In general, the task is to estimate the SVM function jointly with the coefficients of the kernel matrix combination.
This is an instance of a well-studied class of optimization problems (semidefinite programs).
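Solving the semidefinite program needs a dedicated solver and is not reproduced here. To give a feel for what "estimating the coefficients" means, the sketch below swaps in a much simpler heuristic, kernel-target alignment: each kernel is scored separately by how closely it matches the ideal kernel built from the labels, and the scores are normalized into weights. This is not the joint optimization described on the slide, and all names and data are hypothetical.

```python
import math


def frobenius_inner(A, B):
    """Sum of entry-wise products of two matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K, y):
    """Kernel-target alignment between K and the label kernel y y^T."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius_inner(K, Y) / math.sqrt(
        frobenius_inner(K, K) * frobenius_inner(Y, Y)
    )


def outer(v):
    """Linear kernel matrix of a single one-dimensional feature."""
    return [[a * b for b in v] for a in v]


# Hypothetical labels and two one-dimensional data sources.
y = [1, 1, 1, 1, -1, -1, -1, -1]
informative = [1.0, 2.0, 1.0, 2.0, -1.0, -2.0, -1.0, -2.0]    # tracks the labels
uninformative = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # unrelated to the labels

kernels = [outer(informative), outer(uninformative)]

# Clip negative alignments to zero so the weights stay non-negative.
scores = [max(0.0, alignment(K, y)) for K in kernels]
weights = [s / sum(scores) for s in scores]

print(scores)
print(weights)
```

The kernel built from the label-tracking source receives all of the weight in this toy case. The semidefinite program differs in that it chooses the weights and the SVM coefficients together, against the classification objective itself.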
The same idea has been applied to cancer classification from expression and proteomic data.
Kernel-based data fusion
Prostate cancer dataset
– 55 samples
– Expression from microarray
– Copy number variants
Outcomes predicted:
– Grade, stage, metastasis, recurrence