1
Integration II: Prediction
2
Kernel-based data integration
SVMs and the kernel “trick”
Multiple-kernel learning
Applications
– Protein function prediction
– Clinical prognosis
3
SVMs. These are expression measurements from two genes for two populations (cancer types). The goal is to define a cancer-type classifier... [Noble, Nat. Biotechnology, 2006]
4
SVMs. These are expression measurements from two genes for two populations (cancer types). The goal is to define a cancer-type classifier... One type of classifier is a “hyper-plane” that separates measurements from the two cancer types. [Noble, Nat. Biotechnology, 2006]
5
SVMs. These are expression measurements from two genes for two populations (cancer types). The goal is to define a cancer-type classifier... One type of classifier is a “hyper-plane” that separates measurements from the two cancer types. E.g.: a one-dimensional hyper-plane. [Noble, Nat. Biotechnology, 2006]
6
SVMs. These are expression measurements from two genes for two populations (cancer types). The goal is to define a cancer-type classifier... One type of classifier is a “hyper-plane” that separates measurements from the two cancer types. E.g.: a two-dimensional hyper-plane (in a three-dimensional measurement space, e.g. with a third gene added). [Noble, Nat. Biotechnology, 2006]
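In symbols (standard notation, not necessarily the slide's own), such a hyper-plane classifier is

f(x) = \mathrm{sign}(w^\top x + b),

where w is the normal vector of the hyper-plane, b its offset from the origin, and the sign of f assigns a sample to one of the two cancer types.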
7
SVMs. Suppose that the measurements are separable: there exists a hyperplane that separates the two types. Then there are infinitely many separating hyperplanes. Which to use? [Noble, Nat. Biotechnology, 2006]
8
SVMs. Suppose that the measurements are separable: there exists a hyperplane that separates the two types. Then there are infinitely many separating hyperplanes. Which to use? The maximum-margin hyperplane; equivalently, the minimizer of the weight-vector norm (see the formulation sketched below). [Noble, Nat. Biotechnology, 2006]
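As a sketch, the standard hard-margin formulation (standard notation, not necessarily the slide's own):

\min_{w,\, b} \; \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 \;\; \text{for all } i,

where maximizing the margin 2/\|w\| is equivalent to minimizing \|w\|^2.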
9
SVMs. Which hyper-plane to use? In reality: the minimizer of a trade-off between (1) classification error, the loss term, and (2) margin size, the penalty term (sketched below).
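A sketch of that trade-off in the usual soft-margin form (hinge loss assumed here):

\min_{w,\, b} \; \sum_i \max\big(0,\; 1 - y_i (w^\top x_i + b)\big) \; + \; \lambda\, \|w\|^2,

where the first term is the loss (classification error) and the second is the penalty (a small \|w\| means a large margin).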
10
SVMs. The primal problem and the corresponding dual problem (standard forms sketched below).
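A sketch of both in standard soft-margin notation (the slide's exact notation may differ):

Primal:
\min_{w,\, b,\, \xi} \;\; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0.

Dual:
\max_{\alpha} \;\; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, K_{ij}
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0.

Note that the dual depends on the training data only through the matrix K.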
11
SVMs. What is K? The kernel matrix: each entry is an inner product between two samples. One interpretation: sample similarity. For the SVM, the measurements are completely described by K.
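In symbols (standard notation, assumed rather than taken from the slide):

K_{ij} = \langle x_i, x_j \rangle, \qquad f(x) = \sum_i \alpha_i \, y_i \, K(x_i, x) + b,

so both the dual training problem above and the decision function need only pairwise inner products (similarities), never the raw measurement vectors.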
12
SVMs. Implication: non-linearity is obtained by appropriately defining the kernel matrix K. E.g. a quadratic kernel (see below).
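One common form of the quadratic kernel (the slide may use a slightly different variant):

k(x, z) = (x^\top z + 1)^2,

which equals an ordinary inner product in the space of all monomials of degree at most two, so a linear separator in that space corresponds to a quadratic decision boundary in the original measurement space.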
13
SVMs. Another implication: there is no need for measurement vectors; all that is required is a similarity between samples. E.g. string kernels (a sketch follows below).
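A minimal sketch of the idea (not from the slides): a k-mer “spectrum” string kernel on toy sequences, fed to an SVM as a precomputed kernel matrix. The sequences, labels, and choice of k are illustrative only.

# A k-mer "spectrum" string kernel on toy sequences; the SVM sees only
# the precomputed kernel matrix, never feature vectors.
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def spectrum_kernel(s, t, k=3):
    # Inner product of the k-mer count vectors of two strings.
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))

seqs = ["MKTAYIAKQR", "MKTAYIAKQE", "GGSGGSGGSG", "GGSGGTGGSG"]  # toy sequences
y = [1, 1, -1, -1]                                               # toy labels

# Kernel matrix over all pairs of training sequences.
K = np.array([[spectrum_kernel(s, t) for t in seqs] for s in seqs])
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))  # predictions from pairwise similarities alone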
14
Protein Structure Prediction [figure: protein sequence, protein structure, sequence similarity]
15
Protein Structure Prediction
16
Kernel-based data fusion. Core idea: use different kernels for different genomic data sources; a linear combination of kernel matrices is itself a kernel (under certain conditions, e.g. nonnegative weights, since nonnegative combinations of positive semidefinite matrices remain positive semidefinite).
17
Kernel-based data fusion. Kernel to use in prediction: a weighted combination of the per-source kernels (see below).
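In the usual multiple-kernel notation (assumed here), the kernel used for prediction is

K = \sum_{k=1}^{m} \mu_k K_k, \qquad \mu_k \ge 0,

with each K_k computed from a different genomic data source (e.g. expression, sequence, interactions).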
18
Kernel-based data fusion. In general, the task is to estimate the SVM function along with the coefficients of the kernel combination. This is a well-studied type of optimization problem (a semi-definite program).
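A hedged sketch of that joint problem, in the spirit of the SDP formulations from the multiple-kernel-learning literature (not necessarily the slide's exact statement):

\min_{\mu \ge 0} \; \max_{\alpha} \;\; \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \Big(\textstyle\sum_k \mu_k K_k\Big)_{ij}
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0, \;\; \mathrm{tr}\Big(\textstyle\sum_k \mu_k K_k\Big) = c,

i.e. the kernel weights \mu and the SVM dual variables \alpha are optimized together.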
19
Kernel-based data fusion
21
Same idea applied to cancer classification from expression and proteomic data
22
Kernel-based data fusion
Prostate cancer dataset
– 55 samples
– Expression from microarray
– Copy number variants
Outcomes predicted:
– Grade, stage, metastasis, recurrence
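A minimal sketch of fusing two such data sources with a fixed kernel combination (not the study's actual pipeline; random matrices stand in for the real 55-sample expression and copy-number data, and the weights are fixed rather than learned):

# Fuse two data sources via a weighted sum of kernels, then train a
# precomputed-kernel SVM on the fused kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 55
X_expr = rng.standard_normal((n, 2000))   # placeholder microarray expression
X_cnv = rng.standard_normal((n, 500))     # placeholder copy-number features
y = rng.choice([-1, 1], size=n)           # placeholder outcome (e.g. recurrence)

K_expr = X_expr @ X_expr.T                # linear kernel per data source
K_cnv = X_cnv @ X_cnv.T
mu = (0.5, 0.5)                           # combination weights (fixed here)
K = mu[0] * K_expr + mu[1] * K_cnv        # fused kernel

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))                    # accuracy on the training kernel

In practice each kernel is usually normalized (e.g. dividing K_ij by sqrt(K_ii * K_jj)) before combining, so that sources with larger scales do not dominate, and the weights can be learned by multiple-kernel learning instead of being fixed.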
23
Kernel-based data fusion