1 Kernel based data fusion Discussion of a Paper by G. Lanckriet.

2 Paper

3 Overview
- Problem: aggregation of heterogeneous data
- Idea: different data types are represented by different kernels
- Question: how can different kernels be combined in an elegant and efficient way?
- Solution: linear combination of kernels and semidefinite programming (SDP)
- Application: recognition of ribosomal and membrane proteins

4 Linear combination of kernels
$K = \sum_i \mu_i K_i$, where $\mu_i$ is the weight of kernel $K_i$
- The resulting kernel $K$ is positive definite: $x^T K x = \sum_i \mu_i \, x^T K_i x > 0$ for all $x \neq 0$, provided $\mu_i > 0$ and $x^T K_i x > 0$
- Elegant aggregation of heterogeneous data
- More efficient than training individual SVMs
- KCCA uses an unweighted sum over the individual kernels
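As a minimal sketch of the weighted sum on this slide, the following pure-Python snippet combines two Gram matrices and checks numerically that the quadratic form of the combination equals the weighted sum of the individual quadratic forms. The matrices `K1`, `K2`, the weights `mu`, and the helper names are hypothetical illustration values, not data from the paper.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized Gram matrices (lists of lists)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)]
            for i in range(n)]


def quad_form(K, x):
    """Compute the quadratic form x^T K x."""
    n = len(x)
    return sum(x[i] * K[i][j] * x[j] for i in range(n) for j in range(n))


# Two hypothetical positive definite Gram matrices over the same three samples.
K1 = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
K2 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
mu = [0.7, 0.3]  # assumed positive weights

K = combine_kernels([K1, K2], mu)

x = [1.0, -2.0, 0.5]
lhs = quad_form(K, x)
rhs = mu[0] * quad_form(K1, x) + mu[1] * quad_form(K2, x)
print(lhs, rhs)
```

Because every term on the right-hand side is positive when the weights and the individual quadratic forms are positive, the combined form is positive as well, which is the argument the slide makes.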

5 Support Vector Machine
Annotated parts of the primal formula: hyperplane, squared norm of the weight vector, penalty term, slack variables
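For reference, a textbook 1-norm soft-margin primal that matches the labels on this slide can be written as below. The symbols ($w$, $b$, $\xi_i$, $C$, labels $y_i \in \{-1, +1\}$) and the factor $\tfrac{1}{2}$ are assumptions taken from the standard formulation; the slide's own scaling may differ.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i\,(w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n
\end{aligned}
```

Here $w^T x + b = 0$ is the hyperplane, $\lVert w \rVert^2$ the squared norm of the weight vector, $C \sum_i \xi_i$ the penalty term, and $\xi_i$ the slack variables.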

6 Dual form
Annotated parts of the dual formula: Lagrange multipliers, quadratic and convex, positive definite, scalar, ≥ 0
- Maximization instead of minimization
- Equality constraints
- Lagrange multipliers α instead of w, b, ξ
- Quadratic program (QP)
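A standard form of the dual, written here as a sketch under the assumption that the primal is $\min \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i$ with labels $y_i \in \{-1, +1\}$, is:

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{subject to}\quad & \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \qquad i = 1,\dots,n
\end{aligned}
```

The data enter only through the kernel values $K(x_i, x_j)$, which is why a combined kernel can be substituted directly. With a positive definite kernel matrix the quadratic term keeps the problem convex.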

7 Inserting linear combination
Annotation on the resulting formula: "ugly"
- The combined kernel must lie within the cone of positive semidefinite matrices
- Fixed trace, avoids the trivial solution
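One way to write the problem that results from substituting $K = \sum_k \mu_k K_k$ into a standard SVM dual is sketched below. The $\tfrac{1}{2}$ scaling, the trace constant $c$, and the index names are assumptions; the formula on the original slide may be arranged differently.

```latex
\begin{aligned}
\min_{\mu}\ \max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j
    \sum_{k=1}^{m} \mu_k \,(K_k)_{ij} \\
\text{subject to}\quad & \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \\
& \operatorname{trace}\Bigl(\sum_{k=1}^{m} \mu_k K_k\Bigr) = c, \qquad
  \sum_{k=1}^{m} \mu_k K_k \succeq 0
\end{aligned}
```

The last constraint states that the combined kernel stays inside the positive semidefinite cone, and the trace constraint fixes its overall scale.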

8 Cone and other stuff
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
- Positive semidefinite: $x^T A x \ge 0$ for all $x$
- Positive semidefinite cone: the set of all symmetric positive semidefinite matrices of a particular dimension
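To make the cone property concrete, here is a small sketch restricted to symmetric 2x2 matrices, where the smallest eigenvalue has a closed form. It checks membership in the cone and shows that a nonnegative combination of two members is again a member. The function names and example matrices are hypothetical.

```python
import math


def min_eigenvalue_2x2(A):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    return (a + d) / 2.0 - math.hypot((a - d) / 2.0, b)


def is_psd_2x2(A, tol=1e-12):
    """A symmetric matrix is positive semidefinite iff its smallest eigenvalue is >= 0."""
    return min_eigenvalue_2x2(A) >= -tol


def scaled_sum(s, A, t, B):
    """Return s*A + t*B for 2x2 matrices."""
    return [[s * A[i][j] + t * B[i][j] for j in range(2)] for i in range(2)]


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3: inside the cone
B = [[1.0, 1.0], [1.0, 1.0]]  # eigenvalues 0 and 2: on the boundary of the cone
N = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues -1 and 3: outside the cone

S = scaled_sum(3.0, A, 0.5, B)  # nonnegative combination of two members
print(is_psd_2x2(A), is_psd_2x2(B), is_psd_2x2(N), is_psd_2x2(S))
```

For larger matrices a numerical eigenvalue routine or a Cholesky factorization would take the place of the closed-form expression.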

9 Semidefinite program (SDP)
- Positive semidefinite constraints
- Fixed trace, avoids the trivial solution
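The fixed-trace constraint can be illustrated with a short sketch that rescales a set of kernel weights so that the combined kernel has a prescribed trace `c`. This shows only the normalization, not the SDP itself, and reads the constraint as pinning down the overall scale of the combined kernel. The helper names and example values are hypothetical.

```python
def trace(K):
    """Sum of the diagonal entries of a square matrix."""
    return sum(K[i][i] for i in range(len(K)))


def rescale_to_trace(kernels, weights, c):
    """Scale all weights by one common factor so that trace(sum_i mu_i K_i) == c."""
    current = sum(w * trace(K) for K, w in zip(kernels, weights))
    if current <= 0:
        raise ValueError("combined kernel must have positive trace")
    return [w * c / current for w in weights]


# Hypothetical Gram matrices with traces 4 and 2.
K1 = [[2.0, 0.0], [0.0, 2.0]]
K2 = [[1.0, 0.5], [0.5, 1.0]]

mu = rescale_to_trace([K1, K2], [1.0, 1.0], c=2.0)
combined_trace = sum(w * trace(K) for K, w in zip([K1, K2], mu))
print(mu, combined_trace)
```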

10 Dual form
Annotated part of the formula: quadratic constraint
- Quadratically constrained quadratic program (QCQP)
- QCQPs can be solved more efficiently than SDPs ($O(n^3)$ versus $O(n^{4.5})$)
- Interior point methods
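A sketch of how the QCQP can look when the weights are restricted to $\mu_k \ge 0$ is given below. It assumes the SVM dual is scaled as $\sum_i \alpha_i - \tfrac{1}{2}\alpha^T G \alpha$, and it introduces the notation $G(K_k) = \operatorname{diag}(y)\, K_k \operatorname{diag}(y)$, $r_k = \operatorname{trace}(K_k)$, and the fixed trace $c$; none of these symbols appear in the transcript, and the original slide may use a different scaling.

```latex
\begin{aligned}
\max_{\alpha,\,t}\quad & \sum_{i=1}^{n} \alpha_i - c\,t \\
\text{subject to}\quad & t \ge \frac{1}{2\,r_k}\, \alpha^T G(K_k)\, \alpha, \qquad k = 1,\dots,m, \\
& \alpha^T y = 0, \qquad 0 \le \alpha_i \le C, \qquad i = 1,\dots,n
\end{aligned}
```

Each kernel contributes one quadratic constraint, and the matrix-valued semidefinite constraint no longer appears, which is where the lower cost comes from.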

11 Interior point algorithm
Linear program: maximize $c^T x$ subject to $Ax \le b$, $x \ge 0$
- The classical simplex method follows the edges of the polyhedron
- Interior point methods walk through the interior of the feasible region
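The idea of walking through the interior can be sketched with a log-barrier central path on a deliberately simple linear program where $A$ is the identity, so each coordinate can be solved in closed form. This is an illustration of the central path only, under the assumption $c_i > 0$, and not a general interior point solver; the problem data and function name are hypothetical.

```python
import math


def central_path_point(c, b, t):
    """Maximize t*c_i*x + log(x) + log(b_i - x) over 0 < x < b_i for each coordinate.

    Setting the derivative to zero gives t*c_i*x^2 + (2 - t*c_i*b_i)*x - b_i = 0.
    For c_i > 0 the root below is the one that lies strictly inside (0, b_i).
    """
    point = []
    for ci, bi in zip(c, b):
        s = t * ci * bi
        point.append(((s - 2.0) + math.sqrt(4.0 + s * s)) / (2.0 * t * ci))
    return point


# maximize x1 + 2*x2 subject to x1 <= 4, x2 <= 3, x >= 0 (optimum at the vertex (4, 3))
c = [1.0, 2.0]
b = [4.0, 3.0]

path = [central_path_point(c, b, t) for t in (0.1, 1.0, 10.0, 100.0, 1000.0)]
objectives = [sum(ci * xi for ci, xi in zip(c, x)) for x in path]
for x, value in zip(path, objectives):
    print(x, value)
```

As the barrier parameter `t` grows, the points move from near the center of the box toward the optimal vertex while remaining strictly feasible, in contrast to the simplex method, which moves from vertex to vertex.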

12 Application
- Recognition of ribosomal and membrane proteins in yeast
- 3 types of data
  - Amino acid sequences
  - Protein-protein interactions (PPI)
  - mRNA expression profiles
- 7 kernels
  - Empirical kernel map -> sequence homology: BLAST (B), Smith-Waterman (SW), Pfam
  - FFT -> sequence hydropathy: KD hydropathy profiles, padding, low-pass filter, FFT, RBF
  - Interaction kernel (LI) -> PPI
  - Diffusion (D) -> PPI
  - RBF (E) -> gene expression
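As a rough sketch of the empirical kernel map listed above, each protein can be represented by its vector of similarity scores against a reference set, with the kernel taken as the inner product of those vectors. The score values below are made up for illustration and the function name is hypothetical; real scores would come from a tool such as BLAST or Smith-Waterman.

```python
def empirical_kernel(scores):
    """Gram matrix of inner products between the rows of a score matrix.

    scores[i][k] is the similarity score of protein i against reference sequence k.
    """
    n = len(scores)
    return [[sum(a * b for a, b in zip(scores[i], scores[j]))
             for j in range(n)]
            for i in range(n)]


# Hypothetical pairwise similarity scores for three proteins.
scores = [
    [5.0, 1.0, 0.0],
    [1.0, 4.0, 2.0],
    [0.0, 2.0, 6.0],
]

K = empirical_kernel(scores)
for row in K:
    print(row)
```

Because the result is a matrix of inner products, it is symmetric and positive semidefinite by construction, even when the raw score matrix is not.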

13 Results
- A combination of kernels performs better than individual kernels
- Gene expression (E) is most important for ribosomal protein recognition
- PPI (D) is most important for membrane protein recognition

14 Results
- Small improvement compared to uniform weights (all weights = 1)
- SDP is robust in the presence of noise
- How does SDP perform compared with kernel weights derived from the accuracy of individual SVMs?
- Membrane protein recognition
  - Other methods use sequence information only
  - TMHMM is designed for topology prediction
  - TMHMM is not trained on yeast only

15 Why is this cool?
Everything you ever dreamed of:
- Optimization of C included (2-norm soft margin SVM with regularization parameter 1/C)
- Hyperkernels (optimize the kernel itself)
- Transduction (learn from labeled and unlabeled samples in polynomial time)
- SDP has many applications (graph theory, combinatorial optimization, …)

16 Literature
- Learning the kernel matrix with semidefinite programming. G.R.G. Lanckriet et al., 2004
- Kernel-based data fusion and its application to protein function prediction in yeast. G.R.G. Lanckriet et al., 2004
- Machine learning using hyperkernels. C.S. Ong, A.J. Smola, 2003
- Semidefinite optimization. M.J. Todd, 2001
- http://www-user.tu-chemnitz.de/~helmberg/semidef.html

17 Software
- SeDuMi (SDP)
- Mosek (QCQP, Java, C++, commercial)
- YALMIP (Matlab)
- … http://www-user.tu-chemnitz.de/~helmberg/semidef.html
