Classification problems with heterogeneous information sources
Gert Lanckriet, U.C. San Diego
N.U.S., January 13, 2006

Motivation
Statistical machine learning:
–blends statistics, computer science, signal processing, and optimization
–involves solving large-scale data analysis problems, either autonomously or in tandem with a human
Challenges:
–massive scale of data sets
–on-line issues
–diversity of information sources describing the data

Example: web-related applications
Data point = web page. Sources of information about the web page:
–content: text, images, structure, sounds
–relation to other web pages: links (a network)
–users (log data): click behavior, origin
This information comes in diverse (heterogeneous) formats.

Example: bioinformatics
Information sources describing a gene or protein:
–mRNA expression data
–upstream region data (transcription factor binding sites)
–protein-protein interaction data
–hydrophobicity data
–sequence data (gene, protein)

Overview
–Kernel methods
–Classification problems
–Kernel methods with heterogeneous information
–Classification with heterogeneous information (SDP)
–Applications in computational biology

Kernel-based learning
Pipeline: data (x_1, …, x_n) → embed data → linear algorithm (SVM, MPM, PCA, CCA, FDA, …)
–If the data are described by numerical vectors, the embedding is a (non-linear) transformation, yielding non-linear versions of linear algorithms.
–The embedding can also be defined for non-vector data.

Kernel-based learning
The embedding is specified IMPLICITLY, through a kernel matrix K: entry K_ij is the inner product between embedded data points i and j, and this inner product measures similarity.
Property: any symmetric positive semidefinite matrix specifies a kernel matrix, and every kernel matrix is symmetric positive semidefinite.
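As a quick illustration of this property (my own sketch, not from the talk): build a Gram matrix of inner products and check that it is symmetric positive semidefinite.

```python
import numpy as np

# Toy data: 5 points described by numerical vectors in R^3.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Kernel matrix of the identity embedding: K_ij = <x_i, x_j>.
K = X @ X.T

# Every kernel matrix is symmetric positive semidefinite:
assert np.allclose(K, K.T)            # symmetry
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() >= -1e-10        # no negative eigenvalues (up to round-off)
print("smallest eigenvalue:", eigvals.min())
```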

Kernel-based learning
Full pipeline: data (x_1, …, x_n) → embed data → kernel design → kernel matrix K → kernel algorithm (the kernelized linear algorithm: SVM, MPM, PCA, CCA, FDA, …)

Kernel methods
Unifying learning framework:
–connections to statistics, convex optimization, and functional analysis
–different data analysis problems can be formulated within this framework: classification, clustering, regression, dimensionality reduction
Many successful applications:
–hand-writing recognition
–text classification
–analysis of micro-array data
–face detection
–time series prediction

Binary classification
Training data: {(x_i, y_i)}, i = 1, …, n
–x_i: description of the i-th object (e.g., a patient's heart, urine, DNA, blood, and scan data; say x_1 with label y_1 = -1 and x_2 with label y_2 = +1)
–y_i ∈ {-1, +1}: label
Problem: design a classification rule such that, given a new x, it predicts y with minimal probability of error.

Binary classification
Find a hyperplane that separates the two classes.
Classification rule: y = sign(w^T x + b).

Maximal margin classification
Maximize the margin:
–position the hyperplane between the two classes
–such that the 2-norm distance to the closest point from each class is maximized

Maximal margin classification
If the data are not linearly separable:
–allow some errors
–try to maximize the margin for the data points with no error

Maximal margin classification: training algorithm
Trade off maximizing the margin against minimizing the training error: correctly classified points incur no penalty, while errors are measured by slack variables.
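The formulas on this slide were images in the original; the standard soft-margin primal the slide describes (a reconstruction, with trade-off parameter C and slack variables ξ_i) is:

```latex
\min_{w,\,b,\,\xi}\quad \tfrac{1}{2}\|w\|_2^2 \;+\; C\sum_{i=1}^{n}\xi_i
\qquad \text{s.t.}\quad y_i\,(w^\top x_i + b) \,\ge\, 1-\xi_i,\qquad \xi_i \ge 0,\quad i=1,\dots,n.
```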

Maximal margin classification
Training is a convex optimization problem (QP), solved via its dual problem.
Optimality condition: the optimal weight vector is a linear combination of the training points, with nonzero coefficients only on the support vectors.
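Again reconstructing the pictured formulas (standard SVM material, notation as above): the dual problem and the optimality condition read

```latex
\max_{\alpha}\quad \sum_{i=1}^{n}\alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,x_i^\top x_j
\qquad \text{s.t.}\quad 0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0,
```

with the optimality condition recovering the primal solution as \(w=\sum_i \alpha_i y_i x_i\).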

Maximal margin classification
Training yields the dual coefficients α_i and the offset b.
Classification rule: classify a new data point x as y = sign(Σ_i α_i y_i x_i^T x + b); in kernel form, y = sign(Σ_i α_i y_i K(x_i, x) + b).

Kernel-based classification
Pipeline: data (x_1, …, x_n) → embed data → kernel design → kernel matrix K → kernel algorithm, here a linear classification algorithm: the support vector machine (SVM).
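A minimal end-to-end sketch of this pipeline (my own illustration, using scikit-learn rather than anything from the talk): design a kernel matrix, hand it to an SVM that never sees the raw vectors, and classify new points.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two classes of 2-D points (the "data" stage).
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(+1, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# Kernel design: a Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
def gaussian_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

K_train = gaussian_kernel(X, X)

# Kernel algorithm: an SVM that sees only the kernel matrix.
clf = SVC(kernel="precomputed").fit(K_train, y)

X_new = np.array([[-1.2, -0.8], [0.9, 1.1]])
K_new = gaussian_kernel(X_new, X)      # similarities to the training points
print(clf.predict(K_new))              # expected: [-1  1]
```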

Overview
–Kernel methods
–Classification problems
–Kernel methods with heterogeneous information
–Classification with heterogeneous information (SDP)
–Applications in computational biology

Kernel methods with heterogeneous info
Data points: proteins. Each information source describing them is summarized into its own kernel matrix K.

Kernel methods with heterogeneous data
Proposed approach:
–first focus on each single source j of information individually
–extract the relevant information from source j into a kernel matrix K_j
–design an algorithm to learn the optimal K by "mixing" any number of kernel matrices K_j, for a given learning problem

Kernel methods with heterogeneous data
This proposed approach brings several benefits:
–kernel design can focus on specific types of information
–flexibility
–information irrelevant to the learning task can be ignored
–the kernel matrices K_j form a homogeneous, standardized input

Kernel design: classical vector data
Data matrix:
–each row corresponds to a gene (data point)
–each column corresponds to an experiment (mRNA expression level)
Each gene is described by a vector of numbers.

Kernel design: classical vector data
Inner product: K(x, z) = x^T z
Normalized inner product: K(x, z) = x^T z / (||x|| ||z||), large for similar vectors and small for dissimilar ones.
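In code, a simple sketch of these two similarity measures:

```python
import numpy as np

def linear_kernel(x, z):
    """Inner product: large when x and z point in similar directions."""
    return float(np.dot(x, z))

def normalized_kernel(x, z):
    """Normalized inner product (cosine similarity), scaled into [-1, 1]."""
    return float(np.dot(x, z) / (np.linalg.norm(x) * np.linalg.norm(z)))

x = np.array([1.0, 2.0, 0.5])
print(linear_kernel(x, x), normalized_kernel(x, x))  # self-similarity; cosine gives 1.0
```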

Kernel design: classical vector data
A more advanced similarity measure for vector data is the Gaussian kernel, K(x, z) = exp(-||x - z||^2 / (2σ^2)).
It corresponds to a highly non-linear embedding.
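A sketch of the Gaussian kernel, with σ as the bandwidth parameter:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: 1 for identical points, decaying with distance."""
    return float(np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2)))

x, z = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(gaussian_kernel(x, z))   # exp(-1) ~ 0.368 with sigma = 1
```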

Kernel design: strings
Data points: proteins, described by variable-length, discrete strings (amino acid sequences), e.g. protein 1 and protein 2:
>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
LVPWVLATDYKNYAINYMENSHPDKKAHSIHAWILSKSKVLEGNTKEVVD
NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALE
KFDKALKALPMHIRLSFNPTQLEEQCHI
Kernel design here means deriving a valid similarity measure based on this non-vector information.

Kernel design: strings
String kernels quantify sequence similarity directly: the slide shows mutated variants of these sequences, with ICYA_MANSE judged more similar to ICYA_JAKSE and less similar to LACB_BOVIN.
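As one concrete (and much simpler than Smith-Waterman or BLAST) example of a string kernel, a k-spectrum kernel counts shared k-mers; this sketch is my own illustration, not the kernel used in the talk:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum string kernel: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[kmer] * ct[kmer] for kmer in cs)  # only shared k-mers contribute

a = "GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP"
b = "GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLQ"   # one substitution at the end
c = "MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGT"
print(spectrum_kernel(a, b), spectrum_kernel(a, c))  # near-identical pair scores far higher
```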

Kernel design: graph
Data points: vertices; the information is connectivity, described by a graph.
Diffusion kernel: establishes similarities between vertices of a graph, based on the connectivity information:
–based upon a random walk
–efficiently accounts for all paths connecting two vertices, weighted by path lengths
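A small sketch of the diffusion kernel of Kondor and Lafferty (2002), K = exp(βH) with H the negative graph Laplacian and β a diffusion rate; the 4-node path graph is my own toy example:

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a toy path graph: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

H = A - np.diag(A.sum(axis=1))   # negative graph Laplacian
beta = 0.5                       # diffusion rate
K = expm(beta * H)               # sums over all paths; short paths weighted more

# Vertices joined by short paths are more similar than distant ones.
print(K[0, 1] > K[0, 3])         # True
```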

Kernel methods with heterogeneous data
Given the kernel matrices K_1, K_2, … built from the individual sources, how do we combine them into a single K?

Learning the kernel matrix K
–Any symmetric positive semidefinite matrix specifies a kernel matrix; the positive semidefinite matrices form a convex cone.
–Define a cost function to assess the quality of a kernel matrix; restrict to convex cost functions.
–Therefore: learn K from the convex cone of positive semidefinite matrices, according to a convex quality measure.

Learning the kernel matrix K
Learn K from the convex cone of positive semidefinite matrices, according to a convex quality measure.
Semidefinite programming (SDP) deals with optimizing convex cost functions over the convex cone of positive semidefinite matrices (or a convex subset of it).
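For reference (a standard fact, not a formula from the slides), an SDP in one common standard form optimizes a linear objective subject to a linear matrix inequality:

```latex
\min_{x \in \mathbb{R}^m}\ c^\top x
\qquad \text{s.t.}\quad F(x) \;=\; F_0 + x_1 F_1 + \dots + x_m F_m \;\succeq\; 0,
```

where the \(F_i\) are symmetric matrices and \(F(x) \succeq 0\) means \(F(x)\) is positive semidefinite.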

Classification with multiple kernels
–Integrate the constructed kernels: learn a linear combination K = Σ_j μ_j K_j.
–Learn K from the convex cone of positive semidefinite matrices (or a convex subset), according to a convex quality measure.
–Here the quality measure is the margin of a large margin classifier (SVM): choose the combination that maximizes the margin.
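A hedged sketch of the integration step: given kernel matrices from different sources, form a weighted combination and train an SVM on it. Here the weights are fixed by hand for illustration; the talk's point is that the μ_j themselves are learned by maximizing the margin, via the SDP on the next slides.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, mu):
    """Linear combination K = sum_j mu_j * K_j (mu_j >= 0 keeps K positive semidefinite)."""
    assert all(m >= 0 for m in mu)
    return sum(m * K for m, K in zip(mu, kernels))

rng = np.random.default_rng(0)
X1 = rng.standard_normal((40, 5))   # stand-ins for two heterogeneous sources
X2 = rng.standard_normal((40, 8))
y = np.array([-1] * 20 + [+1] * 20)

K1, K2 = X1 @ X1.T, X2 @ X2.T       # one kernel matrix per source
K = combine_kernels([K1, K2], mu=[0.7, 0.3])

clf = SVC(kernel="precomputed").fit(K, y)
```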

Classification with multiple kernels
With one kernel, the SVM dual formulation is a QP; with multiple kernels, we minimize the dual optimum over the kernel combination. The resulting objective is convex (a pointwise maximum of a set of convex functions), and the problem is a semidefinite programming problem.
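The pictured formulation (reconstructed here following Lanckriet et al., JMLR 2004; e is the all-ones vector, D(y) = diag(y), and c a fixed trace budget) minimizes the SVM dual optimum over the kernel combination:

```latex
\min_{\mu}\ \max_{\alpha}\quad 2\,\alpha^\top e \;-\; \alpha^\top D(y)\Big(\textstyle\sum_j \mu_j K_j\Big) D(y)\,\alpha
\qquad \text{s.t.}\quad 0 \le \alpha \le C,\quad \alpha^\top y = 0,\quad \mu_j \ge 0,\quad \mathrm{trace}\Big(\textstyle\sum_j \mu_j K_j\Big) = c.
```

For fixed \(\alpha\) the objective is affine in \(\mu\), so the outer minimand is a pointwise maximum of affine functions and hence convex.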

Classification with multiple kernels
To solve it, the multiple-kernel dual formulation needs to be reformulated in standard SDP format.

Classification with multiple kernels
Integrating the constructed kernels (learning a linear mix) with a large margin classifier (maximizing the margin) yields:
–an SDP in standard form
–theoretical performance guarantees

Applications in computational biology
–yeast membrane protein prediction
–yeast protein function prediction

Yeast Membrane Protein Prediction
Membrane proteins:
–anchor in various cellular membranes
–serve important communicative functions across the membrane
–are important drug targets
About 30% of proteins are membrane proteins.

Yeast Membrane Protein Prediction
Information sources, each summarized into its own kernel matrix K:
–protein sequences: Smith-Waterman (SW) scores
–protein sequences: BLAST scores
–E-values of Pfam domains
–protein-protein interactions (diffusion kernel)
–mRNA expression profiles (Gaussian kernel)
–hydropathy profile

Yeast Protein Function Prediction
Five different types of data:
–Pfam domains
–genetic interactions (CYGD)
–physical interactions (CYGD)
–protein-protein interactions (TAP)
–mRNA expression profiles
We compare our approach to one based on Markov random fields (Deng et al.):
–using the same five types of data
–which also reported improved accuracy compared to using any single data type

Yeast Protein Function Prediction
[Results chart comparing the MRF approach with SDP/SVM (binary) and SDP/SVM (enriched).]

Conclusion
A computational and statistical framework to integrate data from heterogeneous information sources:
–a flexible and unified approach
–within the kernel methodology
–specifically: classification problems
–resulting formulation: semidefinite programming
Applications show that classification performance can be enhanced by integrating diverse genome-wide information sources.