1 Graph Mining Applications to Machine Learning Problems
Koji Tsuda, Max Planck Institute for Biological Cybernetics

2 Graphs …

3 Graph Structures in Biology
- DNA sequences
- RNA secondary structures
- Chemical compounds
- Texts in literature
(Figure: example graphs, including a compound with C/O/H atom labels, an RNA fold over A/C/G/U bases, and the sentence "Amitriptyline inhibits adenosine uptake".)

4 Substructure Representation
- A graph is encoded as a 0/1 vector of pattern (subgraph) indicators
- Huge dimensionality! Graph mining is needed for selecting features
- Patterns are more expressive than the paths used by marginalized graph kernels (a sketch of the representation follows below)
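As a toy illustration of this representation (not part of the original slides), the sketch below computes the indicator vector with networkx's subgraph-isomorphism test; in practice a miner such as gSpan reports the occurrences directly.

    import networkx as nx
    from networkx.algorithms.isomorphism import GraphMatcher

    def to_feature_vector(graph, patterns):
        """0/1 indicators: does each mined pattern occur as a subgraph?"""
        return [int(GraphMatcher(graph, p).subgraph_is_isomorphic())
                for p in patterns]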

5 Overview
- Quick review of graph mining
- EM-based clustering algorithm: a mixture model with L1 feature selection
- Graph Boosting: supervised regression for QSAR analysis, where linear programming meets graph mining

6 Quick Review of Graph Mining

7 Graph Mining
- Analysis of graph databases: find all patterns satisfying predetermined conditions
- Frequent substructure mining: combinatorial and exhaustive
- Recently developed methods: AGM (Inokuchi et al., 2000), gSpan (Yan and Han, 2002), Gaston (2004)

8 Graph Mining: Frequent Substructure Mining
- Enumerate all patterns that occur in at least m graphs
- x_{ik}: indicator of pattern k in graph i
- support(k): the number of graphs in which pattern k occurs
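A minimal runnable sketch of the support computation, assuming the indicator matrix x has already been produced by a miner (the layout x[i][k] is illustrative):

    def support(x, k):
        """Number of graphs containing pattern k (column sum of indicators)."""
        return sum(row[k] for row in x)

    def frequent_patterns(x, m):
        """Indices of all patterns occurring in at least m graphs."""
        return [k for k in range(len(x[0])) if support(x, k) >= m]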

9 gSpan (Yan and Han, 2002)
- An efficient frequent substructure mining method
- DFS codes allow efficient detection of isomorphic patterns
- We extend gSpan for our work

10 Enumeration on a Tree-Shaped Search Space
- Each node of the search tree holds a pattern
- Nodes are generated from the root, adding one edge at each step

11 Tree Pruning
- support(g): the number of graphs in which pattern g occurs
- Anti-monotonicity: support can only decrease as a pattern grows, so if support(g) < m, stop exploring; none of g's extensions is generated
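The pruned enumeration can be sketched as the recursion below; `children` (one-edge extensions, e.g. via gSpan's DFS codes) and `support` are hypothetical helpers passed in by the caller, so this is only the search skeleton, not gSpan itself:

    def enumerate_frequent(pattern, db, m, out, children, support):
        if support(pattern, db) < m:
            return  # anti-monotone pruning: no extension can be frequent
        out.append(pattern)
        for child in children(pattern):  # grow by one edge at a time
            enumerate_frequent(child, db, m, out, children, support)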

12 Discriminative Patterns: Weighted Substructure Mining
- Each graph i carries a weight w_i (w_i > 0: positive class, w_i < 0: negative class)
- Weighted substructure mining searches for patterns with a large frequency difference between the classes, i.e., a large weighted support |sum_i w_i x_{ik}|
- This objective is not anti-monotonic, so a bound is used for pruning instead
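A hedged sketch of such a pruning bound (my reconstruction, not a formula from the slides): since any superpattern of g occurs only in a subset of the graphs containing g, its weighted score is bounded by the larger of the summed positive and summed negative weights over those graphs.

    def weighted_score(contains, w):
        """contains: indices of graphs containing the pattern; w: graph weights."""
        return abs(sum(w[i] for i in contains))

    def prune_bound(contains, w):
        """Upper bound on weighted_score of any superpattern."""
        pos = sum(w[i] for i in contains if w[i] > 0)
        neg = -sum(w[i] for i in contains if w[i] < 0)
        return max(pos, neg)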

13 Multiclass Version
- One weight vector per class: w_i^{(c)} > 0 if graph i belongs to class c, and w_i^{(c)} < 0 otherwise
- Search for patterns overrepresented in a class

14 EM-Based Clustering of Graphs
Tsuda, K. and Kudo, T.: Clustering Graphs by Weighted Substructure Mining. ICML 2006.

15 EM-Based Graph Clustering
- Motivation: learn a mixture model in the feature space of patterns, as a basis for more complex probabilistic inference
- L1 regularization meets graph mining: E-step -> mining -> M-step

16 Probabilistic Model
Binomial mixture: p(x) = sum_l alpha_l p(x; theta_l), with each component a product of binomials, p(x; theta_l) = prod_k theta_{lk}^{x_k} (1 - theta_{lk})^{1 - x_k}
- alpha_l: mixing weight for cluster l
- x: feature vector of a graph (each entry 0 or 1)
- theta_l: parameter vector for cluster l
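A minimal runnable sketch of this model and of the posterior (responsibility) computation used in the E-step; X is a numpy 0/1 matrix with one graph per row, and all names are illustrative:

    import numpy as np

    def log_component(x, theta, eps=1e-12):
        """log p(x; theta) for one multivariate binomial component."""
        t = np.clip(theta, eps, 1 - eps)
        return x @ np.log(t) + (1 - x) @ np.log(1 - t)

    def responsibilities(X, alpha, Theta):
        """Posterior cluster probabilities for each row of X."""
        logp = np.array([[np.log(a) + log_component(x, th)
                          for a, th in zip(alpha, Theta)] for x in X])  # (n, L)
        logp -= logp.max(axis=1, keepdims=True)        # numerical stability
        p = np.exp(logp)
        return p / p.sum(axis=1, keepdims=True)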

17 Function to Minimize
L1-regularized negative log likelihood:
  - sum_i log( sum_l alpha_l p(x_i; theta_l) ) + C sum_{l,k} |theta_{lk} - theta~_k|
- Baseline constant theta~_k: the ML parameter estimate under a single binomial distribution
- In the solution, most parameters are exactly equal to the baseline constants

18 E-step
- Active pattern: a pattern k with theta_{lk} != theta~_k for some cluster l
- Inactive patterns contribute the same factor to every cluster, so the E-step can be computed with the active patterns only (computable!)

19 M-step
- Putative cluster assignments come from the E-step
- Each parameter theta_{lk} can be solved for separately
- Use graph mining to find the active patterns, then solve only for those
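A deliberately simplified sketch of the regularized M-step. The snapping rule below is an assumption on my part (the exact thresholding solution is derived in the ICML 2006 paper): any weighted ML estimate within an assumed margin of the baseline is simply set back to the baseline, which reproduces the "most parameters exactly equal to constants" behavior.

    import numpy as np

    def m_step(X, R, theta_bar, C):
        """X: (n, d) 0/1 matrix; R: (n, L) responsibilities; theta_bar: (d,)."""
        n_eff = R.sum(axis=0)                       # (L,) effective cluster sizes
        theta_ml = (R.T @ X) / n_eff[:, None]       # weighted ML estimates, (L, d)
        margin = C / n_eff[:, None]                 # assumed snapping threshold
        inactive = np.abs(theta_ml - theta_bar) <= margin
        return np.where(inactive, theta_bar, theta_ml)  # inactive -> baseline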

20 Solution
- theta_{lk}: the occurrence probability of pattern k in cluster l
- theta~_k: the overall occurrence probability of pattern k
- The closed-form solution keeps theta_{lk} at the baseline theta~_k unless the cluster-wise estimate deviates from it sufficiently

21 Important Observation
For an active pattern k, the occurrence probability in a graph cluster differs significantly from the overall average.

22 Mining for Active Patterns
- The activity criterion F can be rewritten as a weighted-frequency score over graphs
- Therefore the active patterns can be found by (multiclass) weighted substructure mining!

23 Experiments: RNA Graphs
- Each stem becomes a node
- Secondary structures computed by RNAfold
- 0/1 vertex labels (self loop or not)

24 Clustering RNA Graphs
- Three Rfam families: Intron GP I (Int, 30 graphs), SSU rRNA 5 (SSU, 50 graphs), RNase bact a (RNase, 50 graphs)
- Three bipartition problems
- Results evaluated by ROC scores (area under the ROC curve)

25 Examples of RNA Graphs

26 ROC Scores

27 Number of Patterns & Running Time

28 Found Patterns

29 Summary (EM)
- Probabilistic clustering based on the substructure representation
- Inference helped by graph mining
- Many possible extensions: Naïve Bayes; graph PCA, LFD, CCA; semi-supervised learning; applications in biology?

30 Graph Boosting
Saigo, H., Kadowaki, T. and Tsuda, K.: A Linear Programming Approach for Molecular QSAR Analysis. International Workshop on Mining and Learning with Graphs, 85-96, 2006.

31 Graph Regression Problem
- Known in chemical informatics as the QSAR (Quantitative Structure-Activity Relationship) problem
- Given a graph, predict a real value
- Typically, features (descriptors) are given in advance

32 QSAR with Conventional Descriptors
(Table: one row per compound, with descriptor columns #atoms, #bonds, #rings, …, and the target column Activity.)

33 Motivation of Graph Boosting
- Descriptors are not always available
- Build new features from informative patterns (i.e., subgraphs)
- Greedy pattern discovery by boosting + gSpan
- Linear programming (LP) boosting reduces the number of graph-mining calls
- Accurate prediction & interpretable results

34 Molecule as a Labeled Graph
(Figure: a molecule drawn as a graph whose vertices are labeled with atom types such as C and O.)

35 QSAR with Patterns
(Table: one row per compound; each column is the 0/1 indicator of a subgraph pattern, with the target column Activity.)

36 Sparse Regression in a Very High-Dimensional Space
- G: the set of all possible patterns (intractably large)
- Each molecule is represented by a |G|-dimensional feature vector x
- Linear regression: f(x) = sum_{k in G} alpha_k x_k
- An L1 regularizer makes alpha sparse, selecting a tractable number of patterns
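For intuition only: if the indicator matrix could be materialized (it cannot for all of G, which is exactly why mining is needed), the target objective would be an ordinary L1-regularized regression, e.g.:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(50, 200))        # toy 0/1 pattern indicators
    y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.standard_normal(50)
    model = Lasso(alpha=0.05).fit(X, y)           # L1 penalty -> sparse alpha
    print((model.coef_ != 0).sum(), "patterns selected")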

37 Problem Formulation (LP1-Primal)
Introduce the ε-insensitive loss and the L1 regularizer; in standard form:
  min over alpha, xi:  sum_{k=1}^{d} |alpha_k| + C sum_{i=1}^{m} (xi_i^+ + xi_i^-)
  s.t.  y_i - f(x_i) <= ε + xi_i^+,  f(x_i) - y_i <= ε + xi_i^-,  xi_i^+, xi_i^- >= 0
- m: number of training graphs; d = |G|
- xi^+, xi^-: slack variables
- ε: insensitivity parameter

38 Dual LP (LP1-Dual)
- Primal: a huge number of weight variables (one per pattern)
- Dual: few variables (one per training graph) but a huge number of constraints (one per pattern)

39 Column Generation Algorithm for LPBoost (Demiriz et al., 2002)
- Start from the dual with no pattern constraints
- Add the most violated constraint at each iteration
- Guaranteed to converge
(Figure: the constraint matrix, of which only a small part is ever used.)

40 Finding the Most Violated Constraint
- Each pattern contributes one dual constraint (shown again)
- The most violated one maximizes a weighted frequency score over patterns
- It can therefore be searched for by weighted substructure mining

41 Algorithm Overview
Iterate:
1. Find a new pattern by graph mining with weight vector u
2. If all constraints are satisfied, break
3. Add the new constraint
4. Update u by solving LP1-Dual
Finally, convert the dual solution to obtain the primal solution alpha (see the simplified sketch below).
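A simplified, runnable stand-in for this loop (my construction, not the paper's exact LP dual update): candidate patterns are scored by correlation with the current residual, which plays the role of the weight vector u, and an L1-regularized model is refit on the selected columns after each addition.

    import numpy as np
    from sklearn.linear_model import Lasso

    def boosting_loop(X, y, n_iters=10, lam=0.01):
        """X: (m, d) 0/1 candidate-pattern indicators; y: activities."""
        selected, residual = [], y.astype(float).copy()
        model = None
        for _ in range(n_iters):
            scores = np.abs(X.T @ residual)        # weighted-mining surrogate
            scores[selected] = -np.inf             # skip already-added patterns
            selected.append(int(np.argmax(scores)))
            model = Lasso(alpha=lam).fit(X[:, selected], y)
            residual = y - model.predict(X[:, selected])
        return selected, model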

42 Speed-Up by Adding Multiple Patterns (Multiple Pricing)
- So far, only the single most violated pattern is chosen per iteration
- Instead, mine and include the top k patterns at each iteration
- This reduces the number of graph-mining calls

43 Speed-up by multiple pricing

44 Clearly Negative Data
(Table: as before, one row per compound with columns #atoms, #bonds, #rings, …, and Activity; the clearly negative compounds carry a large negative activity label.)

45 Inclusion of Clearly Negative Data (LP2-Primal)
- The primal LP is extended with constraints forcing the predicted activity of each clearly negative compound below a predetermined upper bound z, with slack ξ'
- l: number of clearly negative data points
- z: predetermined upper bound
- ξ': slack variables

46 Experiments
- Data from the Endocrine Disruptors Knowledge Base
- 59 compounds labeled with a real number and 61 compounds labeled with a large negative number
- The label (target) is the log-transformed relative proliferative potency, log(RPP), normalized between -1 and 1
- Compared against the marginalized graph kernel with ridge regression and with kNN regression

47 Results With and Without Clearly Negative Data
(Figure: comparison of LP1, without the clearly negative data, and LP2, with them.)

48 Extracted Patterns
- The extracted subgraph patterns are interpretable, in contrast to the implicitly represented features of the marginalized graph kernel

49 Summary (Graph Boosting)
- Graph Boosting simultaneously generates patterns and learns their weights
- Finite convergence guaranteed by column generation
- Results are potentially interpretable by chemists
- LP allows flexible constraints and speed-ups

50 Concluding Remarks
- Graph mining can serve as a component inside machine learning algorithms
- Weights are essential: please support example weights when you implement your itemset/tree/graph mining algorithms
- Make your implementation available on the web, so that ML researchers can use it