Simultaneous Identification of Multiple Driver Pathways in Cancer Mark D. M. Leiserson, et.al.

Slides:



Advertisements
Similar presentations
Minimum Clique Partition Problem with Constrained Weight for Interval Graphs Jianping Li Department of Mathematics Yunnan University Jointed by M.X. Chen.
Advertisements

Example 1 Matrix Solution of Linear Systems Chapter 7.2 Use matrix row operations to solve the system of equations  2009 PBLPathways.
Courseware Integer Linear Programming approach to Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Greedy Algorithms Basic idea Connection to dynamic programming Proof Techniques.
CS774. Markov Random Field : Theory and Application Lecture 17 Kyomin Jung KAIST Nov
Reliable System Design 2011 by: Amir M. Rahmani
Non-Linear Problems General approach. Non-linear Optimization Many objective functions, tend to be non-linear. Design problems for which the objective.
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Using Gene Ontology Models and Tests Mark Reimers, NCI.
Linear System of Equations MGT 4850 Spring 2008 University of Lethbridge.
Design of Optimal Multiple Spaced Seeds for Homology Search Jinbo Xu School of Computer Science, University of Waterloo Joint work with D. Brown, M. Li.
Achieving Minimum Coverage Breach under Bandwidth Constraints in Wireless Sensor Networks Maggie X. Cheng, Lu Ruan and Weili Wu Dept. of Comput. Sci, Missouri.
ICA-based Clustering of Genes from Microarray Expression Data Su-In Lee 1, Serafim Batzoglou 2 1 Department.
Bulut, Singh # Selecting the Right Interestingness Measure for Association Patterns Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava Department of Computer.
INDR 262 INTRODUCTION TO OPTIMIZATION METHODS LINEAR ALGEBRA INDR 262 Metin Türkay 1.
Linear Programming Models: Graphical and Computer Methods
1 Lecture 4 Maximal Flow Problems Set Covering Problems.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Bi-Clustering Jinze Liu. Outline The Curse of Dimensionality Co-Clustering  Partition-based hard clustering Subspace-Clustering  Pattern-based 2.
Square n-by-n Matrix.
Notation OR Backgrounds OR Convex sets Nonconvex set.
Matrices. Definitions  A matrix is an m x n array of scalars, arranged conceptually as m rows and n columns.  m is referred to as the row dimension.
A compression-boosting transform for 2D data Qiaofeng Yang Stefano Lonardi University of California, Riverside.
Multiple Testing in Microarray Data Analysis Mi-Ok Kim.
Systems of Equations and Inequalities Systems of Linear Equations: Substitution and Elimination Matrices Determinants Systems of Non-linear Equations Systems.
Solving Linear Systems of Equations
5.5 Row Space, Column Space, and Nullspace
EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science
Solve a system of linear equations By reducing a matrix Pamela Leutwyler.
Updated 21 April2008 Linear Programs with Totally Unimodular Matrices.
Approximation Algorithms Department of Mathematics and Computer Science Drexel University.
Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
Biclustering of Expression Data by Yizong Cheng and Geoge M. Church Presented by Bojun Yan March 25, 2004.
Fast test for multiple locus mapping By Yi Wen Nisha Rajagopal.
Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.
1.7 Linear Independence. in R n is said to be linearly independent if has only the trivial solution. in R n is said to be linearly dependent if there.
CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer Max Leiserson *, Hsin-Ta Wu *, Fabio Vandin, Benjamin.
The Canonical Form and Null Spaces Lecture III. The Canonical Form ä A canonical form is a solution to an underidentified system of equations. ä For example.
Efficient Point Coverage in Wireless Sensor Networks Jie Wang and Ning Zhong Department of Computer Science University of Massachusetts Journal of Combinatorial.
Optimization Models for Fixed Channel Assignment in Wireless Mesh Networks with Multiple Radios Arindam K. Das, Sumit Roy, SECON Kim Young.
CSE280Stefano/Hossein Project: Primer design for cancer genomics.
Simplex Method Review. Canonical Form A is m x n Theorem 7.5: If an LP has an optimal solution, then at least one such solution exists at a basic feasible.
Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.
Notation OR Backgrounds OR Convex sets Nonconvex set.
CS 721 Project Implementation of Hypergraph Edge Covering Algorithms By David Leung ( )
Principal Components Analysis ( PCA)
Operations Research.  Operations Research (OR) aims to having the optimization solution for some administrative problems, such as transportation, decision-making,
Linear Programming (LP) Vector Form Maximize: cx Subject to : Ax  b c = (c 1, c 2, …, c n ) x = b = A = Summation Form Maximize:  c i x i Subject to:
Computation on Graphs. Graphs and Sparse Matrices Sparse matrix is a representation of.
Ch 4. Language Acquisition: Memoryless Learning 4.1 ~ 4.3 The Computational Nature of Language Learning and Evolution Partha Niyogi 2004 Summarized by.
Gaussian Elimination and Gauss-Jordan Elimination
Missing data: Why you should care about it and what to do about it
Independent Cascade Model and Linear Threshold Model
Chapter 7 Linear Programming
Kabanikhin S. I., Krivorotko O. I.
Advanced Design and Analysis Techniques
GCD and Optimization Problem
Distributed Submodular Maximization in Massive Datasets
Independent Cascade Model and Linear Threshold Model
Bin Fu Department of Computer Science
Linear Programming Objectives: Set up a Linear Programming Problem
Linear Equations in Linear Algebra
Chapter 1: Linear Equations in Linear Algebra
CS 485G: Special Topics in Data Mining
Approximation Algorithms for the Selection of Robust Tag SNPs
Unfolding with system identification
Independent Cascade Model and Linear Threshold Model
Fig. 1 Steps in the PDE functional identification of nonlinear dynamics (PDE-FIND) algorithm, applied to infer the Navier-Stokes equations from data. Steps.
Presentation transcript:

Simultaneous Identification of Multiple Driver Pathways in Cancer Mark D. M. Leiserson, et.al

Goal To distinguish the functional driver mutations responsible for cancer development from the random passenger mutations that have no consequences for cancer.

Multi-Dendrix Dendrix – De novo Driver Exclusivity Important Assumption: 1) High Coverage- most patients have at least one mutation in the set, i.e, set of potential mutated genes of a particular pathway 2) High Exclusivity- nearly all patients have no more than one mutation in the set

Justification by the author From Vandin, et al, 2012

Dendrix - Method From Vandin, et al, 2012

Coverage Overlap Weight Denote the set of patients in which g is mutated  Denote the set of patients in which at least one of the genes in M is mutated  Dendrix Method Maximum Coverage Exclusive Submatrix Problem: Given an m*n mutation matrix A and an integer k>0, find a mutually exclusive m*k submatrix of M of k columns (genes) of A with the largest number of nonzero rows (patients). From Vandin, et al, 2012

Problem Computationally Difficult to Solve Size k = 6 of 20,000 genes 10^ 23 subsets Solution A greedy Algorithm for independent genes Markov Chain Monte Carlo (MCMC) Dendrix Method Maximum Weight Submatrix Problem: Given an m * n mutation matrix A and an integer k >0, find the m * k column submatrix M of A that maximizes W (M). From Vandin, et al, 2012

Limitation of Dendrix Mutations in different pathways may not be mutually exclusive. Mutations in different pathways may exhibit significant patterns of co-occurrence across patients. Solution -> Multi-Dendrix Algorithm

Multi-Dendrix Algorithm 1) Find sets of genes with high coverage as an integer linear program (ILP) 2) Generalize the ILP to simultaneously find multiple driver pathways 3) Additional Analysis: Subtype-specific mutations, stability measures, permutation test, compute enrichment states

The Multi-Dendrix Pipeline

Coverage Overlap Weight Denote the set of patients in which g is mutated  Denote the set of patients in which at least one of the genes in M is mutated  Multi-Dendrix Method - the same as the first step of Dendrix Maximum Coverage Exclusive Submatrix Problem: Given an m*n mutation matrix A and an integer k>0, find a mutually exclusive m*k submatrix of M of k columns (genes) of A with the largest number of nonzero rows (patients).

ILP- Integer Linear Programming Mathematical optimization or feasibility program where variables are restricted to be integers From Wikipedia

ILP for the Maximum Weight Submatrix Problem For each patient I, the coverage is determined by For each gene j, a gene set M is determined by Mutation matrix:

 Denote the set of patients in which g is mutated Denote the set of patients in which at least one of the genes in M is mutated  

Multiple Maximum Weight Submatrices Problem Multiple Maximum Weight Submatrices Problem: Given an m*n mutation matrix A and an integer t>0, find a collection M = { M1, M2, …., Mt} of m*k column submatrices that maximizes

Multi-Dendrix Results on the GBM Dataset

Multi-Dendrix Results on the BRCA Dataset

Thank you for your attention!