Charalampos (Babis) E. Tsourakakis WABI 2013, France WABI '131.

Slides:



Advertisements
Similar presentations
Systems biology SAMSI Opening Workshop Algebraic Methods in Systems Biology and Statistics September 14, 2008 Reinhard Laubenbacher Virginia Bioinformatics.
Advertisements

Fast Algorithms For Hierarchical Range Histogram Constructions
Yasuhiro Fujiwara (NTT Cyber Space Labs)
Modeling Malware Spreading Dynamics Michele Garetto (Politecnico di Torino – Italy) Weibo Gong (University of Massachusetts – Amherst – MA) Don Towsley.
Maximum Likelihood. Likelihood The likelihood is the probability of the data given the model.
Yanxin Shi 1, Fan Guo 1, Wei Wu 2, Eric P. Xing 1 GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data RECOMB 2007 Presentation.
Network Morphospace Andrea Avena-Koenigsberger, Joaquin Goni Ricard Sole, Olaf Sporns Tung Hoang Spring 2015.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a Presentation partially taken.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
Molecular Evolution with an emphasis on substitution rates Gavin JD Smith State Key Laboratory of Emerging Infectious Diseases & Department of Microbiology.
BNFO 602, Lecture 3 Usman Roshan Some of the slides are based upon material by David Wishart of University.
. Maximum Likelihood (ML) Parameter Estimation with applications to reconstructing phylogenetic trees Comput. Genomics, lecture 6b Presentation taken from.
CSE182-L17 Clustering Population Genetics: Basics.
Toward Optimal Network Fault Correction via End-to-End Inference Patrick P. C. Lee, Vishal Misra, Dan Rubenstein Distributed Network Analysis (DNA) Lab.
Evaluation of the Haplotype Motif Model using the Principle of Minimum Description Srinath Sridhar, Kedar Dhamdhere, Guy E. Blelloch, R. Ravi and Russell.
CISC667, F05, Lec16, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Phylogenetic Trees (III) Probabilistic methods.
Phylogenetic Networks of SNPs with Constrained Recombination D. Gusfield, S. Eddhu, C. Langley.
Artificial Intelligence Term Project #3 Kyu-Baek Hwang Biointelligence Lab School of Computer Science and Engineering Seoul National University
Probabilistic methods for phylogenetic trees (Part 2)
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Combinatorial and Statistical Approaches in Gene Rearrangement Analysis Jijun Tang Computer Science and Engineering University of South Carolina
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Evolutionary Algorithms BIOL/CMSC 361: Emergence Lecture 4/03/08.
Charalampos (Babis) E. Tsourakakis SODA th January ‘11 SODA '111.
Learning Structure in Bayes Nets (Typically also learn CPTs here) Given the set of random variables (features), the space of all possible networks.
.. . Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 6a Presentation taken from.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
Amino Acid Scoring Matrices Jason Davis. Overview Protein synthesis/evolution Protein synthesis/evolution Computational sequence alignment Computational.
Zorica Stanimirović Faculty of Mathematics, University of Belgrade
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Ch.6 Phylogenetic Trees 2 Contents Phylogenetic Trees Character State Matrix Perfect Phylogeny Binary Character States Two Characters Distance Matrix.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Randomized Algorithms for Bayesian Hierarchical Clustering
Percolation Processes Rajmohan Rajaraman Northeastern University, Boston May 2012 Chennai Network Optimization WorkshopPercolation Processes1.
Genetic Algorithms Genetic algorithms provide an approach to learning that is based loosely on simulated evolution. Hypotheses are often described by bit.
Bioinformatics and Computational Biology
Phylogeny Ch. 7 & 8.
Probabilistic Graphical Models seminar 15/16 ( ) Haim Kaplan Tel Aviv University.
Indexing Correlated Probabilistic Databases Bhargav Kanagal, Amol Deshpande University of Maryland, College Park, USA SIGMOD Presented.
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,
1 MAVID: Constrained Ancestral Alignment of Multiple Sequence Author: Nicholas Bray and Lior Pachter.
Supervised Machine Learning: Classification Techniques Chaleece Sandberg Chris Bradley Kyle Walsh.
Gene Family Size Distributions Brought to You By Your Neighorhood Durand Lab Narayanan Raghupathy Nan Song Rose Hoberman.
CS 395T: Computational phylogenetics January 18, 2006 Tandy Warnow.
Statistical stuff: models, methods, and performance issues CS 394C September 3, 2009.
Finding Motifs Vasileios Hatzivassiloglou University of Texas at Dallas.
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Building Phylogenies. Phylogenetic (evolutionary) trees Human Gorilla Chimp Gibbon Orangutan Describe evolutionary relationships between species Cannot.
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Class Discussion Syphilis tree papers Your phylogenetic tree papers.
WABI: Workshop on Algorithms in Bioinformatics
Dynamical Statistical Shape Priors for Level Set Based Tracking
Bayesian inference Presented by Amir Hadadi
Recitation 5 2/4/09 ML in Phylogeny
BNFO 602 Phylogenetics Usman Roshan.
Outline Cancer Progression Models
Algorithms for Inferring the Tree of Life
Major Design Strategies
Presentation transcript:

Charalampos (Babis) E. Tsourakakis WABI 2013, France WABI '131

2 A A A AA A A A B B B C A B A A A A A A B B B B B B A B A A A A A B B B D D D A A A A A A E E E E E E E Copy numbers for a single gene A B C D E (2) (3) (4) (1) (0) t 1 < t 2 < t 3 < t 4 < t 5 A A A A A A A A A A A A A A A A A A A A A C C EE EE EE D DD D

 Inverse problem approach  High-throughput DNA sequencing data by Oesper, Mahmoody, Raphael (Genome Biology 2013)  SNP array data by Van Loo et al. (PNAS 2010), Carter et al. (Nature Biotechnology 2012) WABI '133

FISH data, direct assessment WABI '134 Gene 1 Gene

WABI '135 p53 CCND1 Multidimensional histogram on the positive integer cone, e.g., for 2 dimensions

WABI '136

 Breast and cervical cancer data publicly available from NIH ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees/data WABI '137

 Understanding tumor heterogeneity is a key step towards:  find first mutation events, hence identify new drugs and diagnostics  predict response to selective pressure, hence develop strategies to avoid drug resistance  identify tumors likely to progress, hence avoid over- and under-treatment. WABI '138

 Pennington, Smith, Shackney and Schwartz (J. of Bioinf. and Comp. B. 2007)  Two probes  Random walk where coordinate i is picked independently and with probabilities p i0,p i-1,p i1 is modified by {0,-1,+1} respectively.  Efficient heuristic to maximize a likelihood function over all possible trees and parameters. WABI '139

 Chowdhury, Shackney, Heselmeyer-Haddad, Ried, Schäffer, Schwartz (Best paper in ISMB’13). Among other contributions:  Methods which are able to handle large number of cells and probes.  Exponential-time exact algorithm and an efficient heuristic for optimizing their objective  New test statistics, tumor classification  Extensive experimental evaluation WABI '1310

WABI '1311 Copies of gene 1 Copies of Gene X X X  Chowdhury et al.:  Problem: Find tree (and possibly Steiner nodes) to minimize cost of connecting all input (terminal) vertices

 Probabilistic approach  We summarize the empirical distribution based on a model that captures complex dependencies among probes without over-fitting.  Allows us to assign weights on the edges of the positive integer di-grid which capture how likely a transition is.  And now, how do we derive a tumor phylogeny?.. WABI '1312

WABI '1313 Potential function

WABI '1314

WABI '1315

 A lot of related work and off-the-shelf software for learning the parameters  Based on Zhao, Rocha and Yu who provide a general framework that allows us to respect the ‘hierarchical’ property..  … Schmidt and Murphy provide efficient optimization algorithms for learning a hierarchical log-linear model WABI '1316

 We use the learned hierarchical log-linear model in two ways  The non-zero weights provide us insights into dependencies of factors  We use them to assign weights on the positive integer di-grid WABI '1317

WABI ' Copies of gene 1 Copies of Gene 2 Nicholas Metropolis Given a probability distribution π on a state space we can define a Markov Chain whose stationary distribution is π.

 Question: Can we use the wealth of inter-tumor phylogenetic methods to understand intra-tumor cancer heterogeneity? WABI '

 Motivated by this question:  We prove necessary and sufficient conditions for the reconstruction of oncogenetic trees, a popular method for inter-tumor cancer inference  We exploit these to preprocess a FISH dataset into an inter-tumor cancer dataset that respects specific biological characteristics of the evolutionary process WABI '1320

 Desper, Jiang, Kallioniemi, Moch, Papadimitriou, Schäffer  T(V,E,r) rooted branching  F={A1,..,A m } where A i is the set of vertices of a rooted sub-branching of T.  What are the properties that F should have in order to uniquely reconstruct T? ▪ Let T be consistent with F if it could give rise to F. WABI '1321

WABI '1322 r a b c d r a b c r a b Patient 1, A 1 ={ r,a,b,c} Patient 2, A 2 ={ r,a,b} Onco-tree Patient 3, A 3 ={ r,a,b,d} r a b d r a b c d r a b c d Also, consistent with {A 1, A 2, A 3 }

 Theorem  The necessary and sufficient conditions to reconstruct T from F are the following: ▪ x,y such that (x,y) is an edge, there exists a set in the family that contains x but not y. WABI '1323 necessity

▪ If x is not a descedant of y and vice versa then there exist two sets A i,A j such that ▪ x is in A i but not in A j ▪ y is in A j but not in A i WABI '1324 necessity

 It turns out that the necessary conditions are sufficient (constructive proof)  Allows us to force an oncogenetic tree to capture certain aspects of intratumor heterogeneity dynamics WABI '1325

 We evaluate our method on real FISH data where we show findings consistent with cancer literature  Here, we show results for a breast cancer dataset WABI '1326

 No ground truth, but  concurrent loss of cdh1 function and p53 inactivation play a key role in breast cancer evolution  subsequent changes in ccnd1, myc, znf217 according to our tree are consistent with oncogenetic literature WABI '1327

 There exists a lot of interest in understanding intra-tumor heterogeneity  Releasement of FISH data that assess it directly can promote this understanding  Concerning our work:  Better algorithms for fitting the model  Allow higher-order interactions but use additional penalty (e.g., AIC) WABI '1328

 … concerning our work  Other choices of inter-tumor methods  Tumor classification applications  Consensus FISH trees  Allow more mechanisms in copy number changes  Understand better the connection between our work and Chowdhury et al. WABI '1329

WABI '1330 Russell SchwartzAlejandro Schäffer NSF Grant CCF

WABI '1331

WABI '1332

WABI '1333

WABI '1334 Generated with code available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees