Wei Jiao, Shankar Vembu, Amit G Deshwar,

Presentation transcript:

Inferring clonal evolution of tumors from single nucleotide somatic mutations Wei Jiao, Shankar Vembu, Amit G Deshwar, Lincoln Stein, and Quaid Morris

Problem Proposal
- Cancer is characterized by rapid cell division and mutation
  - Leads to many heterogeneous subclonal populations within a tumor
- Driver mutations are causal in adaptation and spread of cancerous cells
- Passenger mutations have no functional consequence
- Research interest in identifying driver and passenger mutations leads to interest in tracing the mutational patterns of tumors
- Deconvolution of cell mixture and construction of phylogeny
  - Unable to observe taxa directly in tumor samples due to heterogeneity
  - Need to deconvolve taxa from samples

Input and Assumptions
Deep sequencing:
- Sequence particular regions of DNA hundreds or thousands of times
- Allows detection of rare clonal types within a sample
- Observe frequency of mutations within each sample
Assumptions:
- Clonal evolution model: all cells in the tumor are derived from ancestors, and mutations that confer advantages will proliferate
- All tumor cells are derived from a single wild-type clone
- Infinite sites
- Copy number of SNVs is given as input
- Assume that each SNV has the same copy number
- Deep sequencing is necessary for tumors due to heterogeneity
- Authors suggest doing further sequencing for subclonal lineage (whole genome)
Image: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542783/
El-Kebir, M., Satas, G., Oesper, L., & Raphael, B. J. (2016). Inferring the Mutational History of a Tumor Using Multi-state Perfect Phylogeny Mixtures. Cell Systems, 3(1), 43-53. doi:10.1016/j.cels.2016.07.004
What is a copy number variant, and why are they important risk factors for ASD? (n.d.). Retrieved April 04, 2018, from http://readingroom.mindspec.org/?page_id=8221

Approach
Infinite sites assumption:
- Each SNV (mutation) only appears once
- Different mutations do not occur at the same location
- Infinite sites is related to the no-homoplasy assumption
Topological constraint rules:
- Ancestor Condition: ancestor mutations must have equal or higher frequencies than their descendants
- Sum Condition: if a branching phylogeny exists, then the frequency of the ancestor mutation must be at least the sum of the frequencies of its direct descendants
- Crossing Rule: if the frequency of a mutation is not consistently greater than or equal to that of another across samples, then it cannot be an ancestor of that mutation
Image: https://www.slideshare.net/drmani_vet/variant-calling-workshop-glasgow-20150609
Mudaliar, M. (2015, June 12). Variant (SNP) calling - an introduction (with a worked example, using... Retrieved April 04, 2018, from https://www.slideshare.net/drmani_vet/variant-calling-workshop-glasgow-20150609
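The three topological rules can be checked mechanically once each SNV has a frequency in every sample. The following is a minimal sketch, not PhyloSub's implementation: the function names and the example frequencies are hypothetical, and the sum condition is treated as non-strict (the children may sum to exactly the parent's frequency), which is an assumption.

```python
from typing import Sequence


def can_be_ancestor(anc: Sequence[float], desc: Sequence[float]) -> bool:
    """Ancestor condition and crossing rule combined: the candidate ancestor
    must be at least as frequent as the descendant in every sample."""
    return all(a >= d for a, d in zip(anc, desc))


def satisfies_sum_condition(parent: Sequence[float],
                            children: Sequence[Sequence[float]]) -> bool:
    """Sum condition: in every sample, the children's frequencies together
    must not exceed the parent's frequency."""
    return all(
        sum(child[s] for child in children) <= parent[s]
        for s in range(len(parent))
    )


# Hypothetical frequencies of three SNVs in two samples (illustration only).
freqs = {"A": [0.9, 0.8], "B": [0.5, 0.3], "C": [0.3, 0.4]}

a_over_b = can_be_ancestor(freqs["A"], freqs["B"])   # A >= B in both samples
b_over_c = can_be_ancestor(freqs["B"], freqs["C"])   # B < C in sample 2
c_over_b = can_be_ancestor(freqs["C"], freqs["B"])   # C < B in sample 1
branching_ok = satisfies_sum_condition(freqs["A"], [freqs["B"], freqs["C"]])

print(a_over_b, b_over_c, c_over_b, branching_ok)
```

In this example B and C cross, so neither can be the ancestor of the other, but both can be children of A because their frequencies sum to no more than A's in each sample.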

Algorithm
Input:
- Read counts for each SNV in each sample
- Copy-number status for each SNV
Process:
- Group SNVs into sub-lineages
Output:
- "Partial order plot"
- Represents posterior uncertainty in phylogeny
Image: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-35
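One way to picture how posterior uncertainty over the phylogeny could be summarized is to count, over a set of candidate trees, how often one sub-lineage is an ancestor of another. The sketch below assumes the posterior is available as a list of sampled trees, each stored as a child-to-parent map; the slides do not describe the inference procedure, so this representation, the function names, and the three sample trees are all hypothetical.

```python
from collections import Counter


def ancestors(tree, node):
    """Return the set of ancestors of `node` in a child -> parent map."""
    found = set()
    parent = tree.get(node)
    while parent is not None and parent not in found:
        found.add(parent)
        parent = tree.get(parent)
    return found


def ancestry_posterior(sampled_trees):
    """Fraction of sampled trees in which x is an ancestor of y, per (x, y)."""
    counts = Counter()
    for tree in sampled_trees:
        for node in tree:
            for anc in ancestors(tree, node):
                counts[(anc, node)] += 1
    total = len(sampled_trees)
    return {pair: count / total for pair, count in counts.items()}


# Three hypothetical sampled trees over sub-lineages A, B, and C.
samples = [
    {"A": None, "B": "A", "C": "B"},  # chain: A -> B -> C
    {"A": None, "B": "A", "C": "B"},  # chain again
    {"A": None, "B": "A", "C": "A"},  # branching: B and C are siblings
]
posterior = ancestry_posterior(samples)
for pair, prob in sorted(posterior.items()):
    print(pair, round(prob, 3))
```

Pairs with a probability near 1 would be drawn as confident ancestor-descendant edges, while intermediate values (B over C here) mark relationships the data do not settle.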

Results
Simulation:
- Generate data without clear phylogeny
- Generate data with clear phylogeny
Comparison to real data:
- Chronic lymphocytic leukemia data
- Acute myeloid leukemia data

Simulation
- Simulate SNV frequencies consistent with multiple phylogenies
- Process:
  - Set parameters: number of nodes, height of tree, number of possible siblings per node, ~9 SNVs
  - Sample read counts from each node using a binomial distribution
  - Vary number of reads
- PhyloSub works well on high read counts
  - Able to recover clusters (correlation > 0.99)
- At lower read depths, true clusters are merged
- When simulating a chain phylogeny, PhyloSub recovers the true phylogeny
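The read-count sampling step can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' simulation code: `simulate_read_counts` is a hypothetical helper, the true variant allele fractions are invented, and each read is modeled as an independent Bernoulli draw so that the count per SNV is binomial.

```python
import random


def simulate_read_counts(true_fractions, depth, seed=0):
    """Draw a binomial variant-read count for each SNV.

    Each of the `depth` reads carries the variant allele with probability
    equal to that SNV's true variant allele fraction.
    """
    rng = random.Random(seed)
    counts = []
    for fraction in true_fractions:
        variant_reads = sum(1 for _ in range(depth) if rng.random() < fraction)
        counts.append(variant_reads)
    return counts


# Hypothetical true fractions for three sub-lineages (illustration only).
true_fractions = [0.45, 0.30, 0.10]

# Vary the number of reads to see how the observed frequencies tighten.
for depth in (20, 200, 2000):
    counts = simulate_read_counts(true_fractions, depth, seed=1)
    observed = [round(c / depth, 3) for c in counts]
    print(depth, counts, observed)
```

At low depth the observed frequencies scatter widely around the true values, which is consistent with the slide's point that true clusters get merged when read depth is low.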

Chronic Lymphocytic Leukemia
- Compared predicted trees to trees constructed from whole genome sequencing (Schuh et al.)
  - Grouped SNVs into subclonal lineages using k-means clustering
    - Similarities in allele frequencies
    - Changes in allele frequencies over time
  - Constructed phylogenetic tree by an unspecified method
- PhyloSub tree matched the original tree structure for 100% of patients, but clusters varied
- No "ground truth" data available, so the trees used for comparison may themselves be incorrect
Figure: Schuh et al. on left, PhyloSub on right
http://www.bloodjournal.org/content/120/20/4191.long?sso-checked=true
Schuh A, Becq J, Humphray S, Alexa A, Burns A, Clifford R, Feller SM, Grocock R, Henderson S, Khrebtukova I, Kingsbury Z, Luo S, McBride D, Murray L, Menju T, Timbs A, Ross M, Taylor J, Bentley D: Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood. 2012, 120 (20): 4191-4196. 10.1182/blood-2012-05-433540.

Acute Myeloid Leukemia
- Single-cell sequencing → low coverage
  - Often only two or three SNVs were seen in a colony, but that does not mean there are no more
  - Passenger mutations
- PhyloSub comparison
  - Single-cell sequencing confirms the existence of some hypothesized mutants, but others commonly present in the sample are predicted to be rare by PhyloSub
  - Authors claim that the error arises from biases in deep sequencing and experimental error

AML: Tree Comparison
Figure: tree with cluster IDs from the original paper on left, PhyloSub tree on right. Data given in table.

Variant allele read counts
SNV              Variant allele read counts   Read depth   Allele frequency               Cluster ID
CACNA1H          12,085                       24,860       0.486 (95% CI: 0.481-0.491)    A
TET2-T1884A      4,220                        8,772        0.481 (95% CI: 0.472-0.490)    B
TET2-Y1649stop   7,792                        16,211       0.481 (95% CI: 0.474-0.487)
CXorf66          3,684                        8,150        0.452 (95% CI: 0.443-0.461)
CXorf36          3,523                        8,060        0.437 (95% CI: 0.428-0.446)
DOCK9            3,391                        8,676        0.391 (95% CI: 0.382-0.400)    C
NCRNA00200       9,201                        25,413       0.362 (95% CI: 0.357-0.367)
CTCF             10,558                       30,119       0.351 (95% CI: 0.346-0.355)
GABARAPL1        1,648                        4,992        0.330 (95% CI: 0.319-0.341)
SCN4B            5,113                        16,386       0.312 (95% CI: 0.306-0.318)
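The allele frequency column is the variant read count divided by the read depth. The sketch below recomputes it for three rows of the table and attaches a normal-approximation interval. The slides do not state how the published intervals were computed, so the interval method here is an assumption and its bounds may differ from the table; `allele_frequency` is a hypothetical helper name.

```python
import math


def allele_frequency(variant_reads, depth, z=1.96):
    """Return (frequency, lower, upper) using a normal-approximation interval.

    The interval method is an assumption for illustration; the source table
    does not say how its confidence intervals were derived.
    """
    freq = variant_reads / depth
    half_width = z * math.sqrt(freq * (1.0 - freq) / depth)
    return freq, max(0.0, freq - half_width), min(1.0, freq + half_width)


# Read counts copied from the table: (SNV, variant allele reads, read depth).
rows = [
    ("CACNA1H", 12085, 24860),
    ("DOCK9", 3391, 8676),
    ("SCN4B", 5113, 16386),
]

for name, variant_reads, depth in rows:
    freq, lower, upper = allele_frequency(variant_reads, depth)
    print(f"{name}: {freq:.3f} ({lower:.3f}-{upper:.3f})")
```

The point estimates agree with the table to three decimals; the deeper the coverage, the narrower the interval, which is why SNVs with similar frequencies can still be separated or grouped with some confidence.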

Single-Cell Sequencing and PhyloSub

Summary
Problem:
- Deconvolution of taxa from samples of heterogeneous tumor cell mixtures
Approach:
- Assume homoplasy-free mutation
- Use topological constraints to estimate ancestor-descendant relationships based on SNV frequency
- Derive evolutionary tree from likelihood of edges
Results:
- Able to recover some structures and some clusters
Limitations:
- Binary encoding
- Copy-number assumptions
- Scaling difficulty

References
- Jiao, W., Vembu, S., Deshwar, A. G., Stein, L., & Morris, Q. (2014). Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics, 15(1), 35. doi:10.1186/1471-2105-15-35
- Jan M, Snyder TM, Corces-Zimmerman MR, Vyas P, Weissman IL, Quake SR, Majeti R: Clonal evolution of preleukemic hematopoietic stem cells precedes human acute myeloid leukemia. Sci Transl Med. 2012, 4 (149): 149ra118
- Schuh A, Becq J, Humphray S, Alexa A, Burns A, Clifford R, Feller SM, Grocock R, Henderson S, Khrebtukova I, Kingsbury Z, Luo S, McBride D, Murray L, Menju T, Timbs A, Ross M, Taylor J, Bentley D: Monitoring chronic lymphocytic leukemia progression by whole genome sequencing reveals heterogeneous clonal evolution patterns. Blood. 2012, 120 (20): 4191-4196. 10.1182/blood-2012-05-433540.
