Habil Zare Department of Genome Sciences University of Washington

Presentation transcript:

Inferring clonal composition of a breast cancer from multiple tissue samples. Habil Zare, Department of Genome Sciences, University of Washington, 19 Dec 2013

Hypothesis: Because cancer is a heterogeneous disease, synergistic medications can treat it better than a single drug.

Traditional concept of a tumor (schematic figure)

Most tumors are heterogeneous (schematic figure)

Different clones have different genotypes and phenotypes (schematic figure)

It is important to identify the clonal composition (figure: Treatment A, relapse; Treatment B, relapse)

It is important to identify the clonal composition (figure: ?)

Our approach: analyzing multiple samples from a single tumor

Each sample has different information about the clonal composition. Per sample: PCR, then Next Gen Sequencing, then counting the number of reads which support each mutation.

A closer look at the Next-Gen Sequencing output: At each locus, 2 integers are provided: the total number of analyzed reads, and the number of reads supporting the mutation. Because different clones have different contributions to each sample, these numbers vary across the samples. How to use this variation to infer the clonal composition?
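As a minimal sketch of the data described on this slide, the two integers per locus and sample can be held in two tables. The names `total_reads` and `variant_reads` and all the numbers are invented for illustration; they are not from the talk.

```python
# Hypothetical read counts: rows are loci, columns are samples.
# All values are made up for illustration only.
total_reads = [[100, 120, 90],
               [80, 150, 110]]
variant_reads = [[10, 36, 45],
                 [40, 15, 0]]


def observed_variant_fraction(variant, total):
    """Fraction of reads supporting the mutation, per locus and per sample."""
    return [
        [v / t if t > 0 else float("nan") for v, t in zip(vrow, trow)]
        for vrow, trow in zip(variant, total)
    ]


fractions = observed_variant_fraction(variant_reads, total_reads)
for locus, row in enumerate(fractions):
    # The fractions differ across samples; this is the variation to exploit.
    print("locus", locus, [round(f, 2) for f in row])
```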

Building a generative model: The observations. The observations boil down to the number of reads which support each allele, for mutations on N loci across M samples from the tumor.

Building a generative model: Given the parameters, how to generate data?

Building a generative model (diagram: Parameters, Generate, ?)

Building a generative model: The main assumption on the distribution of reads. Mutation i can be present or absent in each clone. Assumption: reads are analyzed uniformly at random, so the read counts are binomially distributed. (Figure: projection on mutation i)

Building a generative model: The main assumption on the distribution of reads. Mutation i can be present or absent in each clone. Assumption: the number of reads exhibiting the variant allele at locus i in sample j is binomially distributed, with the total number of reads as the number of trials and the frequency of the variant allele as the success probability. (Figure: projection on mutation i)

Building a generative model: A close look at the binomial distribution. The number of reads exhibiting the variant is observed, and the total number of reads is observed. The frequency of the variant allele is unknown (?) and depends on: Which clones contain mutation i? What is the frequency of those clones in sample j?
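To make the binomial assumption concrete, here is a small sketch of the standard binomial probability mass function. The function name and the example counts (30 variant reads out of 100) are illustrative assumptions, not values from the talk.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k variant reads out of n when each read
    shows the variant independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_reads = 100       # total number of reads (observed), hypothetical value
variant_count = 30  # reads exhibiting the variant (observed), hypothetical value

# The variant allele frequency p is the unknown; compare a few candidates.
for p in (0.1, 0.3, 0.5):
    print(p, binomial_pmf(variant_count, n_reads, p))
```

Among the candidates tried, the observed counts are most probable when p is close to the observed fraction of variant reads.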

Building a generative model: Introducing the hidden variables. If Z_{i,c} = 1, clone c has a variant allele at locus i. The frequency of the variant allele depends on: Which clones contain mutation i? What is the frequency of those clones in sample j?

Building a generative model: Notation for the model parameters. The frequency of the variant allele depends on: Which clones contain mutation i? What is the frequency of those clones in sample j?
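One way to combine the two questions above is sketched below. It assumes a genotype matrix Z (loci by clones) and a clone frequency matrix P laid out as clones by samples. The factor `allele_fraction=0.5` encodes an added assumption that mutations are heterozygous in diploid cells; the slides do not state this, so treat it as a hypothetical parameter.

```python
def variant_allele_frequency(Z, P, allele_fraction=0.5):
    """Expected variant allele frequency at locus i in sample j.

    Z[i][c] is 1 if clone c carries mutation i, else 0.
    P[c][j] is the frequency of clone c in sample j (each column sums to 1).
    allele_fraction is an assumption (heterozygous mutation, diploid cells).
    """
    n_clones, n_samples = len(P), len(P[0])
    return [
        [allele_fraction * sum(Z[i][c] * P[c][j] for c in range(n_clones))
         for j in range(n_samples)]
        for i in range(len(Z))
    ]


# Hypothetical example: 2 mutations, 3 clones, 2 samples.
Z = [[0, 1, 1],   # mutation 0 is present in clones 1 and 2
     [0, 0, 1]]   # mutation 1 is present only in clone 2
P = [[0.5, 0.2],  # clone 0 carries neither mutation
     [0.3, 0.3],
     [0.2, 0.5]]
freq = variant_allele_frequency(Z, P)
print(freq)
```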

Building a generative model (diagram: Parameters, C, Generate, ?)

Building a generative model: The assumptions. Each mutation can occur at a locus independently at random. The samples are independent of each other.

Building a generative model (technical; diagram: Parameters, C, Generate)

Overview of the generative model, from parameters to the observations (diagram: C, Observations)
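The path from parameters to observations can be sketched as a small simulator. This is an illustration under stated assumptions, not the author's implementation: the function name, the constant read depth, the Bernoulli probability `mutation_prob`, the way random clone frequencies are drawn, and the `allele_fraction=0.5` factor (heterozygous mutations in diploid cells) are all hypothetical choices.

```python
import random


def simulate(n_loci, n_clones, n_samples, depth=200, mutation_prob=0.5,
             allele_fraction=0.5, seed=0):
    """Generate genotypes Z, clone frequencies P, read depths N and
    variant read counts X from a simple generative model."""
    rng = random.Random(seed)
    # Genotypes: each mutation is present in each clone independently.
    Z = [[1 if rng.random() < mutation_prob else 0 for _ in range(n_clones)]
         for _ in range(n_loci)]
    # Clone frequencies: random positive weights per sample, normalised
    # so that the clone frequencies in each sample sum to 1.
    P = [[0.0] * n_samples for _ in range(n_clones)]
    for j in range(n_samples):
        weights = [rng.random() + 1e-9 for _ in range(n_clones)]
        total = sum(weights)
        for c in range(n_clones):
            P[c][j] = weights[c] / total
    # Read depth: constant here for simplicity (an assumption).
    N = [[depth] * n_samples for _ in range(n_loci)]
    # Variant read counts: binomial draws, one Bernoulli trial per read.
    X = []
    for i in range(n_loci):
        row = []
        for j in range(n_samples):
            p = allele_fraction * sum(Z[i][c] * P[c][j] for c in range(n_clones))
            row.append(sum(1 for _ in range(N[i][j]) if rng.random() < p))
        X.append(row)
    return Z, P, N, X


Z, P, N, X = simulate(n_loci=5, n_clones=3, n_samples=4)
print(X)
```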

Inference: Given the observed counts, how do we infer the clonal structure? (Technical; diagram: EM, Inference, C)

We infer model parameters using expectation-maximization. (Equation annotations: derived from the binomial distribution; derived from the Bernoulli distribution. Details omitted.)
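The slide omits the EM details, and this sketch does not try to reproduce them. It only evaluates the binomial log-likelihood of the observed counts for a candidate genotype matrix Z and clone frequency matrix P, which is the kind of quantity such updates would be derived from. The function name, the matrix layout (P as clones by samples) and the `allele_fraction=0.5` factor are assumptions for illustration.

```python
from math import comb, log


def log_likelihood(X, N, Z, P, allele_fraction=0.5, eps=1e-12):
    """Binomial log-likelihood of variant counts X given depths N,
    candidate genotypes Z (loci x clones) and frequencies P (clones x samples)."""
    total = 0.0
    for i in range(len(X)):
        for j in range(len(X[0])):
            p = allele_fraction * sum(Z[i][c] * P[c][j] for c in range(len(P)))
            p = min(max(p, eps), 1 - eps)  # keep log() finite at p = 0 or 1
            total += (log(comb(N[i][j], X[i][j]))
                      + X[i][j] * log(p)
                      + (N[i][j] - X[i][j]) * log(1 - p))
    return total


# Hypothetical data: one locus, one sample, 25 variant reads out of 100.
X, N = [[25]], [[100]]
P = [[0.5], [0.5]]
print(log_likelihood(X, N, [[0, 1]], P))  # mutation in one clone
print(log_likelihood(X, N, [[1, 1]], P))  # mutation in both clones
```

In this toy case the genotype with the mutation in a single clone explains the counts better, since it predicts a variant frequency of 0.25.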

How can we evaluate whether the model works? (Diagram: two rounds of next-gen sequencing, Inference, C)

We do not know the reality (diagram: Inferred ~ Reality)

Generating synthetic data (diagram: Generate, Inference, C)

Generating synthetic data (diagram: Random parameters, Generate, Inference, C, compare)

Defining accuracy criteria. Genotype error: the frequency of false entries in the genotype matrix Z. Clone frequency error: the average error in entries of the frequency matrix P.
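The two criteria can be sketched as follows. Two choices here are assumptions rather than something the slide states: "average error" is taken to be the mean absolute difference, and because inferred clones come in arbitrary order, the helper `best_matching_errors` (a hypothetical name) tries every relabeling of the clones and keeps the one with the lowest genotype error, breaking ties by frequency error. Trying all relabelings is only practical for a small number of clones.

```python
from itertools import permutations


def genotype_error(Z_true, Z_est):
    """Fraction of entries of the genotype matrix that are wrong."""
    pairs = [(a, b) for rt, re in zip(Z_true, Z_est) for a, b in zip(rt, re)]
    return sum(a != b for a, b in pairs) / len(pairs)


def clone_frequency_error(P_true, P_est):
    """Mean absolute difference between entries of the frequency matrices."""
    pairs = [(a, b) for rt, re in zip(P_true, P_est) for a, b in zip(rt, re)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def best_matching_errors(Z_true, P_true, Z_est, P_est):
    """Errors under the best relabeling of the inferred clones.

    Z matrices are loci x clones; P matrices are clones x samples.
    """
    best = None
    for perm in permutations(range(len(P_true))):
        Z_perm = [[row[c] for c in perm] for row in Z_est]
        P_perm = [P_est[c] for c in perm]
        errs = (genotype_error(Z_true, Z_perm),
                clone_frequency_error(P_true, P_perm))
        if best is None or errs < best:
            best = errs
    return best


# Hypothetical example: the estimate is perfect except that clones are swapped.
Z_true = [[1, 0], [0, 1]]
P_true = [[0.7, 0.4], [0.3, 0.6]]
Z_est = [[0, 1], [1, 0]]
P_est = [[0.3, 0.6], [0.7, 0.4]]
print(genotype_error(Z_true, Z_est))
print(best_matching_errors(Z_true, P_true, Z_est, P_est))
```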

Simulation shows genotype error decreasing as the number of samples increases

Clone frequency error shows a similar trend

Experiment with real data: Study on a primary breast cancer. 10 breast tumor samples, 1 adjacent normal, 2 samples from the metastatic lymph node. Maybe move this slide and the next one, with a plot of frequencies, to earlier.

Clone frequencies vary smoothly across the tumor sections. The model doesn’t know anything about the anatomic location of the samples!

Phylogenetic analysis tells the story of the tumor over time

Five-clone solution

Six-clone solution is consistent with the five-clone solution

Overview of the project: Inferring clonal composition of a breast cancer from multiple tissue samples. (Diagram: Next-Gen Sequencing Data; EM; Validated by simulations; Clonal structure; Anatomic variation of clones; Phylogenetic trees; Oncologists)

Software publicly available

Supplementary slides

Proposed project based on former experiences: Identifying clonal decomposition using sub-tissues. (Diagram: Leukemia or lymphoma sample; SamSPECTRAL; Sort cell populations; Next Gen Sequencing; Clonal analysis)