Outline Cancer Progression Models


1 Outline Cancer Progression Models
SNPs, Haplotypes, and Population Genetics: Introduction

2 Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)

3 Cancer Genomes Leukemia Breast

4 “Comparative Genomics” of Cancer
(Figure: mutation and selection lead from the human genome to tumor genomes 1 to 4.)
1) Identify recurrent aberrations. Mitelman Database: >40,000 aberrations.
2) Reconstruct the temporal sequence of aberrations.
Linear model: colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q
Tree model: (Desper et al. 1999)
3) Find the age of the tumor and the time of clonal expansion.

5 Observing Cancer Progression
Obtaining longitudinal (time-course) data is difficult (time points t1, t2, t3, t4).
Latitudinal data (multiple patients) is readily available.
(Figure: mutation and selection lead from the human genome to tumor genomes 1 to 4.)

6 Multiple Mutations
Four-step model for colorectal cancer: Vogelstein et al. (1988), New Eng. J. Med.
-5q → 12p* → -17p → -18q
Inferred from latitudinal data in 172 tumor samples.

7 Oncogenetic Tree models (Desper et al. JCB 1999, 2000)
Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH), e.g. {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}
Compute: the rooted tree that best explains the temporal sequence of events.

8 Oncogenetic Tree models (Desper et al. JCB 1999, 2000)
Given: measurements of chromosome gain/loss events in multiple tumor samples, e.g. {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}
L = set of chromosome alterations observed across all samples
Tumor samples give a probability distribution on $2^L$, the set of all subsets of L.
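As a minimal sketch of how tumor samples induce a distribution on subsets of L, the snippet below counts how often each event set occurs. The five samples are the ones listed on the slide; treating each as equally weighted is an assumption, and the function name `empirical_distribution` is made up for illustration.

```python
from collections import Counter

# Event sets from the slide, one per tumor sample (equal weights assumed).
samples = [{"+1q"}, {"-8p"}, {"+Xq"}, {"+Xq", "-8p"}, {"-8p", "+1q"}]

def empirical_distribution(samples):
    """Map each observed event set (as a frozenset) to its relative frequency."""
    counts = Counter(frozenset(s) for s in samples)
    total = sum(counts.values())
    return {events: n / total for events, n in counts.items()}

# L is the union of all alterations seen in any sample.
L = sorted(set().union(*samples))
dist = empirical_distribution(samples)
```

Subsets of L that never occur in the data implicitly get probability zero here; a tree model, by contrast, can assign them nonzero probability.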

9 Oncogenetic Tree
T = (V, E, r, p, L) is a rooted tree:
V = vertices
E = edges
L = set of events (leaves)
r = root
p: E → (0,1], a probability on each edge
T gives a probability distribution on $2^L$.
(Figure: tree with edges e0, e1, e2, e3, e4.)
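To illustrate how such a tree gives a distribution on event sets, here is a sketch under the usual branching-tree reading: each edge is kept independently with probability p(e), and an event occurs only if every edge on its path from the root is kept. For simplicity the sketch assumes every non-root vertex is an event, which may differ from the slide's "leaves" wording. The tree, the probabilities, and the function name are hypothetical.

```python
from itertools import combinations

def event_set_probability(parent, p, observed, root="r"):
    """Probability that exactly the events in `observed` occur.

    parent: maps each event to its parent vertex (another event or the root).
    p:      maps each event to the probability of the edge entering it.
    """
    observed = set(observed)
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == root or par in observed
        if event in observed:
            if not parent_present:
                return 0.0           # an event cannot occur without its predecessor
            prob *= p[event]         # edge into the event was taken
        elif parent_present:
            prob *= 1.0 - p[event]   # edge was available but not taken
    return prob

# Hypothetical tree: r -> -8p -> +Xq, and r -> +1q.
parent = {"-8p": "r", "+1q": "r", "+Xq": "-8p"}
p = {"-8p": 0.5, "+1q": 0.4, "+Xq": 0.2}

events = list(parent)
all_subsets = [set(c) for k in range(len(events) + 1) for c in combinations(events, k)]
total = sum(event_set_probability(parent, p, s) for s in all_subsets)
```

Summing over all subsets gives 1, which is a quick sanity check that the tree defines a distribution on $2^L$.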

10 Results CGH of 117 cases of kidney cancer

11 Extensions Oncogenetic trees based on branching (Desper et al., JCB 1999)

12 Extensions

13 Extensions Oncogenetic trees based on branching (Desper et al., JCB 1999) Maximum Likelihood Estimation (von Heydebreck et al, 2004) Mutagenic trees: mixtures of trees (Beerenwinkel, et al. JCB 2005)

14 Heterogeneity within a tumor
Final tumor is clonal expansion of single cell lineage. Can we date the time of clonal expansion? Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

15 Estimating time of clonal expansion
Microsatellite loci (MS), CA dinucleotides. In tumors with loss of mismatch repair (e.g. colorectal), MS change size.

16 Estimating time of clonal expansion
For each MS locus i, measure the mean $m_i$ and variance $s_i^2$ of its size.
$S^2_{\text{allele}}$ = average of $s_1^2, \ldots, s_L^2$
$S^2_{\text{loci}}$ = variance of $m_1, \ldots, m_L$
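The two summary statistics can be computed as follows. This is a sketch, not the authors' code: it assumes sizes are recorded as differences from the germline allele, and it uses population variances (`pvariance`), although the slide does not say whether sample or population variance is meant. The data values are invented.

```python
from statistics import mean, pvariance

def ms_summary(sizes_by_locus):
    """Return (S2_allele, S2_loci).

    sizes_by_locus: one list per MS locus, holding the sizes measured for
    that locus across samples from the same tumor.
    """
    means = [mean(s) for s in sizes_by_locus]           # m_i
    variances = [pvariance(s) for s in sizes_by_locus]  # s_i^2
    s2_allele = mean(variances)   # average within-locus variance
    s2_loci = pvariance(means)    # variance of the locus means
    return s2_allele, s2_loci

# Invented example with L = 3 loci and 3 measurements per locus.
sizes = [[0, 1, -1], [2, 2, 2], [1, 3, 2]]
s2_allele, s2_loci = ms_summary(sizes)
```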

17 Time to clonal expansion?

18 Simulation Estimates of Tumor Age
Y1 = time to clonal expansion (figure labels: Y1, Y2).
Tumor age = Y1 + Y2
Branching process simulation: each cell in the population gives birth to 0, 1 or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation).
Posterior estimate of Y1, Y2: run simulations and accept the runs whose simulated values of $S^2_{\text{allele}}$, $S^2_{\text{loci}}$ are close to the observed ones.
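The accept-if-close scheme is a form of rejection sampling (approximate Bayesian computation). The sketch below shows only that acceptance loop. The simulator is a deliberately crude stand-in, not the branching process from the paper: it assumes a star-shaped genealogy, a uniform prior on Y1 and Y2 measured in cell divisions, and a per-division mutation probability `mu`. All names and parameter values are hypothetical.

```python
import random
from statistics import mean, pvariance

def summary(sizes_by_locus):
    """Return (S2_allele, S2_loci) for per-locus lists of MS size changes."""
    s2_allele = mean([pvariance(s) for s in sizes_by_locus])
    s2_loci = pvariance([mean(s) for s in sizes_by_locus])
    return s2_allele, s2_loci

def drift(n_divisions, mu, rng):
    """Net size change after n_divisions, each mutating by +1 or -1 with probability mu."""
    return sum(rng.choice((-1, 1)) for _ in range(n_divisions) if rng.random() < mu)

def simulate(y1, y2, n_loci, n_cells, mu, rng):
    """Toy simulator: shared drift for y1 divisions, then independent drift for y2."""
    data = []
    for _ in range(n_loci):
        shared = drift(y1, mu, rng)  # changes accumulated before the expansion
        data.append([shared + drift(y2, mu, rng) for _ in range(n_cells)])
    return data

def abc_rejection(observed, n_runs, tol, max_div=30, n_loci=5, n_cells=5, mu=0.5, seed=0):
    """Keep the (y1, y2) draws whose simulated statistics fall within tol of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_runs):
        y1, y2 = rng.randint(0, max_div), rng.randint(0, max_div)  # uniform prior (assumed)
        sim = summary(simulate(y1, y2, n_loci, n_cells, mu, rng))
        if all(abs(s - o) <= tol for s, o in zip(sim, observed)):
            accepted.append((y1, y2))
    return accepted

posterior_sample = abc_rejection(observed=(1.0, 1.0), n_runs=50, tol=1.0)
```

The accepted pairs approximate the posterior of (Y1, Y2); a smaller `tol` gives a better approximation at the cost of fewer accepted runs.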

19 Results 15 patients, 25 MS loci
Estimate the time since clonal expansion from the observed $S^2_{\text{allele}}$, $S^2_{\text{loci}}$.

20 Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)

21 Population Genetics C.C. Maley: selective sweeps of mutations in tumor cell populations Chin and Gray: solid tumors

22 Genetics 101
Humans are diploid: two copies of each chromosome, maternal and paternal.
Locus: region on a chromosome (gene, nucleotide, etc.)
Allele: “value” at a locus
Genotype: pair of alleles (maternal and paternal) at loci on a chromosome (homozygous, heterozygous)
Haplotype: alleles of loci on the same chromosome (maternal or paternal)

23 Allele Measurement
“Old days” (< 1970?): gene variants
More recently (1980s-90s): various sequence-based genetic markers: microsatellites, sequence tagged sites (STS), etc.
Today: single nucleotide polymorphisms (SNPs)

24 Single Nucleotide Polymorphisms
Infinite Sites Assumption: each site mutates at most once.
By convention, SNPs are biallelic: only two of the four possible nucleotides are present in the population.

25 Infinite Sites Assumption
(Figure: tree with mutations at positions 3, 8 and 5.)
The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
Each sequence has a single parent.
The history of a population can be expressed as a tree.
The tree can be constructed efficiently.

26 Infinite sites Assumption and Perfect Phylogeny
Each site is mutated at most once in the history.
All descendants must carry the mutated value, and all others must carry the ancestral value.
(Figure: the edge carrying mutation i separates sequences with 1 in position i from those with 0 in position i.)

27 Perfect Phylogeny
Assume an evolutionary model in which only mutation takes place.
The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. For each mutation, all the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.
How can one reconstruct such a tree?

28 The 4-gamete condition
A column i partitions the set of species into two sets, i0 and i1.
A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous.
Example: i is heterogeneous w.r.t. {A, D, E}.
Species:  A B C D E F
Column i: 0 0 0 1 1 1
i0 = {A, B, C}, i1 = {D, E, F}

29 4 Gamete Condition
There exists a perfect phylogeny if and only if, for every pair of columns (i, j), j is homogeneous w.r.t. i0 or w.r.t. i1.
Equivalently: there exists a perfect phylogeny if and only if no pair of columns (i, j) contains all four rows (0,0), (0,1), (1,0), (1,1).
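The pairwise check translates directly into code. The sketch below assumes the data is a binary matrix given as a list of rows (one per species, one column per site); the function name and the example matrices are illustrative only.

```python
from itertools import combinations

def four_gamete_violation(matrix):
    """Return the first column pair (i, j) that shows all four gametes, or None."""
    n_cols = len(matrix[0])
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:   # (0,0), (0,1), (1,0), (1,1) all present
            return (i, j)
    return None

compatible = [[1, 0], [1, 1], [0, 0]]
incompatible = [[0, 0], [0, 1], [1, 0], [1, 1]]
```

With n species and m columns this takes O(n m^2) time, which is enough to decide existence but does not build the tree itself.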

30 4-gamete condition: proof
(only if) Every perfect phylogeny satisfies the 4-gamete condition: depending on the edge on which mutation j occurs, j is homogeneous w.r.t. either i0 or i1.
(if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist?
(Figure: edge i separating i0 from i1.)

31 An algorithm for constructing a perfect phylogeny
We will consider the case where 0 is the ancestral state and 1 is the mutated state. This assumption is removed later, in the unrooted case.
In any tree, each node (except the root) has a single parent, so it is sufficient to construct a parent for every node.
In each step, we add a column and refine some of the nodes containing multiple children.
Stop when all columns have been considered.

32 Inclusion Property
For any pair of columns i, j: i < j if and only if i1 ⊇ j1.
Note that if i < j, then the edge containing i is an ancestor of the edge containing j.
(Figure: edge i above edge j.)

33 Example
Initially, there is a single clade r, and each node (A, B, C, D, E) has r as its parent.
(Figure: character matrix with rows A to E, and a star tree rooted at r.)

34 Sort columns
Sort the columns according to the inclusion property (note that the columns are already sorted here).
This can be achieved by treating each column as the binary representation of a number (most significant bit in row 1) and sorting in decreasing order.
(Figure: character matrix with rows A to E.)
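A short sketch of the sorting step follows. The matrix is a hypothetical example, not the one on the slide, and its columns are deliberately unsorted; column indices are 0-based.

```python
def sort_columns(matrix):
    """Return column indices ordered by decreasing binary value (row 1 = most significant bit)."""
    def value(c):
        return int("".join(str(row[c]) for row in matrix), 2)
    return sorted(range(len(matrix[0])), key=value, reverse=True)

# Hypothetical 5 x 5 character matrix, rows A to E.
matrix = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
order = sort_columns(matrix)
```

If the set of 1s in column i strictly contains the set of 1s in column j, then column i has the larger binary value, so it comes first in this order.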

35 Add first column
In adding column i: check each edge and decide which side you belong to. Finally, add a node if you can resolve a clade.
(Figure: matrix with columns 1 to 5; tree with root r and new node u over leaves B, D, A, C, E.)

36 Adding other columns
Add the other columns on edges using the ordering property.
(Figure: matrix with rows A to E; resulting tree with root r, edges labeled 1 to 5, and leaves A to E.)
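One way to carry out the whole rooted construction is sketched below. It follows the standard sorted-column approach, in which each column's parent is the previous 1-column in any row that contains it; whether this matches the slide's edge-by-edge procedure step for step is an assumption. The matrix is hypothetical, columns are 0-based, and the names are made up.

```python
def perfect_phylogeny(matrix, taxa, root="r"):
    """Build a rooted perfect phylogeny (0 = ancestral), or return None if none exists.

    Returns (parent, attach): parent maps each column to the column on the edge
    above it (or the root); attach maps each taxon to the last column on its path.
    """
    def value(c):
        return int("".join(str(row[c]) for row in matrix), 2)
    order = sorted(range(len(matrix[0])), key=value, reverse=True)

    parent, attach = {}, {}
    for name, row in zip(taxa, matrix):
        prev = root
        for c in order:
            if row[c] == 1:
                # Every row containing column c must agree on c's predecessor.
                if parent.setdefault(c, prev) != prev:
                    return None
                prev = c
        attach[name] = prev
    return parent, attach

taxa = ["A", "B", "C", "D", "E"]
matrix = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 0, 1],  # C
    [0, 0, 1, 1, 0],  # D
    [0, 1, 0, 0, 0],  # E
]
result = perfect_phylogeny(matrix, taxa)
```

After sorting, the pass over the matrix is linear in its size, which is what makes the construction efficient.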

37 Unrooted case Switch the values in each column, so that 0 is the majority element. Apply the algorithm for the rooted case
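The relabeling step might look like the sketch below, which flips every column where 1 is the strict majority and records which columns were flipped. Leaving exact ties unflipped is an arbitrary choice made here, not something the slide specifies.

```python
def to_majority_zero(matrix):
    """Flip columns so that 0 is the majority value; return (new_matrix, flipped_flags)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # A column is flipped when more than half of its entries are 1.
    flip = [2 * sum(row[c] for row in matrix) > n_rows for c in range(n_cols)]
    flipped = [[v ^ f for v, f in zip(row, flip)] for row in matrix]
    return flipped, flip

example = [[1, 0], [1, 1], [1, 0], [0, 0]]
relabeled, flags = to_majority_zero(example)
```

The rooted algorithm is then applied to `relabeled`; the `flags` list lets the original 0/1 labels be restored on the resulting tree.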

38 Summary: No recombination leads to correlation between sites
(Figure: tree with mutations at positions 3, 8 and 5.)
The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
The history of a population can be expressed as a tree.
The tree can be constructed efficiently.

39 Haplotype Phasing Problem
Most sequencing technologies measure genotypes, not haplotypes.
(Figure: a pair of haplotypes and the resulting genotype; 2 = heterozygous.)
Given a set of genotypes, infer the haplotypes.
Use a parsimony assumption.
Haplotypes satisfy perfect phylogeny (Gusfield).
Find the minimum number of haplotypes that explain the observed genotypes.
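To see why phasing is ambiguous, the sketch below enumerates every haplotype pair that could explain one genotype. It assumes the encoding 0 = homozygous for allele 0, 1 = homozygous for allele 1, 2 = heterozygous; only the meaning of 2 comes from the slide, the rest is an assumption. It illustrates the search space only and is not a phasing algorithm.

```python
from itertools import product

def compatible_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b   # the two haplotypes differ at every het site
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs

pairs = compatible_pairs((2, 1, 2))
```

A genotype with k heterozygous sites has 2^(k-1) candidate pairs for k ≥ 1, which is why extra assumptions such as parsimony or perfect phylogeny are needed to pick among them.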

40 Recombination

41 Recombination
A tree is not sufficient, as a sequence may have 2 parents.
Recombination leads to violation of the 4-gamete property.
Recombination leads to loss of correlation between columns.

42 Studying recombination
A tree is not sufficient, as a sequence may have 2 parents.
Recombination leads to loss of correlation between columns.
How can we measure recombination?

43 Linkage (Dis)-equilibrium (LD)
(Figure: two tables of haplotypes over sites A and B, one per case.)
No recombination: Pr[(A,B) = (0,1)] = 0.25; linkage disequilibrium.
Extensive recombination: Pr[(A,B) = (0,1)] = 0.125; linkage equilibrium.
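One common way to quantify this is with the statistics D = p_AB - p_A p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). These measures are standard but are not named on the slide, so their use here is an assumption, and the haplotype samples below are invented rather than taken from the slide's tables.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for a list of (a, b) allele pairs at two biallelic sites."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2

linked = [(0, 0)] * 4 + [(1, 1)] * 4                    # sites perfectly correlated
shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2         # all gametes equally frequent
```

D and r^2 are zero when the two sites are independent (linkage equilibrium), and r^2 reaches 1 when one site fully determines the other.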

