Matthew Mazowita, Lani Haque, and David Sankoff

1 Stability of rearrangement measures in the comparison of genome sequences
Matthew Mazowita, Lani Haque, and David Sankoff. Presented by: Charlotte Wagner Searle

2 What are they trying to do?
Present data-analytic and statistical tools for studying rates of rearrangement of whole genomes, and assess the stability of these methods under changes in the level of resolution of the genomic data. Building on the ideas of Sankoff et al. (1997, 2000, 2005): derive an estimator for the number of reciprocal translocations, and use simulations to show that the bias and standard deviation of the estimator are less than 5% under two models of random translocation, with and without conservation of the centromere.

3 How?
Construct datasets: containing the number of conserved syntenies and conserved segments shared by pairs of animal genomes, at different levels of resolution (30 kb, 100 kb, 300 kb, and 1 Mb). Fit the data to an evolutionary tree: find the rates of rearrangement.

4 Conserved syntenies vs. conserved segments
Conserved syntenies: pairs of chromosomes, one from each species, containing at least one sufficiently long stretch of homologous sequence. Conserved segments: the total number of such stretches of homologous sequence; regions of chromosomes in two related species in which both gene content and gene order are parallel in the two species (Sankoff, Ferretti and Nadeau, 1997).
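The distinction between the two counts can be illustrated with a short Python sketch. It assumes, for illustration only, that each conserved segment has already been reduced to a pair (chromosome in genome A, chromosome in genome B); the function name and the toy data are hypothetical and do not come from the presentation.

```python
def count_syntenies_and_segments(segments):
    """Return (conserved syntenies, conserved segments).

    `segments` is a list of (chrom_in_A, chrom_in_B) pairs, one entry per
    conserved segment (hypothetical input format).
    """
    n_segments = len(segments)        # every homologous stretch counts once
    n_syntenies = len(set(segments))  # each chromosome pair counts once
    return n_syntenies, n_segments


# Toy data: two segments share the chromosome pair ("1", "4").
toy = [("1", "4"), ("1", "4"), ("1", "7"), ("2", "7")]
counts = count_syntenies_and_segments(toy)
print(counts)
```

The segment count can never be smaller than the synteny count, since every synteny contains at least one segment.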

5 1. Datasets
Secondary data from the UCSC Genome Browser: human, mouse, chimp, rat, dog, and chicken. Different levels of resolution: 30 kb, 100 kb, 300 kb, and 1 Mb. “The key to using whole genome sequences to study evolutionary rearrangements is being able to partition each genome into segments conserved by two genomes since their divergence.”

8 2. Models of translocation
Model the autosomes of a genome as c linear segments with lengths p(1),…,p(c). Assume the two breakpoints of a translocation are chosen independently. Do not consider chromosome fusion or fission. REMINDER: a reciprocal translocation between two chromosomes consists of breaking each one at an interior point, creating two segments per chromosome, and rejoining the four resulting segments such that two new chromosomes are produced.
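The reminder above can be sketched in Python. This is a minimal illustration, not the authors' code: representing a chromosome as an ordered list of (origin, length) fragments, and the names `split_at` and `translocate`, are assumptions made for the example. The rejoining shown pairs each left-hand piece with the other chromosome's right-hand piece.

```python
def split_at(chrom, x):
    """Break a chromosome (list of (origin, length) fragments) at position x."""
    left, right = [], []
    pos = 0.0
    for origin, length in chrom:
        if pos + length <= x:
            left.append((origin, length))            # entirely left of the break
        elif pos >= x:
            right.append((origin, length))           # entirely right of the break
        else:
            left.append((origin, x - pos))           # the break falls inside
            right.append((origin, pos + length - x))
        pos += length
    return left, right


def translocate(genome, i, x, j, y):
    """Reciprocal translocation between chromosomes i and j at interior points x, y."""
    left_i, right_i = split_at(genome[i], x)
    left_j, right_j = split_at(genome[j], y)
    new_genome = list(genome)
    new_genome[i] = left_i + right_j
    new_genome[j] = left_j + right_i
    return new_genome


# Two ancestral chromosomes, labelled "A" and "B" (hypothetical lengths).
ancestor = [[("A", 1.0)], [("B", 2.0)]]
derived = translocate(ancestor, 0, 0.25, 1, 1.0)
print(derived)
```

Keeping the origin label on every fragment means that conserved segments and conserved syntenies with respect to the starting genome can be read off directly from the result.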

9 What are the models?
Model 1: left-right orientation on each chromosome; conservation of the centromere; the estimator is derived from this version; simulate to test the estimator. Model 2: an inverted left-hand fragment may rejoin another left-hand fragment; high level of neocentromeric activity; simulate to test the estimator.
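The difference between the two models comes down to how the four fragments are allowed to rejoin. A minimal Python sketch follows; the function name `rejoin`, the use of fragment lengths only, and the assumption that both rejoinings are equally likely when the centromere is not conserved are illustrative choices, not details taken from the slides.

```python
import random


def rejoin(left_i, right_i, left_j, right_j, conserve_centromere, rng):
    """Return the lengths of the two chromosomes produced by a translocation.

    With centromere conservation, a left-hand fragment always joins the other
    chromosome's right-hand fragment. Without it, a left-hand fragment may
    instead be inverted and joined to the other left-hand fragment (assumed
    here to happen with probability 1/2).
    """
    if conserve_centromere or rng.random() < 0.5:
        return left_i + right_j, left_j + right_i
    return left_i + left_j, right_i + right_j


rng = random.Random(0)
with_centromere = rejoin(1, 2, 4, 8, True, rng)
without_centromere = rejoin(1, 2, 4, 8, False, rng)
print(with_centromere, without_centromere)
```

In both models the total length is unchanged; only the way it is divided between the two new chromosomes differs.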

10 3. Prediction and estimation
Assume the process is reversible. The equilibrium state of the process well approximates the distribution of human genomes. When comparing two genomes, we do not need to consider them as diverging independently from a common ancestor.

11 Equations…

13 4. Estimator for genomes with different number of chromosomes
Human, mouse, chimp, rat, dog, and chicken have different numbers of chromosomes. Solution? Not good… Therefore, we need to construct an estimator that takes into account the number of chromosomes in both genomes.

14 1) Process-based estimator
Derived from equation (7)

15 2) State-based estimator
Based on the expectation over all chromosomes in A. Remember: in Table 2, t^ is calculated according to equation (9).

16 5. Simulations
i. Equilibrium distribution of chromosome size. Sankoff and Ferretti (1996): models of accumulated reciprocal translocations for explaining the observed range of chromosome sizes in genome data; proposed a lower threshold on chromosome size. A cap on the largest chromosome size (Schubert and Oud, 1997) is effective (De et al., 2001). Simulate the translocation process 100 times, up to 10,000 translocations each, to produce Figure 1.
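A simulation of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used for Figure 1: chromosomes are picked with probability proportional to length, the centromere is conserved, and a lower size threshold is enforced by rejecting any translocation that would violate it. The function name and all parameter values are hypothetical.

```python
import random


def simulate_sizes(n_chrom=20, n_steps=10_000, min_size=0.0, seed=0):
    """Apply random reciprocal translocations and return sorted chromosome sizes."""
    rng = random.Random(seed)
    sizes = [1.0 / n_chrom] * n_chrom      # start from equal-sized chromosomes
    indices = range(n_chrom)
    for _ in range(n_steps):
        # Assumption: breakpoints fall uniformly along the genome, so a
        # chromosome is hit with probability proportional to its length.
        i = rng.choices(indices, weights=sizes)[0]
        j = i
        while j == i:
            j = rng.choices(indices, weights=sizes)[0]
        x = rng.random() * sizes[i]        # breakpoint on chromosome i
        y = rng.random() * sizes[j]        # breakpoint on chromosome j
        new_i = x + (sizes[j] - y)         # left part of i + right part of j
        new_j = y + (sizes[i] - x)         # left part of j + right part of i
        if min(new_i, new_j) < min_size:
            continue                       # assumed rule: reject small products
        sizes[i], sizes[j] = new_i, new_j
    return sorted(sizes)


# Small demonstration run; the slide's setting would be 100 runs of 10,000 steps.
demo = simulate_sizes(n_chrom=5, n_steps=200, min_size=0.02, seed=1)
print(demo)
```

Repeating the call with different seeds and pooling the sorted sizes gives an empirical picture of the size distribution that the process settles into.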

17 ii. Performance of the estimators
Process-based estimator; “state-based” estimator

18 6. Fitting the data to animal phylogeny
Assume the phylogenetic tree in Figure 4 in order to infer the rates of rearrangement on evolutionary lineages. Fit the data in Table 2 to the tree.
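The slides do not say how the fit is carried out. As a rough illustration of what fitting pairwise values to a tree means, the Python sketch below handles the simplest case, three leaves joined at one internal node, where additive branch lengths follow in closed form from the three pairwise values. The function name and the numbers are hypothetical, and this is not claimed to be the procedure applied to Table 2.

```python
def three_leaf_branches(d_ab, d_ac, d_bc):
    """Branch lengths (a, b, c) of a three-leaf tree with one internal node.

    Solves a + b = d_ab, a + c = d_ac, b + c = d_bc.
    """
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Hypothetical pairwise rearrangement estimates between genomes A, B and C.
branches = three_leaf_branches(10, 14, 12)
print(branches)
```

Dividing each branch length by the time that branch spans would turn the estimates into rates; with more than three genomes the system is overdetermined and is usually solved in a least-squares sense.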

23 7. Observations

24 7. Discussion
Proposed an estimator. The estimator is very accurate in simulations. Applied the estimator to animal genome comparisons at various levels of resolution. The translocation estimate is stable; the inversion estimate increases. At detailed levels of resolution, the translocation number probably refers to other processes as well. Increased inversion numbers are likely to reflect the inversion process.

25 THANK YOU!

