Matthew Mazowita, Lani Haque, and David Sankoff

Presentation transcript:

Stability of rearrangement measures in the comparison of genome sequences
Matthew Mazowita, Lani Haque, and David Sankoff
Presented by: Charlotte Wagner Searle

What are they trying to do?
Present data-analytic and statistical tools for:
  Studying rates of rearrangement of whole genomes
  Assessing the stability of these methods with changes in the level of resolution of the genomic data
Building on the ideas of Sankoff et al. (1997, 2000, 2005):
  Derive an estimator for the number of reciprocal translocations
  Use simulations to show that the bias and standard deviation of the estimator are less than 5%
  2 models of random translocation, with and without conservation of the centromere

How?
Construct datasets:
  Containing the number of conserved syntenies and conserved segments shared by pairs of animal genomes
  At different levels of resolution (30 kb, 100 kb, 300 kb, and 1 Mb)
Fit the data to an evolutionary tree:
  Find the rates of rearrangement

Conserved syntenies vs. conserved segments
Conserved syntenies: pairs of chromosomes, one from each species, containing at least one sufficiently long stretch of homologous sequence
Conserved segments: the total number of such stretches of homologous sequence; regions of chromosomes in two related species in which both gene content and gene order are parallel in the two species (Sankoff, Ferretti and Nadeau, 1997)
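
To make the distinction concrete, here is a minimal sketch of how the two counts could be computed from a list of homologous stretches. It assumes each stretch is recorded as a pair (chromosome in species A, chromosome in species B); the function name and the toy data are hypothetical and are not taken from the paper or from the UCSC data.

```python
from collections import Counter

def count_syntenies_and_segments(blocks):
    """Count conserved syntenies and conserved segments.

    blocks: iterable of (chrom_in_A, chrom_in_B) pairs, one entry per
    sufficiently long stretch of homologous sequence (assumed input format).
    """
    pair_counts = Counter(blocks)
    # Every stretch counts as one conserved segment.
    n_segments = sum(pair_counts.values())
    # Every distinct chromosome pair sharing at least one stretch is one synteny.
    n_syntenies = len(pair_counts)
    return n_syntenies, n_segments

# Hypothetical toy data: five stretches spread over four chromosome pairs.
toy_blocks = [("1", "4"), ("1", "4"), ("1", "7"), ("2", "7"), ("3", "3")]
syntenies, segments = count_syntenies_and_segments(toy_blocks)
print("conserved syntenies:", syntenies)
print("conserved segments:", segments)
```

The number of conserved segments is always at least the number of conserved syntenies, since each synteny requires at least one stretch.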

1. Datasets
Secondary data from the UCSC Genome Browser
Human, mouse, chimp, rat, dog, and chicken
Different levels of resolution: 30 kb, 100 kb, 300 kb, and 1 Mb
“The key to using whole genome sequences to study evolutionary rearrangements is being able to partition each genome into segments conserved by two genomes since their divergence.”

2. Models of translocation
Model the autosomes of a genome as c linear segments with lengths p(1),…,p(c)
Assume the two breakpoints of a translocation are chosen independently
Do not consider chromosome fusion or fission
REMINDER: a reciprocal translocation between two chromosomes consists of breaking each one at an interior point, creating two segments from each, and rejoining the four resulting segments such that two new chromosomes are produced
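
The reminder above can be sketched in a few lines of code. This is only an illustration under stated assumptions: chromosomes are tracked by length alone, the two breakpoints are drawn uniformly along the genome and redrawn until they fall on different chromosomes, and the left part of one chromosome is rejoined to the right part of the other. The function name and the example lengths are hypothetical.

```python
import random

def reciprocal_translocation(lengths, rng):
    """Return new chromosome lengths after one reciprocal translocation."""
    if len(lengths) < 2:
        raise ValueError("need at least two chromosomes")
    chromosomes = range(len(lengths))
    # Assumption: a breakpoint lands on a chromosome with probability
    # proportional to its length; redraw until the two chromosomes differ.
    i, j = rng.choices(chromosomes, weights=lengths, k=2)
    while i == j:
        i, j = rng.choices(chromosomes, weights=lengths, k=2)
    x = rng.uniform(0, lengths[i])  # breakpoint position on chromosome i
    y = rng.uniform(0, lengths[j])  # breakpoint position on chromosome j
    new_lengths = list(lengths)
    # Rejoin: left part of i with right part of j, and left part of j with right part of i.
    new_lengths[i] = x + (lengths[j] - y)
    new_lengths[j] = y + (lengths[i] - x)
    return new_lengths

# Hypothetical genome with c = 3 autosomes.
lengths = [100.0, 80.0, 60.0]
after = reciprocal_translocation(lengths, random.Random(1))
print(after, sum(after))
```

The number of chromosomes and the total genome length are unchanged, which matches the assumption that fusion and fission are not considered.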

What are the models?
With conservation of the centromere:
  Left-right orientation on each chromosome
  Conservation of the centromere
  Estimator derived from this version
  Simulate to test the estimator
Without conservation of the centromere:
  Inverted left-hand fragment may rejoin another left-hand fragment
  High level of neocentromeric activity
  Simulate to test the estimator
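
The difference between the two models comes down to which fragments are allowed to rejoin. The sketch below is a simplified illustration, not the authors' code: it tracks only fragment lengths, and it assumes that, without centromere conservation, the two possible rejoinings are equally likely. The function name `rejoin` and the 50/50 choice are assumptions.

```python
import random

def rejoin(p_i, x, p_j, y, conserve_centromere, rng=None):
    """Lengths of the two chromosomes produced by breaking chromosome i
    (length p_i) at x and chromosome j (length p_j) at y."""
    left_i, right_i = x, p_i - x
    left_j, right_j = y, p_j - y
    # Orientation-preserving rejoining: each left fragment gets the other's right fragment.
    swap_ends = (left_i + right_j, left_j + right_i)
    if conserve_centromere:
        return swap_ends
    # Without centromere conservation, an inverted left fragment may also
    # join the other left fragment, leaving the two right fragments together.
    join_lefts = (left_i + left_j, right_i + right_j)
    rng = rng or random.Random()
    return rng.choice([swap_ends, join_lefts])  # assumed equal probability

print(rejoin(10.0, 4.0, 20.0, 5.0, conserve_centromere=True))
print(rejoin(10.0, 4.0, 20.0, 5.0, conserve_centromere=False, rng=random.Random(0)))
```

In both cases the total length of the two chromosomes is preserved; only the way the four fragments are paired changes.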

3. Prediction and estimation
Assume the process is reversible
The equilibrium state of the process approximates the distribution of human genomes well
When comparing two genomes, there is no need to consider them as diverging independently from a common ancestor

Equations…

4. Estimator for genomes with different numbers of chromosomes
Human, mouse, chimp, rat, dog, and chicken have different numbers of chromosomes
Solution? Not good… therefore, we need to construct an estimator that takes into account the number of chromosomes in both genomes

1) Process-based estimator
Derived from equation (7)

2) State-based estimator
Based on the expectation over all chromosomes in A
Remember: in Table 2, t̂ is calculated according to equation (9)

5. Simulations
i. Equilibrium distribution of chromosome size
Sankoff and Ferretti (1996): models of accumulated reciprocal translocations for explaining the observed range of chromosome sizes in genome data
  Proposed a lower threshold on chromosome size
  A cap on the largest chromosome size (Schubert and Oud, 1997), which is effective (De et al., 2001)
Simulate the translocation process 100 times, up to 10,000 translocations each, to produce Figure 1
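
A simulation of this kind could be sketched as follows. This is a rough illustration, not the procedure used to produce Figure 1: the threshold values, the starting genome, and the rule of simply rejecting any proposed translocation that violates the size limits are all assumptions, and the run sizes are kept small so the example finishes quickly.

```python
import random

def propose_translocation(sizes, rng):
    """Return chromosome sizes after one proposed reciprocal translocation."""
    chromosomes = range(len(sizes))
    # Breakpoints fall on chromosomes in proportion to length; redraw if both hit the same one.
    i, j = rng.choices(chromosomes, weights=sizes, k=2)
    while i == j:
        i, j = rng.choices(chromosomes, weights=sizes, k=2)
    x = rng.uniform(0, sizes[i])
    y = rng.uniform(0, sizes[j])
    new_sizes = list(sizes)
    new_sizes[i] = x + (sizes[j] - y)
    new_sizes[j] = y + (sizes[i] - x)
    return new_sizes

def simulate(initial_sizes, n_steps, min_size, max_size, rng):
    """Run n_steps proposals, rejecting any that break the size limits (assumed rule)."""
    sizes = list(initial_sizes)
    for _ in range(n_steps):
        candidate = propose_translocation(sizes, rng)
        if min_size <= min(candidate) and max(candidate) <= max_size:
            sizes = candidate
    return sorted(sizes)

def replicate(n_runs, n_steps, initial_sizes, min_size, max_size, seed=0):
    """Repeat the simulation n_runs times and collect the sorted size profiles."""
    rng = random.Random(seed)
    return [simulate(initial_sizes, n_steps, min_size, max_size, rng)
            for _ in range(n_runs)]

# Hypothetical settings; the slide's setting would be n_runs=100, n_steps=10_000.
runs = replicate(n_runs=5, n_steps=200, initial_sizes=[100.0] * 20,
                 min_size=20.0, max_size=250.0)
# Average size of the k-th smallest chromosome across runs.
mean_sorted = [sum(column) / len(column) for column in zip(*runs)]
print([round(v, 1) for v in mean_sorted])
```

Averaging the sorted size profiles across replicates gives one simple summary of the equilibrium distribution of chromosome sizes under the chosen thresholds.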

ii. Performance of the estimators Process-based estimator “State-based” estimator

6. Fitting the data to animal phylogeny
Assumed the phylogenetic tree in Figure 4 to infer the rates of rearrangement on evolutionary lineages
Fit the data in Table 2 to the tree

7. Observations

8. Discussion
Proposed an estimator
Estimator is very accurate in simulations
Applied the estimator to animal genome comparisons at various levels of resolution
Translocation estimate: stable; inversion estimate: increases
At detailed levels of resolution, the translocation number probably reflects other processes as well
Increased inversion numbers are likely to reflect the inversion process

THANK YOU!