Identification of large-scale genomic rearrangements between closely related organisms Bob Mau 1,2, Aaron Darling 1,3, Fred Blattner 4,5, Nicole Perna.

Presentation transcript:

Identification of large-scale genomic rearrangements between closely related organisms
Bob Mau 1,2, Aaron Darling 1,3, Fred Blattner 4,5, Nicole Perna 1,5
Departments of Animal Health and Biomedical Sciences (1), Oncology (2), Computer Science (3), Laboratory of Genetics (4), Genome Center (5)
University of Wisconsin – Madison

The Amazing Variety of Diseases Caused by E. coli Strains
From Bacterial Pathogenesis: A Molecular Approach:
“… is due to the fact different strains have acquired different sets of virulence genes. Most strains of E. coli are avirulent because they lack these virulence genes. E. coli is an excellent example of the maxim that it is the set of virulence genes carried by an organism that make it a pathogen, not its species or genus designation.”

Categories of Bacterial Genome Evolution
- Local
  - Single base mutations
  - Indels (small insertions and deletions)
- Global (large-scale) rearrangements
  - Inversions, translocations, inverted translocations
- Gene gain and loss
  - Horizontal or lateral transfer: transformation, transduction, and conjugation
  - Phage integration
  - Mobile elements: transposons and insertion sequences
  - Gene duplication (mediated by mobile elements)

From the two E. coli genomes sequenced at the Blattner lab, we’ve identified:
- ~3900 genes common to both K-12 and O157:H7
- 528 genes unique to K-12
- … genes unique to O157:H7
40% of these genes are of unknown function.
Culprits for these wholesale differences: lateral transfer and phage integration.

Strategy of Global Alignment of Two Highly Related Genomes (K-12 and O157:H7), using partially sorted suffix arrays
STEP 1: Quickly find all 16-mer matches between the genomes: (K_1, O_1), …, (K_i, O_i), …, (K_n, O_n).
STEP 2: Collapse consecutive pairs to form a collection of maximal exact matches (MEMs). Use an LIS (longest increasing subsequence) algorithm to construct a maximal collinear set of ordered matches.
STEP 3: Extend across intervening regions via anchored alignments from individual MEM endpoints. Each intervening region is classified as a unique insert or a substitution.
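Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `collapse_seeds` and `collinear_subset` are hypothetical, only forward-strand matches are handled, the LIS is unweighted, and overlap between neighbouring MEMs is ignored.

```python
from bisect import bisect_left

K = 16  # seed length used on the slide


def collapse_seeds(seeds, k=K):
    """Merge k-mer hits that are consecutive on one diagonal.

    seeds: iterable of (k_pos, o_pos) start positions of exact k-mer matches.
    Returns a list of (k_start, o_start, length) tuples, one per merged run.
    """
    mems = []
    # Sort by diagonal (o_pos - k_pos), then by position, so that runs on
    # the same diagonal end up adjacent in the iteration order.
    for kp, op in sorted(set(seeds), key=lambda s: (s[1] - s[0], s[0])):
        if mems:
            ks, os_, length = mems[-1]
            # Same diagonal, and this seed starts exactly one base after
            # the last seed already absorbed into the current run.
            if op - kp == os_ - ks and kp == ks + length - k + 1:
                mems[-1] = (ks, os_, length + 1)
                continue
        mems.append((kp, op, k))
    return mems


def collinear_subset(mems):
    """Longest chain of MEMs whose starts increase in both genomes."""
    # Ties in k_start are ordered by descending o_start so that two MEMs
    # with the same k_start can never both enter the chain.
    ordered = sorted(mems, key=lambda m: (m[0], -m[1]))
    tails = []      # smallest o_start that ends a chain of each length
    tail_idx = []   # index into `ordered` of that chain end
    prev = [None] * len(ordered)
    for i, (_, o_start, _) in enumerate(ordered):
        pos = bisect_left(tails, o_start)
        if pos == len(tails):
            tails.append(o_start)
            tail_idx.append(i)
        else:
            tails[pos] = o_start
            tail_idx[pos] = i
        prev[i] = tail_idx[pos - 1] if pos > 0 else None
    # Walk the predecessor links back from the end of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

For example, with a toy seed length of 3, the hits (0, 10), (1, 11), (2, 12) collapse into a single match of length 5, and a MEM that lies off the main ordering is dropped by `collinear_subset`.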

K-12 vs O157:H7 MEM Stats
- 43,235 total MEMs (≥ 24 bp)
- 31,640 form the maximal collinear subset
- The largest exact match is 2,632 bases
- 62 MEMs exceed 1000 bp
- Over 11,000 exceed 100 bp
- 18,212 single base differences (SNPs)
Resulted in a segmentation of O157:H7 into 357 intervals of backbone or unique insert.

A Three-way Genomic Comparison (Parkhill et al., Nature): E. coli K-12 MG1655, S. Typhi CT18, S. Typhimurium LT2

The “Traditional” Way to View MEMs
A MEM across $K+1$ genomes is a set of intervals $\{(a_0, b_0), (a_1, b_1), \ldots, (a_K, b_K)\}$.
For the reference genome $G_0$, $a_0 < b_0$ by convention.
For the non-reference genomes, $a_k > b_k$ means the match occurs on the opposite strand (reverse complement).

A novel approach, with these properties:
- Extensibility: works just as well for N genomes as it does for 2, provided there is sufficient sequence similarity.
- Automatically identifies inversions, translocations, and inverted translocations.
- Determines a maximal collinear subset within each locally collinear region, without recourse to an LIS step.
- Extremely space-efficient and fast.

Multiple Oriented Offset
For each non-reference genome, determine the polarity with respect to $G_0$, as well as the offset. The Multiple Oriented Offset (Moo) is the N-vector combining these polarities and offsets.
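The formulas for the polarity and offset are not preserved in this transcript, so the following Python sketch rests on assumptions rather than on the authors' definition. It assumes the polarity is +1 when the non-reference interval runs forward (start below end) and -1 otherwise, and that the offset is the quantity that stays constant along a diagonal: the difference of the start coordinates for forward matches, and their sum for reverse-complement matches. The function name `moo` and the (polarity, offset) pair layout are hypothetical.

```python
def moo(mem):
    """Return an assumed Multiple Oriented Offset for one MEM.

    mem: list of (a_k, b_k) intervals, with index 0 the reference genome G0.
    Result: one (polarity, offset) pair per non-reference genome.
    """
    a0, b0 = mem[0]
    if a0 >= b0:
        raise ValueError("reference interval must satisfy a0 < b0")
    vector = []
    for a_k, b_k in mem[1:]:
        if a_k < b_k:
            # Forward strand: positions rise together, so the
            # difference of the starts is constant along the diagonal.
            vector.append((+1, a_k - a0))
        else:
            # Reverse strand: positions move in opposite directions,
            # so the sum of the starts is constant along the anti-diagonal.
            vector.append((-1, a_k + a0))
    return tuple(vector)
```

Under these assumptions, two MEMs that lie on the same diagonal in every genome receive the same vector, which is the property the aggregation into equivalence classes relies on.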

Canonical MEM Equivalence Classes
By appending the interval in reference genome coordinates, $(a_0, b_0)$, to the Moo, the MEM is completely specified.
We aggregate MEMs by their generalized offset, inducing a partition on the set of MEMs.
This defines a CMemEC: $\{\mathrm{Moo}, \{(a_0^1, b_0^1), (a_0^2, b_0^2), \ldots, (a_0^M, b_0^M)\}\}$
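The aggregation step amounts to grouping by key. A minimal Python sketch, assuming each MEM has already been reduced to a hashable Moo value plus its reference interval (the tuple encoding of the Moo and the name `build_cmemecs` are illustrative choices, not taken from the slides):

```python
from collections import defaultdict


def build_cmemecs(mems):
    """Partition MEMs into classes that share the same Moo.

    mems: iterable of (moo, (a0, b0)) pairs, where moo is any hashable
          value and (a0, b0) is the interval in reference coordinates.
    Returns a dict mapping each moo to its sorted list of reference intervals.
    """
    classes = defaultdict(list)
    for moo, ref_interval in mems:
        classes[moo].append(ref_interval)
    # Sorting by reference position puts the members of each class in
    # the order in which they occur along the reference genome.
    return {moo: sorted(intervals) for moo, intervals in classes.items()}
```

Each dictionary entry then has the shape of the CMemEC above: one Moo together with the reference intervals of all MEMs that share it.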

In this example, it’s clear from the plot that there are two large rearrangements, one around the origin and the other about the terminus of replication. So… we could probably get by with modest extensions of existing methods (e.g. MUMmer or our earlier algorithm) to account for laterally transferred, lineage-specific sequence. But biology is never that accommodating...

Hmmmm………….

You hear the one about the biologist, the statistician, and the mathematician volunteering for the psychology experiment?

Approach
Our strategy is to take a multidimensional identification and rearrangement problem and recast it as a segmentation problem. The rationale is multifaceted: sequence relationships among genomes are easily quantified in this context, established statistical techniques can be used to differentiate signal from noise, and two-dimensional segmentation graphs are intuitive and visually appealing. The framework leads to a simple and direct solution.

Simplest Block and Strip Diagram (figure): reference strip $G_0$ with one strip for each of the genomes $G_1$ to $G_4$.

Example with Variable Block Lengths (figure): reference $G_0$ and genomes $G_1$ to $G_5$, with the cut point, origin, and terminus marked.

Large-scale Genomic Rearrangements in Evolutionary Context (figure): Genomes 1 to 5 placed on a species tree rooted at the MRCA, with the zero point, origin, and terminus marked.

Segmentation Graph $S(G_0)$

Sorted Merge Lists of Six Enterobacterial Strains
- Escherichia coli K-12: MG1655, W3110
- Escherichia coli O157:H7: EDL933, Sakai
- Salmonella enterica Typhi: CT18
- Salmonella enterica Typhimurium: LT2
Six SMLs of bimers, one for each genome. A bimer is the lexicographically lesser of an n-mer (we use n=23) and its reverse complement, together with an orientation flag.
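The bimer definition translates directly into code. The sketch below is illustrative only: it assumes an uppercase sequence over A, C, G, T, records each bimer with its start position, and uses the hypothetical names `bimers` and `sorted_bimer_list`; how the original software stores or compresses these lists is not described in the slides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def bimers(genome, n=23):
    """Yield (bimer, position, is_forward) for every n-mer in the genome.

    The bimer is the lexicographically lesser of the n-mer and its
    reverse complement; is_forward records which of the two was kept.
    """
    for i in range(len(genome) - n + 1):
        forward = genome[i:i + n]
        reverse = reverse_complement(forward)
        if forward <= reverse:
            yield (forward, i, True)
        else:
            yield (reverse, i, False)


def sorted_bimer_list(genome, n=23):
    """All bimers of one genome, sorted lexicographically by sequence."""
    return sorted(bimers(genome, n))
```

Because both strands of a match map to the same bimer, matches between genomes in either orientation show up as equal keys when the per-genome sorted lists are scanned together, and the orientation flags say whether the match is forward or reverse-complemented.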

Block report (table rows not preserved), printed once for each non-reference genome:
- D:\Perna_Land\Genomes\Ecoli\W3110_inv.fas
- D:\Perna_Land\Genomes\Ecoli\EDL933.fas
- D:\Perna_Land\Genomes\Salmonella\stmur.fas
- D:\Perna_Land\Genomes\Salmonella\styphii.fas
Columns: Block #, St BLK, End BLK, # of BLKS, NRst, NRend, RFst, RFend, BUMsize, NRsize, Refsize, Diff

A Transformation of CO92 to KIM by Inversions Near the Origin (figure)
CO92 block order: C20 C21 C22 C22.5 C23 C24 C25 C1 C2 C3 C4 C5 C6 C7
KIM block order: K5 K4 K3 K2 K1 K25 K24 K23 K22 K21 K20.5 K20 K19 K…