Toward a Better Understanding of Cereal Genome Evolution Through Ensembl Compara

Apurva Narechania 1, Joshua Stein 1, William Spooner 1, Sharon Wei 1, Ben Faga 1, Shiran Pasternak 1, and Doreen Ware 1,2
1 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
2 USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, USA

Summary

The maize genome has been largely shaped by its history of tetraploidization, subsequent rearrangement and duplicate gene loss. Disruption of synteny has also resulted from apparent gene movement in both maize and sorghum relative to rice. Many questions remain concerning the evolution of cereals, including the extent of lineage-specific rearrangements, the selective forces that dictated the retention of duplicate genes, and the extent of conserved non-coding regions. The availability of three nearly complete cereal genomes (maize, rice and sorghum) provides an unprecedented opportunity to use comparative genomics to answer these and other questions in the evolution of plant genomes. As part of the Maize Genome Sequencing Project, we describe the use of the Ensembl Compara whole genome alignment pipeline to construct sequence-based syntenies. The pipeline automates pairwise whole genome analysis by parallelizing the construction of blastz alignments, their subsequent consolidation into chains and nets, and their coalescence into syntenic regions. The algorithms employed identify highly similar regions between two large sequences while allowing for segments without similarity, thus highlighting gene movement or genomic rearrangement within syntenic blocks. The tetraploid nature of maize and its history of whole genome duplications suggest that much of its genome should have at least two blocks that align to the same region of rice. Preliminary analysis of a pilot 22-megabase assembly from maize chromosome 4 shows synteny to a comparably sized region on rice chromosome 2.
In agreement with marker-based syntenic studies, we show that this rice chromosome has a duplicate homeologue on maize chromosome 5. We address the challenges of applying this pipeline to the maize genome in its partially assembled state.

Poster sections: Blastz-CHAIN-NET and the Ensembl Hive; Blastz-NET Alignment Stats (Maize Accelerated Region); Syntenic Blocks Between Maize, Rice, and Sorghum; Distribution of blastz-NET sizes for Rice and Sorghum Alignments.

[Table: Region Statistics. Columns: Total length, Alignable Sequence, Rice Aligned Coverage, Sorghum Aligned Coverage.]
[Table: Region Alignment Statistics. Columns: Total Alignments, Chain or Net Alignments, Chains, Nets. Rows: Rice, Sorghum.]
[Figures: Blastz-NET coverage by NET Level; Blastz-NET coverage by Rice Chromosome; Blastz-NET coverage by Sorghum Chromosome.]

Alignable Sequence refers to the portion of the maize accelerated region that is of high quality and has not been RepeatMasked. Sorghum blastz-NETs align 66% of the alignable maize sequence, while rice blastz-NETs align 35% of the available accelerated region. The majority of blastz-NETs cluster on rice chromosome 2 and sorghum chromosome 4, in agreement with known marker-based synteny (Proc Natl Acad Sci U S A, Sep 13; 102(37)).

The maize accelerated region contains syntenic blocks to rice chr2 and sorghum chr4

Parameters:
Maize: max gap between NETs 100,000 residues; min NET size 5,000 residues.
Rice and sorghum: max gap between NETs 50,000 residues; min NET size 2,000 residues.

Syntenic blocks are defined in two steps. First, NETs are grouped if the distance between them is smaller than twice the max gap parameter and no NETs break the synteny between them. Second, these groups are arranged into syntenic blocks spanning up to 30 times the max gap parameter, with up to two synteny-breaking groups allowed.

The rice assembly (version 5) is courtesy of TIGR, and early access to the sorghum assemblies is courtesy of JGI.
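The first grouping step described above can be illustrated with a minimal sketch. This is not the Ensembl Compara implementation: the `Net` record, the `group_nets` function, and the rule that a NET hitting a different target chromosome "breaks" synteny are all simplifying assumptions made for illustration, as is measuring NET size by span. Only the parameter values (max gap 100,000 and min NET size 5,000 for maize) come from the poster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Net:
    """Hypothetical NET record; coordinates are inclusive."""
    start: int       # start on the reference sequence (e.g. maize)
    end: int         # end on the reference sequence
    target_chr: str  # chromosome hit in the other species


def group_nets(nets: List[Net], max_gap: int, min_size: int) -> List[List[Net]]:
    """Step one only: chain neighbouring NETs into groups.

    Assumptions (not from the poster): NET size is measured as span, and a
    NET that hits a different target chromosome breaks the synteny.
    """
    # Drop NETs below the minimum size, then walk them in reference order.
    kept = sorted(
        (n for n in nets if n.end - n.start + 1 >= min_size),
        key=lambda n: n.start,
    )
    groups: List[List[Net]] = []
    for net in kept:
        if groups:
            prev = groups[-1][-1]
            close_enough = net.start - prev.end < 2 * max_gap
            same_target = net.target_chr == prev.target_chr
            if close_enough and same_target:
                groups[-1].append(net)
                continue
        groups.append([net])
    return groups


example_nets = [
    Net(0, 9_999, "2"),
    Net(50_000, 69_999, "2"),
    Net(70_500, 71_499, "2"),    # 1,000 residues: below min NET size
    Net(150_000, 170_000, "5"),  # different target chromosome
    Net(500_000, 520_000, "2"),  # more than 2 * max_gap from its neighbour
]
groups = group_nets(example_nets, max_gap=100_000, min_size=5_000)
print([len(g) for g in groups])  # [2, 1, 1]
```

The second step (merging groups into blocks of up to 30 times the max gap while tolerating two synteny-breaking groups) is left out here because the poster does not give enough detail to sketch it without guessing.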
[Tables: Rice Stats and Sorghum Stats. Each set has an Aligned Stats table (columns: Class, Avg Len, Median Len, Max Len, Min Len, Count) and a Span Stats table (columns: Class, Avg span, Median span, Max span, Min span, Count), with one row per NET level: three levels in the first set and four in the second.]
[Figure: Rice and Sorghum Level 1/2 Distributions.]

Blastz-NET lengths are defined as the number of aligning bases in a NET, excluding gaps, while blastz-NET spans are the distances from the first to the last base in the NET, including gaps. Level 1 NETs consistently show the longest length and span across species. Sorghum NETs are considerably longer than those found in rice. Despite large differences in lengths and spans across levels and species, the overall distributions are similar, highlighting the influence of biologically significant outliers.

Maize BAC-contigs versus Rice at MaizeSequence.org

Maize Accelerated Region Duplication

Rice Chr2 from positions 29 Mb to 36 Mb aligns to maize chromosomes 4 and 5 in equal measure, indicating a duplication event. Alignments were made to maize BAC contigs and mapped to chromosomes 4 and 5 using the FPC map. The majority of Chr4 hits were on FPC ctg182, corresponding to the accelerated region. The majority of NETs on Chr5 were on contigs 250, 251, 253, and 254, in agreement with marker-based studies (PLoS Genet, Jul 20; 3(7):e123).

[Pipeline diagram stages: SubmitGenome, ChunkAndGroupDNA, CreatePairAlignerJobs, Blastz, UpdateMaxAlignmentLength, FilterDuplicates, CreateAlignmentChainsJobs, AlignmentChains, UpdateMaxAlignmentLength, CreateAlignmentNetsJobs, AlignmentNets.]

The Blastz-CHAIN-NET pipeline creates long-range, gapped, pairwise blastz chains and nets from raw blastz alignments, thereby allowing for genomic rearrangements in syntenic regions.
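The distinction between NET length and NET span defined above can be made concrete with a short sketch. It assumes a NET is represented as a list of non-overlapping aligned blocks with inclusive coordinates; the `Block` type and the function names `net_length` and `net_span` are hypothetical and are not taken from the pipeline.

```python
from typing import List, Tuple

# Hypothetical representation: one (start, end) pair per gap-free aligned
# block of a NET, inclusive coordinates, blocks assumed not to overlap.
Block = Tuple[int, int]


def net_length(blocks: List[Block]) -> int:
    """Number of aligning bases in the NET, excluding gaps."""
    return sum(end - start + 1 for start, end in blocks)


def net_span(blocks: List[Block]) -> int:
    """Distance from the first to the last base of the NET, including gaps."""
    first = min(start for start, _ in blocks)
    last = max(end for _, end in blocks)
    return last - first + 1


example_blocks = [(100, 199), (300, 349), (1000, 1099)]
print(net_length(example_blocks))  # 250
print(net_span(example_blocks))    # 1000
```

In this example the span is four times the length, which is the kind of gap-rich NET that separates the two distributions.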
Reference for the Blastz-CHAIN-NET method: Proc Natl Acad Sci U S A, Sep 30; 100(20).

The Ensembl Hive pipeline parallelizes the generation of blastz alignments and their consolidation into chains and nets. It uses a hive system that creates specific jobs and spawns anonymous, general-purpose workers to complete them (Nucleic Acids Res, Jan; 36(Database issue)).

With the genome in its partially assembled state, the longest contiguous regions available at maizesequence.org are the BAC contigs. Whole genome alignments to rice are available for all BAC contigs and correspond well to FgenesH predictions with similarity to known proteins and maize ESTs.

Gene Predictions Associated with Blastz-NETs

39% of maize genes within syntenic blocks are non-syntenic, suggesting substantial gene movement within maize. Almost 50% of rice genes are non-syntenic, possibly due to loss of duplicate genes within maize homeologous regions.

Methods: Syntenic blocks were defined from BLASTZ-Chain-Net data using the parameters MaxDist and MinDist, as described in the synteny views above. Genes (excluding TEs) were counted as syntenic if they overlapped a chain HSP that contributed to the synteny.
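The gene-counting rule in the Methods can be sketched as a simple interval-overlap test. This is an illustration under stated assumptions, not the analysis code behind the percentages above: the function names, the dictionary and tuple layouts, and the example coordinates are hypothetical, all features are assumed to lie on the same sequence with inclusive coordinates, and TEs are assumed to have been filtered out beforehand.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, same sequence assumed


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_genes(genes: Dict[str, Interval],
                   synteny_hsps: List[Interval]) -> Dict[str, bool]:
    """Mark a gene as syntenic if it overlaps any HSP contributing to synteny."""
    return {
        name: any(overlaps(span, hsp) for hsp in synteny_hsps)
        for name, span in genes.items()
    }


def nonsyntenic_fraction(calls: Dict[str, bool]) -> float:
    """Fraction of genes that overlap no contributing HSP."""
    return sum(1 for syntenic in calls.values() if not syntenic) / len(calls)


# Hypothetical genes inside one syntenic block (TEs already excluded).
example_genes = {"g1": (100, 500), "g2": (2_000, 2_500), "g3": (5_000, 5_800)}
example_hsps = [(400, 900), (5_500, 6_000)]

calls = classify_genes(example_genes, example_hsps)
print(calls)  # {'g1': True, 'g2': False, 'g3': True}
```

The all-pairs comparison is quadratic; for genome-scale gene sets, sorting both lists or using an interval tree would be the more practical choice.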