Dissecting plant genomes using PLAZA 2.5


Michiel Van Bel 1,2+, Sebastian Proost 1,2+, Elisabeth Wischnitzki 1,2, Sara Movahedi 1,2, Christopher Scheerlinck 3, Yves Van de Peer 1,2 and Klaas Vandepoele 1,2 (+ contributed equally)

1 Department of Plant Systems Biology, Bioinformatics and Systems Biology Division, VIB, Technologiepark 927, 9052 Ghent, Belgium
2 Department of Plant Biotechnology and Genetics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium
3 Department Industrial Sciences BME-CTL, Hogeschool Gent, B-9000 Ghent, Belgium

Introduction

The online comparative genomics platform PLAZA (Proost S. et al., 2009) was designed to offer comprehensive data to non-bioinformaticians in a user-friendly way. However, since the release of the first version of PLAZA, the number of available plant genomes has more than doubled. An update of the species set therefore became necessary, along with under-the-hood improvements to accommodate this increase in data and the inclusion of several new tools. Here we present the new version of PLAZA, which now contains 25 plant species and, along with the tools present in previous versions, comes with a new set of tools to dissect the evolution of these plant genomes.

Data included since the first version
- Structural and functional annotation
- Gene families and sub-families
- Multiple sequence alignment
- Reconciled phylogenetic trees
- Genomic homology with Ks/4dtv dating
- Tools and viewers associated with all types of data
- Workbench for analysis of custom gene sets

New features
- Interactive visualizations
- Orthology Viewer
- Functional clustering
- Improved detection of genomic homology
- Extra features in workbench

Conclusion

Here we present a significant update of both the raw data and the features of the PLAZA comparative genomics platform. By including novel species, researchers working on any of the new species can directly benefit from this update.
Others still have the option to quickly map genes to a related included species using the workbench's BLAST interface. The new tools enable additional analyses that were not possible using previous versions. This update ensures that PLAZA, which is currently visited by dozens of scientists each day, will stay a powerful tool for plant researchers worldwide.

Detection of Genomic Homology Using i-ADHoRe 3.0

To study genome evolution it is imperative to know which regions are derived from a common ancestor. This way, remnants of whole genome duplications in a single species and rearrangements between species can be mapped.
To deal with the vast amount of data in this new release of PLAZA, i-ADHoRe, the tool used to detect genomic homology, has been significantly improved. The implementation of a new gene order alignment algorithm (Fostier J. et al., 2011) enables accurate and sensitive detection in large datasets, while support for modern hardware (such as multi-core CPUs and computer clusters) and under-the-hood optimizations keep runtimes acceptable even on extremely large datasets.

Orthology Viewer: a quick way to find functionally related genes

A major advantage of comparative genomics is that it allows the transfer of knowledge from one species, most likely a model organism, to other, less studied species. Orthologous genes are of crucial importance for this: since they are derived from a common ancestor and separated by a speciation event, they are likely to have a similar function. However, in the case of species-specific duplications, one gene might have multiple co-orthologs in the other species. Additionally, each method to detect orthologs comes with a unique set of advantages and disadvantages. Therefore, information from several sources was integrated in the Orthology Viewer, to allow users to make more informed decisions when looking for functional orthologs.

Functional Clustering

From the literature it is known that functionally related genes can be present in close proximity to each other. This has several distinct advantages: they have a low linkage distance and can be co-regulated in additional ways (e.g. chromatin modification). In this new version of PLAZA such functional clusters have been detected, and a new viewer for this novel data type has been implemented.

Workbench: allowing users to study their favorite genes

Through the normal web platform, studying a mid- to large-scale set of genes becomes a tedious, repetitive, time-consuming task. To allow users to retrieve data for a set of genes faster and without hassle, the workbench was created.
After creating an account, users of the workbench can quickly create private experiments containing sets of genes. Using the import through BLAST, protein or cDNA/EST sequences from species not included in PLAZA can be mapped onto a close relative. Within such an experiment users can quickly compare intron-exon structure, InterPro domains, the mode of duplication, functional enrichment, and more. As in the main platform, each data type is associated with viewers that make interpretation of the results easy and efficient.

Figure 4. (A) Gene order alignment of a conserved region in vertebrates; arced lines connect co-expressed genes (black) and genes that code for interacting proteins (blue). (B) Integration of i-ADHoRe data in PLAZA 2.5: the circle plot shows the 5 chromosomes of Arabidopsis thaliana, with regions duplicated during the last whole genome duplication connected by green lines; homologous regions in A. lyrata are indicated by different colors for different chromosomes. The outermost ring shows coding gene density; low density can be observed in centromeric regions.

References

S. Proost*, M. Van Bel*, L. Sterck, K. Billiau, T. Van Parys, Y. Van de Peer, and K. Vandepoele. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell, 21(12), 2009 (* contributed equally)

J. Fostier*, S. Proost*, B. Dhoedt, Y. Saeys, P. Demeester, Y. Van de Peer, and K. Vandepoele. A greedy, graph-based algorithm for the alignment of multiple homologous gene lists. Bioinformatics, 27(6), 2011 (* contributed equally)

In PLAZA, four different ways to detect orthologous genes have been integrated: Reciprocal Best Hits (RBH, BLAST based), OrthoMCL, reconciliation of phylogenetic trees, and positional orthologs. Using an interactive applet, orthologs for a specific gene in other organisms can be found in an intuitive way.
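Of the four orthology detection methods named in the poster, Reciprocal Best Hits is the simplest to illustrate. The sketch below is a minimal example only: it assumes similarity scores (for instance BLAST bit scores) are already available as dictionaries keyed by (query, subject). The function names, gene identifiers and scores are hypothetical and are not taken from PLAZA, whose actual pipeline the poster does not describe.

```python
def best_hits(scores):
    """Return {query: subject}, keeping the highest-scoring subject per query."""
    best = {}
    for (query, subject), score in scores.items():
        # Strictly greater: on ties, the first hit encountered is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores, made up for illustration; keys are (query gene, subject gene).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 210, ("b2", "a3"): 115, ("b2", "a1"): 100}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a3', 'b2')]
```

Here a2 has b2 as its best hit, but b2 prefers a3, so the pair is not reciprocal and is dropped. This strictness is one reason the poster combines RBH with other lines of evidence: after a species-specific duplication, RBH reports at most one of several co-orthologs.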
In the Orthology Viewer, the different colors in the diamonds show the various types of evidence.

Figure 1. Steps and tools required to build PLAZA, and various viewers to browse the generated data.

Figure 2. The selected poplar gene PT01G09430 has 4 homologs (genes derived from a common ancestor) in grape. While various genes are considered orthologs, only one gene (VV02G03380) is confirmed as an ortholog by all methods (RBH, purple; tree-based, blue; OrthoMCL, green; anchor point, orange).

Figure 3. Functionally enriched gene clusters on a part of Arabidopsis thaliana chromosome 1.

Figure 5. (A) Venn diagram of the mode of duplication of the genes in the set. (B) Easy side-by-side comparison of intron-exon structure. (C) Distribution of transcription factors on Arabidopsis thaliana chromosomes 1 and 5. (D) GO graph with enriched labels indicated.
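The workbench description and Figure 5 (D) mention functional enrichment without stating which statistical test is used. A common choice for this kind of analysis is a one-sided hypergeometric test, and the sketch below assumes that test purely for illustration. The function name, parameter names and numbers are hypothetical and do not describe PLAZA's implementation.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_in_genome, genome_size):
    """P(X >= hits_in_set) when drawing set_size genes without replacement
    from a genome in which hits_in_genome genes carry the annotation."""
    total = comb(genome_size, set_size)
    upper = min(set_size, hits_in_genome)
    tail = sum(
        comb(hits_in_genome, i) * comb(genome_size - hits_in_genome, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return tail / total


# Made-up example: 5 of 20 selected genes carry a label that 50 of 1000 genes have.
p = enrichment_p_value(hits_in_set=5, set_size=20, hits_in_genome=50, genome_size=1000)
print(f"p = {p:.3g}")
```

When many labels are tested for the same gene set, the resulting p-values would normally be corrected for multiple testing before a label is reported as enriched.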