Genome Database Comparative Genomics Phylogenomics Variation GrameneMart (BioMart) Discovery Environment Josh Stein Cold Spring Harbor Laboratory



Exploring Plant Genomes: Browse; Search; Upload personal data; Analysis tools

Gramene’s Key Strengths
Comparative genomics
– Complete reference genomes for 11 plant species including A. thaliana & A. lyrata
– Whole genome alignments
– Phylogenetic gene trees
Ability to upload and share data
Data mining using Gramene Mart
Extensive variation data sets for Arabidopsis
Integration with Pathways databases

Quick entry points

Browser tracks; Whole genome alignments; Synteny views; Location-based variation

Gene sequence; Splice variants; Gene centered variation; Phylogenetic trees; Cross-reference to external databases

Transcript & protein sequences; Protein structure; Transcript & protein based variation; GO and other ontologies

Location View Browser Tracks: TAIR 10 Annotation; EST/cDNA alignments; Array probes; Variation; Genome alignments (cross-species browsing); Repeats

Configuring Tracks

Standard Analysis & Visualization: InterPro domain & GO functional annotation; Cross-reference to external IDs; Whole Genome Alignment (Blastz-chain-net); Phylogenetic Gene Trees (Compara); Synteny Analysis; Consequences of SNP

InterPro/dbXref/GO
Structural prediction: Pfam, PIRSF, PRINTS, PROSITE, SMART, SUPERFAMILY, TIGRFAM, TMHMM, SignalP
Cross-reference genes to 3rd party identifiers: Entrez Gene, PlantGDB, PUTs, RefSeq, Gene Index, UniGene, UniProtKB/Swiss-Prot, NASC, IPI, WikiGene
Gene Ontology, Plant Ontology

Alignment View: Pairwise BLASTZ-CHAIN-NET whole genome alignment: Arabidopsis lyrata, Poplar, Grapevine, Rice, Brachypodium, Sorghum, Physcomitrella

Multi-species View: A. lyrata, Arabidopsis, Grapevine, Poplar

Conserved non-coding regions

View Sequence Alignment

Phylogenetic Analysis Tools

Compara Gene Trees
Gene Trees for 11 plants plus human, Ciona, fly, worm, & yeast
Infers orthologs and paralogs by reconciling gene tree with input species tree
Taxonomic dating
Reconstructing evolutionary histories
mology_method.html
Vilella A.J., et al. (2008). Genome Res. Pre-print: doi: /gr
~35,000 trees
– ~24,500 plant specific
– ~10,000 containing Arabidopsis
– 1059 specific to Arabidopsis genus
– 79 specific to A. thaliana
– 527 specific to A. lyrata

Tree Viewer Speciation node = ortholog Duplication node = paralog
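The rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Compara pipeline: it assumes the gene tree has already been reconciled, so every internal node carries a "speciation" or "duplication" label, and the gene names are hypothetical. Two genes are called orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node.

```python
from itertools import product

# A node is either a leaf (a gene name) or a tuple (event, left, right),
# where event is "speciation" or "duplication". Gene names are hypothetical.


def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    relation = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node have this node as their
    # last common ancestor.
    for a, b in product(leaves(left), leaves(right)):
        out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


# A duplication that predates the A. thaliana / A. lyrata split.
tree = (
    "duplication",
    ("speciation", "geneA1_Atha", "geneA1_Alyr"),
    ("speciation", "geneA2_Atha", "geneA2_Alyr"),
)
pairs = classify_pairs(tree)
for pair, relation in pairs.items():
    print(sorted(pair), relation)
```

In this toy tree the two copies within each species, and copies compared across the duplication, come out as paralogs, while the same copy compared between species comes out as an ortholog.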

Newick Tree & Alignment

Tree (Newick format):
(((ENSCINP _Cint_:0.0000, R10D12.12_Cele_:3.4477):0.7716, FBpp _Dmel_:0.8566):0.0000, (((((BRADI3G _Bdis_:0.0615, BRADI2G _Bdis_:0.1536):0.0214, ((LOC_Os02g _Osat_:0.0000, BGIOSGA PA_Oind_:0.0000):0.0000, ORGLA02G _Ogla_:0.0000):0.0938):0.0231, (((GRMZM2G050705_P02_Zmay_:0.0099, GRMZM2G124671_P01_Zmay_:0.0745):0.0043, Sb08g _Sbic_:0.0348):0.0000, (GRMZM2G022470_P01_Zmay_:0.0475, Sb04g _Sbic_:0.1037):0.0000):0.0917):0.1118, (((POPTR_0005s _Ptri_:0.0420, POPTR_0013s _Ptri_:0.0427):0.0918, (GSVIVT _Vvin_:0.0342, GSVIVT _Vvin_:0.0817):0.1210):0.0363, ((scaffold_ _Alyr_:0.0043, scaffold_ _Alyr_:0.0632):0.0277, AT4G _Atha_:0.0204):0.2813):0.1261):0.5081, E_GW _Ppat_:0.3698):0.3605):0.0000;

Alignment:
ORGLA02G _Ogla_ VFVTVGTTCF DALVKAVDSP QVKEALLEKG YTDLIIQMGR GTY
BRADI2G _Bdis_ VFVTVGTTCF DALVKAVDSE EVKQALLRKG YTDLLIQMGR GTY
GRMZM2G050705_P02_Zmay_ VFVTVGTTCF DALVMAVDSP EVKKALLQKG YSNLLIQMGR GTY
POPTR_0005s _Ptri_ VFVTVGTTLF DALVRTVDTK EVKQELLRNG YTHLIIQMGR GSY
GRMZM2G022470_P01_Zmay_ VFVTVGTTCF DALVMAVDSP EVKKTLLQKG YSNLLIQMGR GTY
BRADI3G _Bdis_ VFVTVGTTCF DALVKKVDSP QVKEALWQKG YTDLFIQMGR GTY
GSVIVT _Vvin_ VFVTVGTTCF DALVKAVDTQ EFKKELSARG YTHLLIQMGR GSY
Sb08g _Sbic_ MAVDSP EVKMALLQKG YSNLLIQMGR GTY
GRMZM2G124671_P01_Zmay_ VFVTVGTTCF DALVMAVDSP EVKKALLQKG YSNLLIQMGR GTY
Sb04g _Sbic_ MAVASP EVKKALLQKG YSNLVIQMGR GTY
BGIOSGA PA_Oind_
E_GW _Ppat_ VLVTVGTTLF DALVREASSQ PCRQVLADFG YSSLVIQRGK GSF
scaffold_ _Alyr_ VFVTVGTTSF DALVKAVVSE DVKDELQKRG FTHLLIQMGR GIF
R10D12.12_Cele_ NQDVIDR
ENSCINP _Cint_ IFVTVGTTSF DELTETITSK PVQKVLQSQG YDKVTIQYGR GKH
scaffold_ _Alyr_ VFVTVGTTSF DALVKAVVSE DVKDELQKRG FTHLLIQMGR GNF
AT4G _Atha_ VFVTVGTTSF DALVKAVVSQ NVKDELQKRG FTHLLIQMGR GIF
LOC_Os02g _Osat_ VFVTVGTTCF DALVKAVDSP QVKEALLEKG YTDLIIQMGR GTY
GSVIVT _Vvin_ VFVTVGTTCF DALVKAVDTH EFKRELFARG YTHLLIQMGR GSY
FBpp _Dmel_ VYITVGTTKF DALISTASTE PALKALQNRK CTKLVIQHGN SQP
POPTR_0013s _Ptri_ VFVTVGTTLF DALVRTVDTK EVKQELLRKG YTDLVIQMGR GSY
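An exported Newick tree can be read with a short recursive parser. The sketch below is a minimal illustration that handles only plain names and branch lengths (no quoted labels or comments). The identifiers in the slide's tree are truncated, so the example uses a small hypothetical tree instead.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Quoted labels and bracketed comments are not supported.
    """
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        # Optional node name.
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return root


def leaf_names(node):
    """Collect leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


# Hypothetical three-gene tree in the same style as the slide.
example = "((geneA_Atha:0.02, geneA_Alyr:0.004):0.28, geneA_Ptri:0.04):0.0;"
tree = parse_newick(example)
print(leaf_names(tree))
```

For real work an established library is a safer choice than a hand-written parser; this sketch only shows how the nested parentheses map onto a tree.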

Orthologs & Paralogs

Gene-Centered Synteny Build
Oryza sativa Japonica          O.jap
Brachypodium distachyon  YES   B.dis
Sorghum bicolor  YES   S.bic
Arabidopsis thaliana  ---   A.tha
Arabidopsis lyrata  ---  YES   A.lyr
Vitis vinifera  ---  YES   V.vin
Populus trichocarpa  ---  YES   P.tri
Compara Orthologs; Collinear mappings (DAGchainer); “in-range” mappings near collinear anchors; Map

Synteny View: Available for A. lyrata, grapevine, & poplar; Navigate to other genome; Ortholog browser; Link to multi-species view

Browse across duplicated regions from polyploidy: Chr 1 vs Poplar; Chr 1 vs Grapevine; Switch reference to grape

Some Applications …

Distinguish “Real” Genes From Transposons: FAR1/FHY3 transcription factor family functions in light sensing; Evolved from Mu-related transposases; Cannot distinguish by BLAST; FHY3; “Rule-in” functioning genes; Missing annotation in A. lyrata?; Domesticated TE

Enrich Annotations in Other Species: Arabidopsis and Rice orthologs both show one gene; Arabidopsis ortholog in correct syntenic context; Putative mis-annotated Grape gene

Adding Custom Tracks

Custom Tracks
Salk T-DNA lines
– Uploaded from my laptop
– GFF file format
EST alignments from non-model plants
– DAS: Distributed Annotation System
– Protocol for sharing 3rd party data
– DAS Registry
Methylome (Ecker)
– Uploaded from a URL
– BED file format

Upload Your Data
chr1 SALK T-DNA e ID=SALK_ x
chr1 SALK T-DNA e ID=SALK_ x
chr1 SALK T-DNA e ID=SALK_ x
chr1 SALK T-DNA e ID=SALK_ n
chr1 SALK T-DNA e ID=SALK_ x
chr1 SALK T-DNA e ID=SALK_ n
chr1 SALK T-DNA ID=SALK_ x
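A file like the one on this slide can be generated from a list of insertion sites. The sketch below assumes the standard nine-column, tab-separated GFF layout (seqid, source, type, start, end, score, strand, phase, attributes); the coordinates and IDs are hypothetical, since the values on the slide did not survive extraction, and the exact GFF dialect the browser expects should be checked against its upload documentation.

```python
def gff_line(seqid, source, feature, start, end, attributes,
             score=".", strand=".", phase="."):
    """Format one feature as a nine-column, tab-separated GFF line.

    Coordinates are 1-based and inclusive, as in the GFF convention.
    """
    if start < 1 or end < start:
        raise ValueError("GFF coordinates must satisfy 1 <= start <= end")
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature, str(start), str(end),
               str(score), strand, phase, attr_text]
    return "\t".join(columns)


# Hypothetical T-DNA insertion sites: (start, end, strand, line ID).
insertions = [
    (1000, 1001, "+", "SALK_example_1"),
    (25000, 25001, "-", "SALK_example_2"),
]

gff_lines = [
    gff_line("chr1", "SALK", "T-DNA", start, end, {"ID": line_id}, strand=strand)
    for start, end, strand, line_id in insertions
]
gff_text = "\n".join(gff_lines) + "\n"
print(gff_text, end="")
```

Writing `gff_text` to a file gives something that can be uploaded from a local disk in the same way as the Salk T-DNA track.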

Attach From Remote File
track name="mCIP col/met1 BU" color=darkgreen description="Methylation" useScore=3 visibility=2 height=30
chr mCIP_col/met1_BU
chr mCIP_col/met1_BU
chr mCIP_col/met1_BU
chr mCIP_col/met1_BU
chr mCIP_col/met1_BU
chr mCIP_col/met1_BU
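A remote BED file of this kind is a `track` header line followed by one row per interval. The sketch below builds such a file as a string. The track options mirror those visible on the slide, while the intervals, names, and scores are hypothetical. It also shows the coordinate shift that is easy to get wrong: BED uses 0-based, half-open intervals, whereas GFF uses 1-based, inclusive ones.

```python
def to_bed_row(chrom, start_1based, end_1based, name, score=0):
    """Convert a 1-based inclusive interval into a five-column BED row."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("invalid interval")
    # BED start is 0-based; BED end is exclusive, so it equals the
    # 1-based inclusive end.
    columns = [chrom, str(start_1based - 1), str(end_1based), name, str(score)]
    return "\t".join(columns)


def bed_track(name, description, rows, **options):
    """Prepend a track header line to a list of BED rows."""
    header = f'track name="{name}" description="{description}"'
    for key, value in options.items():
        header += f" {key}={value}"
    return "\n".join([header, *rows]) + "\n"


# Hypothetical methylation intervals on chromosome 1.
rows = [
    to_bed_row("chr1", 101, 150, "example_region_1", 500),
    to_bed_row("chr1", 301, 360, "example_region_2", 850),
]
bed_text = bed_track("example methylation", "Methylation", rows,
                     color="darkgreen", visibility=2)
print(bed_text, end="")
```

Hosting the resulting file at a public URL is what lets the browser attach it remotely instead of uploading it.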

Add DAS: Distributed Annotation System. Protocol for sharing 3rd party data via a DAS registry

Manage Custom Tracks

Turn On/Off Custom Tracks

GrameneMart: Orthologs in lyrata, grape, poplar, rice, Brachypodium, sorghum, maize, & moss; Custom queries for bulk downloads; Powerful tool for data mining

BioMart Use Cases: All transmembrane-targeted genes, showing InterPro domains, GO terms, and Affy IDs

BioMart Use Case Evolution of cyclin genes: Taxon of origin for paralog pairs of cyclin-domain genes that have an ortholog in Physcomitrella

BioMart Use Cases Mine germplasm for loss of function alleles in diversity populations: All Myb-domain genes with “STOP_GAINED” SNP allele
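Use cases like these can also be scripted, because BioMart accepts queries written as a small XML document: a `Query` element wrapping a `Dataset` that lists `Filter` and `Attribute` elements. The sketch below only builds that XML. The dataset, filter, and attribute names are hypothetical placeholders, not confirmed GrameneMart identifiers; the real names have to be looked up in the mart's own configuration.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string.

    dataset    -- dataset name (hypothetical here)
    filters    -- dict mapping filter name to value
    attributes -- list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Placeholder names standing in for the Myb-domain / STOP_GAINED use case.
query_xml = build_biomart_query(
    dataset="example_gene_dataset",
    filters={
        "example_interpro_filter": "EXAMPLE_MYB_DOMAIN_ID",
        "example_snp_consequence_filter": "STOP_GAINED",
    },
    attributes=["example_gene_id", "example_snp_id"],
)
print(query_xml)
```

The resulting string would typically be sent as the `query` parameter to a mart's web service endpoint; that step is omitted here because the endpoint address is not given in the slides.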

Additional Data Access: FTP (Data files, SQL dump, Software); Read-only Public MySQL; Web Services

HELP!

Contact Us