Annotation and Visualization
Doreen Ware

Project Challenges
- Rapidly growing sequence data
- Full annotation of all clones
- New high-performance computing cluster
  - 2,000 nodes
  - Scheduling system (Sun Grid Engine)
  - NFS issues
- EnsEMBL code integration

Milestones Released
- Customized entry points of the Ensembl browser for the maize community
- Adapted modules to the new compute cluster, Blue Helix, and automated gene predictions, MDR analysis, and RepeatMasker runs
- Alignments of cereal sequence using Gramene Biopipe (needs to be automated)
- Transitioned from annotating finished BACs to annotating all BACs as they become available
- BLAST server
- FTP site
- DAS server (displaying Twinscan annotations)
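The DAS server listed above serves annotations (here, Twinscan gene predictions) over the Distributed Annotation System protocol. As a minimal sketch, assuming the DAS/1.x `features` command and its DASGFF XML response format, a client could request one segment and pull out feature coordinates as below. The server URL, data source name, and segment ID used with these functions are hypothetical; the slides do not give them.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import quote


def das_features_url(server: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS/1.x 'features' request for a single segment (ref:start,stop)."""
    return (f"{server.rstrip('/')}/das/{quote(dsn)}/features"
            f"?segment={quote(ref)}:{start},{stop}")


def _attr_id(feature: ET.Element, tag: str) -> Optional[str]:
    """Return the 'id' attribute of a child element such as TYPE or METHOD."""
    el = feature.find(tag)
    return el.get("id") if el is not None else None


def parse_dasgff(xml_text: str) -> List[Dict[str, object]]:
    """Extract id, type, method, coordinates and strand from a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("FEATURE"):
        features.append({
            "id": feature.get("id"),
            "type": _attr_id(feature, "TYPE"),
            "method": _attr_id(feature, "METHOD"),
            "start": int(feature.findtext("START")),
            "end": int(feature.findtext("END")),
            "orientation": feature.findtext("ORIENTATION"),
        })
    return features
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_dasgff` would list the features in that segment; keeping URL construction and parsing separate lets the parser be checked against saved responses.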

Index Page

Maizesequence.org RSS BAC Notification

Users can be notified of sequence and annotation updates to a particular region of interest on the FPC map via an RSS (Really Simple Syndication) notification system. Data are delivered as XML to the user's favorite feed reader or are parsed by RSS-enabled browsers. The URL for any given query is persistent and dynamically retrieves database updates in the user-specified region. …
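A minimal sketch of the region-filtered feed idea, assuming RSS 2.0 output built with Python's standard `xml.etree.ElementTree`. The `BacUpdate` record, its fields, the region parameters, and the base URL are hypothetical illustrations, not the maizesequence.org implementation. Because the whole query is encoded in the feed URL, regenerating the feed on each request returns the current database updates for the same region.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import Iterable
from urllib.parse import urlencode


@dataclass
class BacUpdate:
    bac_id: str
    fpc_contig: str    # hypothetical: FPC contig the BAC is placed on
    position: int      # hypothetical: BAC position on that contig
    summary: str       # e.g. "new annotation" or "sequence updated"
    updated: datetime  # time of the change (naive values are treated as UTC)


def build_region_feed(updates: Iterable[BacUpdate], contig: str, start: int,
                      end: int, base_url: str) -> str:
    """Return an RSS 2.0 document listing updates inside contig:start-end."""
    query = urlencode({"contig": contig, "start": start, "end": end})
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"BAC updates for {contig}:{start}-{end}"
    # The persistent feed URL carries the whole region query.
    ET.SubElement(channel, "link").text = f"{base_url}?{query}"
    ET.SubElement(channel, "description").text = (
        "Sequence and annotation updates in the selected FPC map region")
    in_region = [u for u in updates
                 if u.fpc_contig == contig and start <= u.position <= end]
    # Newest updates first, as feed readers usually expect.
    for u in sorted(in_region, key=lambda u: u.updated, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{u.bac_id}: {u.summary}"
        ET.SubElement(item, "pubDate").text = format_datetime(u.updated)
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{u.bac_id}-{u.updated.isoformat()}"
    return ET.tostring(rss, encoding="unicode")
```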

Maizesequence.org FTP and BLAST DB

Diagram: Ensembl BAC DB -> weekly bulk genome dump -> Maize FTP (BACs, BAC contigs, ab initio predictions, ab initio translations) and Maize BLAST (BAC contigs, ab initio predictions, ab initio translations)

BACs, BAC contigs, FGENESH predictions (TE and non-TE classes), and FGENESH translations are dumped on a weekly basis. Sequence dumps are posted to the FTP site (ftp.maizesequence.org). Sequence dumps are also used to update the BLAST databases. (
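The weekly dump step can be sketched as writing one FASTA file per dataset, with FGENESH predictions split into TE and non-TE classes. This is a minimal illustration under stated assumptions: the `Prediction` record, the (name, sequence) input pairs, the 60-column line width, and all file names are hypothetical, not the project's actual dump layout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Prediction:
    gene_id: str
    is_te: bool    # TE vs. non-TE class from the gene model classification
    cdna: str      # predicted transcript sequence
    protein: str   # FGENESH translation


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA with fixed-width lines."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(f">{header}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""


def weekly_dump(bacs: List[Tuple[str, str]], contigs: List[Tuple[str, str]],
                predictions: List[Prediction]) -> Dict[str, str]:
    """Return {hypothetical file name: FASTA text} for one weekly dump."""
    te = [p for p in predictions if p.is_te]
    non_te = [p for p in predictions if not p.is_te]
    return {
        "bacs.fa": to_fasta(bacs),
        "bac_contigs.fa": to_fasta(contigs),
        "fgenesh_te.cdna.fa": to_fasta((p.gene_id, p.cdna) for p in te),
        "fgenesh_non_te.cdna.fa": to_fasta((p.gene_id, p.cdna) for p in non_te),
        "fgenesh_te.pep.fa": to_fasta((p.gene_id, p.protein) for p in te),
        "fgenesh_non_te.pep.fa": to_fasta((p.gene_id, p.protein) for p in non_te),
    }
```

The same files could be posted to the FTP site and indexed for BLAST, for example with NCBI BLAST+ `makeblastdb -in bac_contigs.fa -dbtype nucl` (or `-dbtype prot` for translations); the slides do not say which BLAST package the project used.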

MapView

CytoView

ContigView

GeneView

ExportView

Maize Databases and Annotation Pipeline

Classification of Gene Models

Ab initio gene prediction was run on non-masked contigs with FGENESH using monocot parameters. Gene models were classified by BLASTP against the GenBank non-redundant amino acid (NRAA) database:
- TE = alignment to transposable elements (TE), as specified within a curated database
- NH = no detectable homology
- WH = significant alignment to non-TE sequences
- Corrupted_translation = Ensembl translation inconsistent with FGENESH

Gene Model Class | Minimum / Maximum | Average | Median | Standard Deviation
TE size (bases) | 5123,913 | 2,739 | 2,402 | 1,916
WH size (bases) | 7325,816 | 2,465 | 1,829 | 2,146
NH size (bases) | 319, | | |
Corrupted_translation (bases) | 825,869 | 2,251 | 1,845 | 1,813

Data generated as of September 2007
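The four classes and the size statistics can be sketched as follows. This is a hypothetical illustration, not the pipeline's code: the e-value cutoff, the order in which the rules are checked, the `GeneModel` fields, and the use of the sample standard deviation are all assumptions, since the slide does not specify them.

```python
from dataclasses import dataclass
from statistics import mean, median, stdev
from typing import Dict, List, Optional

# Hypothetical BLASTP significance cutoff; the slide does not state one.
E_VALUE_CUTOFF = 1e-10


@dataclass
class GeneModel:
    name: str
    size: int                         # gene model size in bases
    ensembl_translation: str          # translation stored in Ensembl
    fgenesh_translation: str          # translation reported by FGENESH
    best_hit_evalue: Optional[float]  # best BLASTP hit vs. NRAA; None if no hit
    best_hit_is_te: bool = False      # True if that hit is a curated TE protein


def classify(model: GeneModel) -> str:
    """Assign TE, WH, NH or Corrupted_translation (rule order is assumed)."""
    if model.ensembl_translation != model.fgenesh_translation:
        return "Corrupted_translation"
    if model.best_hit_evalue is None or model.best_hit_evalue > E_VALUE_CUTOFF:
        return "NH"  # no detectable homology
    return "TE" if model.best_hit_is_te else "WH"


def size_summary(models: List[GeneModel], gene_class: str) -> Dict[str, float]:
    """Minimum, maximum, average, median and sample SD of sizes in one class."""
    sizes = [m.size for m in models if classify(m) == gene_class]
    return {
        "min": min(sizes),
        "max": max(sizes),
        "average": mean(sizes),
        "median": median(sizes),
        "sd": stdev(sizes),  # requires at least two models in the class
    }
```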

Nucleotide Coverage of Mathematically Defined Repeats (MDRs) in 10,352 Annotated BACs (130,978 Contigs)

MDR Type* | Total Nucleotides | Nucleotide Coverage
2 copies | 1,325,811, | %
10 copies | 937,789, | %
100 copies | 602,350, | %
1000 copies | 218,650, | %

*Mathematically defined repeats indicate regions of repetitive DNA. The frequency of each constituent 20-mer along the BAC sequence was determined within the raw reads of the maize whole-genome shotgun sequence (DOE Joint Genome Institute). MDR type "2 copies" indicates regions over which 20-mers occurred two or more times. Thus, MDR types "10 copies", "100 copies", and "1000 copies" indicate, respectively, regions over which 20-mers occurred ten or more times, one hundred or more times, and one thousand or more times. The most repetitive regions correspond to MDR type "1000 copies"; the least repetitive correspond to MDR type "2 copies".

Data generated as of September 2007
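A minimal sketch of the MDR idea described above: count every k-mer in the shotgun reads, then mark each BAC base that lies under a k-mer seen at least N times. Assumptions: only the forward strand is counted (the slide does not say whether reverse complements were included), and an in-memory `Counter` is used purely for illustration; a whole-genome run would need a far more memory-efficient k-mer index.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the shotgun reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mdr_mask(seq: str, counts: Counter, k: int, min_copies: int) -> List[bool]:
    """Mark every base covered by a k-mer seen at least min_copies times."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_copies:  # unseen k-mers count as 0
            for j in range(i, i + k):
                covered[j] = True
    return covered


def mdr_coverage(seq: str, reads: Sequence[str], k: int = 20,
                 thresholds: Sequence[int] = (2, 10, 100, 1000)) -> Dict[int, int]:
    """Number of bases of seq in each MDR type (one entry per copy threshold)."""
    counts = kmer_counts(reads, k)
    return {t: sum(mdr_mask(seq, counts, k, t)) for t in thresholds}
```

Because a higher threshold keeps only k-mers that also pass every lower one, the "1000 copies" regions are nested inside the "100 copies" regions, and so on, which matches the decreasing totals in the table.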

Nucleotide Coverage of Repeats in 10,352 Annotated BACs (130,978 Contigs)

Repeat Type* | Total Nucleotides | Nucleotide Coverage
MIPS/REcat Class I Retroelements | 1,503,929, | %
MIPS/REcat Class II/III Transposable Elements | 36,620, | %
MIPS/REcat Other | 16,048, | %
All Repeats | 1,553,118, | %

*Repetitive sequences were annotated and masked using RepeatMasker and the MIPS REdat library.

Data generated as of September 2007
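One plausible way to compute per-class and combined coverage so that overlapping repeat hits are counted only once is to merge intervals per contig before summing. This sketch assumes hits are already available as 0-based, end-exclusive (contig, start, end) intervals, for instance parsed from RepeatMasker output; the parsing step and the class names used with it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, int, int]  # (contig, start, end), 0-based, end-exclusive


def covered_bases(hits: Iterable[Hit]) -> int:
    """Count bases covered by at least one hit, merging overlaps per contig."""
    by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, end in hits:
        by_contig[contig].append((start, end))
    total = 0
    for intervals in by_contig.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:   # gap: close the current merged run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                 # overlap or abutment: extend the run
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


def coverage_table(hits_by_class: Dict[str, List[Hit]],
                   total_length: int) -> Dict[str, Tuple[int, float]]:
    """(total nucleotides, percent coverage) per repeat class and for all repeats."""
    rows: Dict[str, Tuple[int, float]] = {}
    all_hits: List[Hit] = []
    for repeat_class, hits in hits_by_class.items():
        bases = covered_bases(hits)
        rows[repeat_class] = (bases, 100.0 * bases / total_length)
        all_hits.extend(hits)
    bases = covered_bases(all_hits)  # union across classes, overlaps counted once
    rows["All Repeats"] = (bases, 100.0 * bases / total_length)
    return rows
```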

Outreach and Collaborations
- MaizeGDB
- EBI EnsEMBL
- Gramene
- Maize Array Working Group
- Maize Optical Map
- Transposon Annotation
- TWINSCAN
- Vmatch
- Student Annotation (Howard Hughes)

Objectives for Year 3
- Whole genome alignments for rice, maize and Arabidopsis
- Evidence-based gene builds
  - Gramene-modified Ensembl pipeline and FGENESH++ in combiner mode
- BioMart for complex query generation
- SyntenyView based on whole genome alignment
- Transition from Gramene Biopipe to an Ensembl Exonerate pipeline to automate sequence alignments
- Annotation of non-coding RNA using tRNAscan and microRNA
- Gene Ontology using the dbxref pipeline
- Incorporation into Gramene Compara builds; GeneTree view
- MySQL database dumps
- Tutorials for the website using Camtasia
- Submit paper on MDR analysis

Shiran Pasternak, Apurva Narechania, Linda McMahan, Joshua Stein