Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) www.yandell-lab.org iPlant: Josh Stein (CSHL) Matt Vaughn.

Slides:



Advertisements
Similar presentations
Model Organism Databases and Community Annotation
Advertisements

Gene Structure Annotation Philippe Lamesch International Arabidopsis conference July 23, 2008, Montreal.
Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.
Daniel Ence Yandell Lab University of Utah.  Annotations are descriptions of features of the genome  Structural: exons, introns, UTRs, splice forms.
Modeling Functional Genomics Datasets CVM Lesson 3 13 June 2007Fiona McCarthy.
Genome analysis and annotation. Genome Annotation Which sequences code for proteins and structural RNAs ? What is the function of the predicted gene products.
How to access genomic information using Ensembl August 2005.
Modeling Functional Genomics Datasets CVM Lesson 1 13 June 2007Bindu Nanduri.
Sequence Analysis. Today How to retrieve a DNA sequence? How to search for other related DNA sequences? How to search for its protein sequence? How to.
Genome Annotation BCB 660 October 20, From Carson Holt.
Gene Finding Genome Annotation. Gene finding is a cornerstone of genomic analysis Genome content and organization Differential expression analysis Epigenomics.
TAIR resources for plant biology research kate dreher curator TAIR/PMN.
Assembly & Annotation at iPlant
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
Genomics of Microbial Eukaryotes Igor Grigoriev, Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA.
Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) iPlant: Josh Stein (CSHL) Matt Vaughn.
Bikash Shakya Emma Lang Jorge Diaz.  BLASTx entire sequence against 9 plant genomes. RepeatMasker  55.47% repetitive sequences  82.5% retroelements.
Kerstin Howe, Mario Caccamo, Ian Sealy The Zebrafish Genome Sequencing Project Bioinformatics resources.
Arabidopsis Genome Annotation TAIR7 Release. Arabidopsis Genome Annotation  Overview of releases  Current release (TAIR7)  Where to find TAIR7 release.
Manifestations of a Code Genes, genomes, bioinformatics and cyberspace – and the promise they hold for biology education.
New data and tools at TAIR (The Arabidopsis Information Resource)
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Coding Domain Sequence Prediction and Alternative Splicing Detection in Human Malaria Gambiae Jun Li 1, Bing-Bing Wang 2, Jose M. Ribeiro 3, Kenneth D.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
MAKER Annotation Process Example of Glossina VectorBase Karyn Mégy Dan Hughes.
IPlant Genomics in Education Workshop Genome Exploration in Your Classroom.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Welcome to DNA Subway Classroom-friendly Bioinformatics.
I. Introduction and Red Line Education for Data-unlimited Science.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Genomics of Microbial Eukaryotes Igor Grigoriev Fungal Genomics Program Head US DOE Joint Genome Institute, Walnut Creek, CA.
Genome Annotation Rosana O. Babu.
Srr-1 from Streptococcus. i/v nonpolar s serine (polar uncharged) n/s/t polar uncharged s serine (polar uncharged) e glutamic acid (neg. charge) sserine.
Sackler Medical School
Annotating genomes using MAKER-P and iPlant. What Are Annotations? Annotations are descriptions of features of the genome –Structural: exons, introns,
Importing Community annotations into VectorBase. Aims Provide the VectorBase community with tools for improving genome annotation. Must have low entry.
IPlant Genomics in Education Workshop Genome Exploration in Your Classroom.
Mark D. Adams Dept. of Genetics 9/10/04
The iPlant Collaborative Vision Enable life science researchers and educators to use and extend cyberinfrastructure.
Comparative Genomics Methods for Alternative Splicing of Eukaryotic Genes Liliana Florea Department of Computer Science Department of Biochemistry GWU.
JIGSAW: a better way to combine predictions J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the.
Fgenes++ pipelines for automatic annotation of eukaryotic genomes Victor Solovyev, Peter Kosarev, Royal Holloway College, University of London Softberry.
August 2008Bioinformatics tools for Comparative Genomics of Vectors1 Genome Annotation Daniel Lawson EBI.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
The iPlant Collaborative Vision Enable life science researchers and educators to use and extend cyberinfrastructure.
SRB Genome Assembly and Analysis From 454 Sequences HC70AL S Brandon Le & Min Chen.
Tools in Bioinformatics Genome Browsers. Retrieving genomic information Previous lesson(s): annotation-based perspective of search/data Today: genomic-based.
1 of 28 Evaluating Genes and Transcripts (“Genebuild”)
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Data Demo and MAKER-P.
Using DNA Subway in the Classroom Genome Annotation: Red Line.
Basics of Genome Annotation Daniel Standage Biology Department Indiana University.
Transforming Science Through Data-driven Discovery Workshop Overview Ohio State University MCIC Jason Williams – Lead, CyVerse – Education, Outreach, Training.
Daphnia Genome Annotation & Analysis Notes July 2007 Don Gilbert Genome Informatics Lab, Biology Dept., Indiana University
IPlant Genomics in Education Workshop Genome Exploration in Your Classroom.
bacteria and eukaryotes
Annotating The data.
Introduction to Genes and Genomes with Ensembl
VectorBase genome annotation
Tools and Services Workshop
Joslynn Lee – Data Science Educator
Functional Annotation of the Horse Genome
PlantGDB: Annotation Principles & Procedures
Genome organization and Bioinformatics
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Genome Annotation w/ MAKER
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Ensembl Genome Repository.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Part II SeqViewer AraCyc Help
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Presentation transcript:

Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) iPlant: Josh Stein (CSHL) Matt Vaughn (TACC) Dian Jiao (TACC) Zhenyuan Lu (CSHL) Nirav Merchant (U. Arizona) Carson Holt (Ontario Institute Cancer Research) Campbell et al. Plant Physiology. DOI: /pp Cantarel et al Genome Research 18:188 Holt & Yandell BMC Bioinformatics 12:491

Assembly & Annotation at iPlant

What Are Annotations? Annotations are descriptions of features of the genome Structural: exons, introns, UTRs, splice forms etc. Coding & non-coding genes Expression, repeats, transposons Annotations should include evidence trail Assists in quality control of genome annotations Examples of evidence supporting a structural annotation: Ab initio gene predictions ESTs Protein homology

Secondary Annotation Protein Domains InterPro Scan: combines many HMM databases GO and other ontologies Pathway mapping E.g. BioCyc Pathway tools

Challenges in Plant Genome Annotation Genomes are BIG Highly repetitive Many pseudogenes Assembly contamination Incomplete evidence No method is 100% accurate

Options for Protein-coding Gene Annotation Yandell & Ence. Nature Reviews Genetics 13, (May 2012) | doi: /nrg3174

Typical Annotation Pipeline Contamination screening Repeat/TE masking Ab initio prediction Evidence alignment (cDNA, EST, RNA-seq, protein) Evidence-driven prediction Chooser/combiner Evaluation/filtering Manual curation

MAKER-P Automated Pipeline Ab initio prediction Evidence MPI-enabled to allow parallel operation on large compute clusters

Quality Control evaluation of the MAKER-P and TAIR10 datasets using Annotation Edit Distance (AED). Better Quality Worse

W559 - Annotation of the Lobolly Pine Megagenome—Jill Wegrzyn Gb assembly—split into 40 jobs—216 CPU/job (8640 CPU total)—17 hours P157 - Disease Resistance Gene Analysis on Chromosome 11 Across Ten Oryza Species 10 rice species (each w/12 chromosome pseudomolecules) 96 CPU per chromosome (1152 CPU total) ~ 2hr per genome 10 22,656 CPU cores on1,888 nodes GenomeAssembly Size (Mb) CPU Run Time Arabidopsis thalianaTAIR :44 Arabidopsis thalianaTAIR :27 Zea maysRefGen_v :53 TACC Lonestar Supercomputer Campbell et al. Plant Physiology. December 4, 2013, DOI: /pp PAG 2014: MAKER-P at iPlant

Virtual image MPI-enabled for parallel computing Check out with up to 16 CPU Tested with 4 CPU instance (rice chr 1 in 8 hr 45 min) Tutorial: sciplant/MAKER-P+Tutorial+%28Atmosphere%29 11 Atmosphere: MAKER_2.28 (emi-F13821D0)

Future Plans TACC Lonestar Supercomputer Discovery Environment GUI