Published by Leonard Cain. Modified over 9 years ago.
1
Genome Annotation using MAKER-P at iPlant
Collaboration with Mark Yandell Lab (University of Utah), www.yandell-lab.org
iPlant: Josh Stein (CSHL), Matt Vaughn (TACC), Dian Jiao (TACC), Zhenyuan Lu (CSHL), Nirav Merchant (U. Arizona)
Carson Holt (Ontario Institute Cancer Research)
Cantarel et al. 2008. Genome Research 18:188
Holt & Yandell. 2011. BMC Bioinformatics 12:491
2
What Are Annotations?
Annotations are descriptions of features of the genome
– Structural: exons, introns, UTRs, splice forms, etc.
– Coding & non-coding genes
– Functional: enzymatic activity, expression
Annotations should include an evidence trail
– Assists in quality control of genome annotations
Examples of evidence supporting a structural annotation:
– Ab initio gene predictions
– ESTs
– Protein homology
3
Secondary Annotation
Protein domains
– InterProScan: combines many HMM databases
GO and other ontologies
Pathway mapping
– E.g. BioCyc Pathway Tools
4
Challenges in Plant Genome Annotation
– Genomes are BIG
– Highly repetitive
– Many pseudogenes
Yet it is important to get it right!
5
Contamination Issue
6
Annotation Error Example: split gene models
7
Typical Annotation Pipeline
– Contamination screening
– Repeat/TE masking
– Ab initio prediction
– Evidence alignment (cDNA, EST, RNA-seq, protein)
– Evidence-based prediction
– Combiner
– Evaluation/filtering
– Manual curation
8
Options for Protein-coding Gene Annotation
9
MAKER is an easy-to-use annotation pipeline designed to help smaller research groups convert the mountain of genomic data provided by next-generation sequencing technologies into a usable resource.
10
MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, automatically synthesizes these data into gene annotations, and produces evidence-based quality values for downstream annotation management.
11
Quality control evaluation of the MAKER-P and TAIR10 datasets using Annotation Edit Distance (AED).
[Figure axis: Quality, from Better to Worse]
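To make the metric concrete, here is a minimal sketch of AED as defined in Eilbeck et al. 2009 (the paper cited on slide 27): AED = 1 − (SN + SP)/2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases covered by the evidence. The function names and the 1-based inclusive coordinate convention are assumptions for illustration; this is not MAKER's own code.

```python
def covered_bases(intervals):
    """Return the set of positions covered by 1-based inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(annotation_exons, evidence_exons):
    """Base-level AED: 0.0 means full agreement with the evidence, 1.0 means none."""
    annotation = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_exons)
    if not annotation or not evidence:
        return 1.0  # nothing to compare, so treat as no support
    shared = len(annotation & evidence)
    sensitivity = shared / len(evidence)    # evidence bases recovered by the annotation
    specificity = shared / len(annotation)  # annotation bases backed by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Hypothetical coordinates: the annotation extends 50 bases past the evidence.
    print(annotation_edit_distance([(1, 100)], [(1, 50)]))
```

Lower values sit at the "better" end of the quality axis: an annotation that matches its evidence exactly scores 0, and one with no overlapping evidence scores 1.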
12
MAKER-P MPI Support
Message Passing Interface (MPI) is a communication protocol for computer clusters that essentially allows multiple computers to act like a single powerful machine.
13
Annotating the Genome – Apollo View
[Figure tracks: Current evidence, Current Assembly]
14
Identify and Mask Repetitive Elements
[Figure tracks: Current evidence, Current Assembly]
15
Identify and Mask Repetitive Elements
RepeatMasker
– RepBase
– Species-specific library
RepeatRunner
– MAKER internal protein library
[Figure tracks: Current evidence, Current Assembly]
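As a generic illustration of what "masking" does to a sequence once repeat coordinates are known, the sketch below applies soft masking (lowercasing) and hard masking (replacing with N) to given intervals. The function names and the 0-based half-open coordinates are assumptions; the code does not call RepeatMasker or RepeatRunner and does not describe which masking style MAKER applies to which repeat class.

```python
def soft_mask(sequence, repeats):
    """Lowercase the bases inside 0-based half-open (start, end) repeat intervals."""
    masked = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = masked[i].lower()
    return "".join(masked)


def hard_mask(sequence, repeats):
    """Replace the bases inside 0-based half-open (start, end) repeat intervals with N."""
    masked = list(sequence.upper())
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = "N"
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical 10-base sequence with one repeat hit at positions 2..4.
    print(soft_mask("ACGTACGTAC", [(2, 5)]))
    print(hard_mask("ACGTACGTAC", [(2, 5)]))
```

Soft masking keeps the underlying bases available to downstream tools that ignore case, while hard masking removes them from consideration entirely.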
16
Identify and Mask Repetitive Elements
[Figure tracks: Current evidence, Current Assembly]
17
Generate Ab Initio Gene Predictions
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions]
18
Generate Ab Initio Gene Predictions
MAKER currently supports:
– SNAP
– Augustus
– GeneMark
– FGENESH
Can be run internally or externally
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions]
19
Generate Ab Initio Gene Predictions
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions]
20
Align EST and Protein Evidence
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, EST TBLASTX, EST BLASTN, Protein BLASTX]
21
Align EST and Protein Evidence
Identify regions being actively transcribed (i.e. EST data)
Identify regions with homology to a known protein
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, EST TBLASTX, EST BLASTN, Protein BLASTX]
22
Align EST and Protein Evidence
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, EST TBLASTX, EST BLASTN, Protein BLASTX]
23
Polish BLAST Alignments with Exonerate
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, Polished protein, Polished EST]
24
Polish BLAST Alignments with Exonerate
All base pairs must align in order.
No HSP overlap is permitted.
Aligns HSPs correctly with respect to splice sites.
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, Polished protein, Polished EST]
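The first two constraints on this slide can be expressed as a simple check on a set of HSPs. The sketch below only tests whether HSPs are in the same order on query and genome and do not overlap; it is not Exonerate's alignment algorithm and does not model splice sites. The `HSP` type, its field names, the inclusive coordinates, and the forward-strand assumption are all hypothetical.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    """One high-scoring pair, inclusive coordinates, forward strand assumed."""
    query_start: int
    query_end: int
    genome_start: int
    genome_end: int


def is_ordered_without_overlap(hsps):
    """True if HSPs appear in the same order on query and genome with no overlap."""
    ordered = sorted(hsps, key=lambda h: h.query_start)
    for prev, curr in zip(ordered, ordered[1:]):
        if curr.query_start <= prev.query_end:
            return False  # HSPs overlap on the query (EST or protein)
        if curr.genome_start <= prev.genome_end:
            return False  # HSPs overlap or are out of order on the genome
    return True


if __name__ == "__main__":
    # Hypothetical two-exon hit: consecutive on the EST, separated by an intron on the genome.
    hits = [HSP(1, 50, 1000, 1049), HSP(51, 100, 2000, 2049)]
    print(is_ordered_without_overlap(hits))
```

Raw BLAST output can violate either condition, which is why a polishing step is run before the alignments are used as evidence.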
25
Polish BLAST Alignments with Exonerate
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, Polished protein, Polished EST]
26
Pass Gene Finders Evidence-based ‘hints’
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, Hint-based SNAP, Hint-based FgenesH]
27
Identify Gene Model Most Consistent with Evidence*
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions, Hint-based SNAP, Hint-based FgenesH]
* Karen Eilbeck, Barry Moore, Carson Holt and Mark Yandell. Quantitative Measures for the Management and Comparison of Annotated Genomes. BMC Bioinformatics 2009, 10:67. doi:10.1186/1471-2105-10-67
28
Revise it further if necessary; Create New Annotation
[Figure tracks: Current evidence, Current Assembly, Ab initio Predictions]
29
Compute Support for Each Portion of Gene Model
30
MAKER-P v2.28 at iPlant
TACC Lonestar Supercomputer with 22,656 CPU
– MPI enabled for parallel computation
– Can complete entire rice genome in ~2 hrs (1,152 cores)
– 96 CPU per chromosome
– Can complete Aegilops tauschii ALLPATHS-LG assembly in ~8 hrs (1,152 cores)
– Currently being integrated into the iPlant Discovery Environment
Atmosphere
– MPI enabled for parallel computation
– Maximum instance size 16 CPU
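The core count quoted for rice is consistent with simple arithmetic: rice has 12 chromosomes, and 12 × 96 = 1,152. The sketch below reproduces that arithmetic and shows one generic way to spread independent sequences across workers. The round-robin scheme and the function names are illustrative assumptions, not a description of how MAKER-P's MPI layer schedules work.

```python
def cores_needed(n_chromosomes, cores_per_chromosome):
    """Total cores when each chromosome gets its own fixed-size group of cores."""
    return n_chromosomes * cores_per_chromosome


def assign_round_robin(items, n_workers):
    """Deal items out to workers in turn, like dealing cards."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    buckets = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        buckets[index % n_workers].append(item)
    return buckets


if __name__ == "__main__":
    print(cores_needed(12, 96))
    # Hypothetical contig names spread over 2 workers.
    print(assign_round_robin(["ctg1", "ctg2", "ctg3", "ctg4", "ctg5"], 2))
```

Because contigs and chromosomes can be annotated independently of one another, this kind of split is what lets the same job run on a 16-CPU Atmosphere instance or on more than a thousand Lonestar cores, with only the wall-clock time changing.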
31
Assembly & Annotation at iPlant