The evolving VectorBase gene build: mixing automated and manual approaches when annotating vector genomes. Daniel Lawson, VectorBase-EBI (VectorBase BRC4 2006)


1 VectorBase BRC4 2006. The evolving VectorBase gene build: mixing automated and manual approaches when annotating vector genomes. Daniel Lawson, VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK

2 Points to cover: VectorBase species; Generic GeneBuild (new genomes); VectorBase GeneBuild (new developments); Influence of manual annotation; Progress in manual annotation; Partial GeneBuilds

3 VectorBase species, by project status. Annotated: Aedes aegypti, Anopheles gambiae PEST. Sequencing: Ixodes scapularis. Assembly: Culex pipiens quinquefasciatus. Sequencing: Anopheles gambiae M & S form, Pediculus humanus. Initiated: Glossina morsitans morsitans, Lutzomyia longipalpis, Phlebotomus papatasi, Rhodnius prolixus

4 Annotation of new genomes: Assembled genome -> VectorBase gene predictions and Sequencing centre gene predictions -> Merge into canonical set -> Protein analysis -> Display on genome browser -> Release to GenBank/EMBL/DDBJ

5 VectorBase gene prediction pipeline: Blessed predictions; Community submissions; Manual annotations; Species-specific predictions; Similarity predictions; Transcript based predictions; Ab initio gene predictions; Canonical predictions. Tools labelled on the diagram: (Genewise), (SNAP), (Exonerate), (Apollo), (Genewise, Exonerate, Apollo). Protein family HMMs (Genewise); ncRNA predictions (Rfam)

6 VectorBase curation database pipeline for manual/community annotation: Curation warehouse db; Manual annotation (Harvard); Apollo; Community annotation (Community representatives); Chado-XML; Chado; Ensembl; GFF3; Gene build db; Community annotation (in collaboration with Harvard)

7 Manual annotation progress
Gene set                     Protein-coding gene No.   VectorBase manual   Community submission
Anopheles gambiae AgamP3.3   13,277                    261 (2.0 %)         667 (5.0 %)
  current                                              2474 (18.6 %)       667* (5.0 %)
Aedes aegypti AaegL1.1       15,419                    0 (0.0 %)
  current                                              0 (0.0 %)           341 (2.2 %)
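The percentages on this slide appear to be each annotation count divided by the number of protein-coding genes in the gene set. The sketch below checks that reading. It assumes the fused digits on the slide split as 13,277 genes for AgamP3.3 and 15,419 genes for AaegL1.1; the function name `pct` is illustrative and not part of any VectorBase tool.

```python
def pct(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Gene totals as read from the slide (assumed split of the fused digits).
AGAM_GENES = 13_277  # Anopheles gambiae AgamP3.3
AAEG_GENES = 15_419  # Aedes aegypti AaegL1.1

# (label, annotated gene count, gene-set total)
rows = [
    ("AgamP3.3 VectorBase manual", 261, AGAM_GENES),
    ("AgamP3.3 community submission", 667, AGAM_GENES),
    ("A. gambiae current VectorBase manual", 2474, AGAM_GENES),
    ("A. aegypti current community submission", 341, AAEG_GENES),
]

for label, count, total in rows:
    print(f"{label}: {count} ({pct(count, total)} %)")
```

Under these assumptions the computed values match the figures printed on the slide (2.0 %, 5.0 %, 18.6 % and 2.2 %).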

8 Manual annotation visualisation

9 Overview of proposed re-annotation system: Blessed genes; Current gene set; Compare; Species-specific gene prediction; New gene build; Merge; Updated gene set; Full gene build; Partial gene build

10 Comparing new gene builds with the old one
Use of manual annotation for validation of automated gene build improvements
Simple statistics (CDS length, intron size, CDS matching TEs)
BRC annotation metrics
– Supporting evidence for a gene prediction (citation, expression, orthology)
– Attachment of Standard Operating Procedures (SOPs)
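As an illustration of the "simple statistics" mentioned on this slide, the following sketch computes CDS length and intron sizes for a set of gene models and summarises them so an old and a new build could be compared side by side. It is only a sketch under stated assumptions: each model is represented as a list of CDS exon coordinates `(start, end)`, 1-based and inclusive as in GFF3, and the function names are hypothetical rather than taken from the VectorBase pipeline.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; assumed representation


def cds_length(exons: List[Exon]) -> int:
    """Total coding length: the sum of the exon lengths."""
    return sum(end - start + 1 for start, end in exons)


def intron_sizes(exons: List[Exon]) -> List[int]:
    """Sizes of the gaps between consecutive exons, ordered by start."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def summarise(models: List[List[Exon]]) -> Dict[str, Optional[float]]:
    """Median CDS length and median intron size over a set of gene models."""
    lengths = [cds_length(m) for m in models]
    introns = [size for m in models for size in intron_sizes(m)]
    return {
        "median_cds_length": median(lengths) if lengths else None,
        "median_intron_size": median(introns) if introns else None,
    }


# Toy data standing in for an old and a new gene build.
old_build = [[(100, 199), (300, 399), (1000, 1049)], [(10, 109)]]
new_build = [[(100, 199), (300, 399)], [(10, 109)]]

print("old:", summarise(old_build))
print("new:", summarise(new_build))
```

Medians are used here because intron size distributions tend to be skewed; the slide does not say which summary statistic was used.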

11 VectorBase gene prediction pipeline (SOP): Blessed predictions; Community submissions; Manual annotations; Species-specific predictions; Similarity predictions; Transcript based predictions; Ab initio gene predictions; Canonical Gene set. SOP labels on the diagram: VB:SOP0001; VB:SOP0002 & SOP0003; VB:SOP0005; VB:SOP004. Protein family HMMs VB:SOP0009; ncRNA predictions VB:SOP0008; VB:SOP0007

12 Gene build schedules: Full gene build, 4 months; Partial gene build, 1 month. Triggers for re-annotation: Temporal; Data (New data for species; New genomes; Re-annotated genomes)

13 3 wise annotators

14 [slide without text]

15 Merging gene sets: Reduce to single predictions per locus; Compare exon/intron structures (Gene set #1, Gene set #2); Identical structures; Compatible structures; Different structures; Merge/Split structures; Complex; No Map; Add isoform predictions based on EST/Peptide data; Canonical gene set
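The comparison of exon/intron structures can be sketched as a small classifier over two predictions at the same locus. The slide names the categories but does not define them, so the rules below are assumptions made for illustration: "identical" means the same exon coordinates, "compatible" means the same introns but different transcript ends, "no map" means the two predictions do not overlap on the genome, and everything else is "different". The merge/split and complex cases involve more than two predictions and are not covered. The representation and function names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; assumed representation


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Intron coordinates implied by consecutive exons, ordered by start."""
    ordered = sorted(exons)
    return [(prev[1] + 1, nxt[0] - 1) for prev, nxt in zip(ordered, ordered[1:])]


def classify(a: List[Exon], b: List[Exon]) -> str:
    """Classify two non-empty, non-overlapping-exon predictions (assumed rules)."""
    a, b = sorted(a), sorted(b)
    # The genomic spans do not overlap: nothing to compare at this locus.
    if a[-1][1] < b[0][0] or b[-1][1] < a[0][0]:
        return "no map"
    if a == b:
        return "identical"
    # Same splice structure, different start and/or end coordinates.
    if introns(a) and introns(a) == introns(b):
        return "compatible"
    return "different"


set1 = [(100, 200), (300, 400)]
print(classify(set1, [(100, 200), (300, 400)]))
print(classify(set1, [(150, 200), (300, 450)]))
print(classify(set1, [(100, 200), (320, 400)]))
print(classify(set1, [(1000, 1100)]))
```

With these rules, two single-exon predictions that overlap but differ in their ends are reported as "different", since they share no intron; a real merge procedure might well treat that case otherwise.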
