BRC6, 28th October 2008. Collective annotation of the Ixodes scapularis genome: VectorBase, MSCs and the tick community. Daniel Lawson, VectorBase.

1 BRC6, 28th October 2008. Collective annotation of the Ixodes scapularis genome: VectorBase, MSCs and the tick community. Daniel Lawson, VectorBase.

2 Arthropod vectors of human pathogens: Lutzomyia, Phlebotomus, Culex, Rhodnius, Anopheles, Glossina, Aedes, Pediculus, Ixodes.

3 Deer tick, Ixodes scapularis. Vector of Lyme disease (spirochete Borrelia burgdorferi). Estimated genome size of 2.1 Gb. Sequenced strain: Wikel, 12th generation, from ticks sourced from New York, Oklahoma & Connecticut. First chelicerate genome to be sequenced.

4 Genome annotation cycle (diagram labels): Automatic gene build; Assembly; Community annotations; Manual annotations; Other genomes, gene sets; Repeat library (TEs etc.); ESTs, cDNAs; Protein domains.

5 Generating sequence. Sequencing undertaken by established sequencing centres (e.g. Broad, JCVI). Initial assembly annotated in collaboration with the sequencing centre(s). 19,300,000 trace reads generated; approx. 6x WGS; 570K BAC end sequences; assembly produced at JCVI; 194K EST sequences.
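
As a rough cross-check of the figures on this slide, the usual coverage relation (coverage = number of reads × mean read length / genome size) can be inverted to see what mean read length the quoted numbers would imply. This is only a sketch: it assumes the 6x figure refers to the 19,300,000 trace reads, it takes the 2.1 Gb genome size estimate from slide 3, and the function name is hypothetical.

```python
def implied_mean_read_length(n_reads: int, coverage: float, genome_size_bp: float) -> float:
    """Invert coverage = n_reads * read_length / genome_size to get read_length."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return coverage * genome_size_bp / n_reads


# Figures quoted on the slides. Pairing the 6x coverage with the trace-read
# count is an assumption made for this illustration.
N_READS = 19_300_000
COVERAGE = 6.0
GENOME_SIZE_BP = 2.1e9

read_len = implied_mean_read_length(N_READS, COVERAGE, GENOME_SIZE_BP)
print(f"Implied mean read length: {read_len:.0f} bp")
```

Under those assumptions the implied mean read length is about 650 bp.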

6 Assembly statistics. This WGS project has the project accession ABJB000000000. The current version of the project (01) has the accession number ABJB010000000, and consists of 1,141,594 scaffolds (ABJB010000001-ABJB011141594). Released assembly IscaW1: 570,637 contigs; 369,495 supercontigs; assembled coverage of 3.8x.
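
The record count quoted above can be recovered from the accession range itself. The sketch below assumes the layout visible in the accessions on this slide (a four-letter project prefix, a two-digit version, then a seven-digit serial number); other WGS projects may use different field widths, and the function names are hypothetical.

```python
import re

# Layout assumed from the accessions shown on the slide:
# 4-letter prefix + 2-digit version + 7-digit serial number.
WGS_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d{7})$")


def parse_wgs_accession(acc: str) -> tuple[str, int, int]:
    """Split an accession into (prefix, version, serial)."""
    m = WGS_ACCESSION.match(acc)
    if m is None:
        raise ValueError(f"unexpected accession format: {acc!r}")
    prefix, version, serial = m.groups()
    return prefix, int(version), int(serial)


def count_records(first: str, last: str) -> int:
    """Number of records in an inclusive accession range."""
    p1, v1, s1 = parse_wgs_accession(first)
    p2, v2, s2 = parse_wgs_accession(last)
    if (p1, v1) != (p2, v2):
        raise ValueError("accessions belong to different projects or versions")
    if s2 < s1:
        raise ValueError("range is reversed")
    return s2 - s1 + 1


print(count_records("ABJB010000001", "ABJB011141594"))
```

The range spans 1,141,594 records, matching the count stated on the slide.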

7 Preparing for gene build. Repeat masking: analyses to identify repeat elements (RepeatScout, RECON); standard tandem-repeat & low-complexity filtering. Collate data sets: transcripts (cDNA & EST data); peptides (taxonomic groupings, inc. Daphnia pulex). Train gene predictors, mainly Augustus (JCVI).

8 Annotation plan. First-pass gene prediction. Focused on protein-coding genes (CDSs). Semi-automated approach; this is not manual curation. Involvement of community where possible. Timely delivery of gene set.

9 Gene prediction. Each group/centre has its own gene prediction pipeline/protocol. Each group produces a first-pass 'best guess' set of predictions (0.5 sets, public release). These sets are merged into a single set (1.0 set, not released). Quality control activities follow (1.1 set, public release); this set is annotated with protein features and released to the wider world.

10 Merging gene predictions. Reduce to single predictions per locus. Compare exon/intron structures between gene set #1 and gene set #2. Categories: identical structures; compatible structures; different structures; merge/split structures; complex; no map. Add isoform predictions based on EST/peptide data. Result: canonical gene set.
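
The comparison step on this slide can be sketched as follows. This is not the pipeline used for the IscaW1 merge, which the slides do not describe in detail; it is a minimal illustration in which each prediction is a sorted list of exon coordinates on a single scaffold and strand, "compatible" is taken to mean an identical intron chain with differing transcript ends, and the "complex" category is not handled. All names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]      # (start, end), 1-based inclusive
Transcript = List[Exon]     # exons sorted by start; one scaffold and strand assumed


def introns(tx: Transcript) -> List[Tuple[int, int]]:
    """Intron chain implied by consecutive exons."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(tx, tx[1:])]


def spans_overlap(a: Transcript, b: Transcript) -> bool:
    """True if the genomic spans of two transcripts overlap."""
    return a[0][0] <= b[-1][1] and b[0][0] <= a[-1][1]


def classify(tx: Transcript, other_set: List[Transcript]) -> str:
    """Classify one prediction against the other gene set."""
    hits = [o for o in other_set if spans_overlap(tx, o)]
    if not hits:
        return "no map"
    if len(hits) > 1:
        return "merge/split"    # one locus here covers several loci in the other set
    other = hits[0]
    if tx == other:
        return "identical"
    if introns(tx) == introns(other):
        return "compatible"     # same intron chain, only the ends differ (assumed definition)
    return "different"


set2 = [[(100, 200), (300, 400)], [(1000, 1100)], [(1200, 1300)]]
for tx in ([(100, 200), (300, 400)], [(90, 200), (300, 450)],
           [(100, 220), (300, 400)], [(950, 1250)], [(5000, 5100)]):
    print(tx, "->", classify(tx, set2))
```

A real merge would also need strand and scaffold checks and a rule for many-to-many overlaps, which is presumably where the "complex" category comes in.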

11 Merge of data sets to 1.0 release. Simple, hierarchical system: reduce to a single transcript per locus (simplicity); compare loci across the 2 sets; categorize; manually investigate some examples; deal with each category individually; collate each group back to give a 'minimal' complete set; add alternate isoforms back into the set (transcripts, proteins); add UTR extensions where possible; QC the data set.

12 Merge annotation comparisons

13 Examples: isoform-compat, isoform-diff

14 Examples: merge/splits, difficult

15 GBrowse viewer

16 VectorBase browser

17 Final gene set (IscaW1.1). 20,486 protein-coding genes: 48% have a Pfam domain; 40% have supporting EST evidence. 8,138 tRNAs, with over-prediction of Ser (4,425) and Thr (1,527). 301 ncRNAs. Submitted to GenBank last week; release to be coordinated in the next couple of weeks.
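
To put the tRNA over-prediction in proportion, the counts on this slide can be turned into shares of the total. The snippet below uses only the numbers quoted above; the variable and function names are hypothetical.

```python
TOTAL_TRNA = 8138
FLAGGED = {"Ser": 4425, "Thr": 1527}   # isotypes the slide reports as over-predicted
PROTEIN_CODING = 20486


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for isotype, count in FLAGGED.items():
    print(f"{isotype}: {count} of {TOTAL_TRNA} tRNAs ({percent(count, TOTAL_TRNA):.1f}%)")

flagged_total = sum(FLAGGED.values())
print(f"Ser + Thr: {flagged_total} ({percent(flagged_total, TOTAL_TRNA):.1f}%)")

# Approximate gene counts behind the rounded percentages on the slide.
print(f"Genes with a Pfam domain: ~{round(0.48 * PROTEIN_CODING)}")
print(f"Genes with EST support:   ~{round(0.40 * PROTEIN_CODING)}")
```

Ser and Thr together account for roughly 73% of the predicted tRNAs, which is why the slide flags them.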

18 Genome annotation cycle (diagram labels): Automatic gene build; Assembly; Community annotations; Manual annotations; Other genomes, gene sets; Repeat library (TEs etc.); ESTs, cDNAs; Protein domains.

19 Community annotation (diagram labels): Web submission; CHADO; Researcher; Community representative; Appraisal; Approval; GFF3; Gene Build. Total: 13,339 entries. An. gambiae 9,423; Cx. quinquefasciatus 2,598; Ae. aegypti 1,281; Ix. scapularis 37.
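
The workflow above ends in GFF3, so a minimal sketch of what one such record looks like may help. The slides do not specify VectorBase's submission format beyond naming GFF3; the code below only illustrates the generic nine-column, tab-separated GFF3 line, and the scaffold name, source label and feature IDs are invented for the example.

```python
from urllib.parse import quote


def gff3_line(seqid: str, source: str, ftype: str, start: int, end: int,
              strand: str, attributes: dict, score: str = ".", phase: str = ".") -> str:
    """Format one GFF3 feature line (nine tab-separated columns)."""
    if start < 1 or start > end:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {strand!r}")
    # Percent-encode characters that are reserved in column 9 (e.g. ';', '=', ',').
    attr = ";".join(f"{key}={quote(str(value), safe=' :/()')}"
                    for key, value in attributes.items())
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      score, strand, phase, attr])


# Hypothetical community-submitted gene with a single CDS.
print("##gff-version 3")
print(gff3_line("scaffold_1", "community", "gene", 1000, 2500, "+",
                {"ID": "gene0001", "Name": "example gene"}))
print(gff3_line("scaffold_1", "community", "mRNA", 1000, 2500, "+",
                {"ID": "mrna0001", "Parent": "gene0001"}))
print(gff3_line("scaffold_1", "community", "CDS", 1200, 2300, "+",
                {"ID": "cds0001", "Parent": "mrna0001"}, phase="0"))
```

The Parent attributes tie each mRNA to its gene and each CDS to its mRNA, which is the relationship a reviewer would check during appraisal before a submission feeds into a gene build.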

20 Community annotation track in browser

21 Lessons. Annotation plan for sequencing and annotation of new genomes is well established (MSC & BRC). Clearly defined data release strategies (0.5, 1.0 & 1.1). Monthly conference calls. Face-to-face meeting when merging 0.5 gene predictions. Coordinated release between MSC, VectorBase and GenBank.

22 But we can always improve. Agreement on project/public identifiers at the start of the project (primarily contigs and supercontigs). Overall nomenclature applied as final step in annotation. More QC before the major milestones. Better communication.

23 Acknowledgements. Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas, Ewan Birney, Martin Hammond, Daniel Lawson, Karyn Megy, Bill Gelbart, Kathy Campbell, Fotis Kafatos, George Christophides, Bob MacCallum, Seth Redmond, Peter Atkinson, Peter Arensburger, Catherine Hill, Jason Meyer, Frank Collins, Greg Madey, Scott Emrich, Ryan Butler, Katie Cybulski, Nate Konopinski, Rob Bruggner (alumni), E.O. Stinson (alumni), Dave Severson, Neil Lobo, Frank Collins, Neil Lobo. Aedes, Anopheles, Culex, Ixodes. EMBL-EBI, Harvard, IMBB, Imperial, Notre Dame. Colleagues: Ensembl {Genebuilders, Web, Compara, Core, Outreach}; BRCs {Pathema, ApiDB}; Sequencers {JCVI & Broad Institute}.
