Advisory Board Meeting, CSHL 2005
Developments at Sanger
Anthony Rogers, Wellcome Trust Sanger Institute

Overview
- The build procedure
- Stats for the year
- Team changes
- Model changes: the “new gene model”; Variation
- Future plans: InterPro; improved mapping of data to genes; moving off wormsrv2; new nematodes; new data types

The WormBase Consortium
Partners: Wellcome Trust Sanger Institute, Cold Spring Harbor Laboratory, Washington University in St. Louis, California Institute of Technology.
● RNAi ● Microarray ● Anatomy / Cell ● Homology groups (KOGs) ● SAGE data ● Gene Ontology ● Papers / References ● Person / Author ● Detailed functional annotation ● Gene prediction annotation ● SNPs ● PCR_products / Oligos ● 3D structures ● Yeast two-hybrid interactions
Website and tools; Gene prediction annotation; Genetic data; Alleles; Gene name info (incl. unique IDs); Strains; Data integration and analysis.

Build Overview
- Data from CalTech, Sanger, CSHL and WashU, plus EMBL, feed into the WormBase build.
- Align all cDNAs and build transcripts.
- Map experimental data, e.g. RNAi, oligos, alleles.
- Send WORMPEP and DNA to the Sanger Compute Farm (via MySQL) for BLASTX, BLASTP, RepeatMasker, Pfam, TMHMM, etc.
- Load homology data.
- Export GFF, AGP and DNA files.
- Build release files.
- Release to the FTP site and the CSHL development site.
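
The build steps above form a dependency graph. The following is a minimal Python sketch of deriving a valid run order with the standard-library graphlib module. The stage names are hypothetical labels, and the dependencies are an assumed reading of the flattened diagram, not the actual build scripts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stage -> set of stages it depends on. These dependencies are an ASSUMED
# reading of the flattened build diagram, not the real build scripts.
BUILD_STAGES = {
    "merge_site_data": set(),  # CalTech, Sanger, CSHL, WashU and EMBL input
    "align_cdnas_build_transcripts": {"merge_site_data"},
    "map_experimental_data": {"align_cdnas_build_transcripts"},  # RNAi, oligos, alleles
    "compute_farm_homology": {"merge_site_data"},  # WORMPEP + DNA: blastx, blastp, Pfam, ...
    "load_homology_data": {"compute_farm_homology", "map_experimental_data"},
    "export_gff_agp_dna": {"load_homology_data"},
    "build_release_files": {"export_gff_agp_dna"},
    "publish_ftp_and_cshl_dev": {"build_release_files"},
}

# Any order produced here respects every dependency. Stages that do not depend on
# each other (here, transcript building and the farm jobs) could run concurrently.
order = list(TopologicalSorter(BUILD_STAGES).static_order())
for step, stage in enumerate(order, start=1):
    print(step, stage)
```

Writing the stages as data rather than as a fixed script makes it explicit which steps can overlap. This matters when the release cycle is under time pressure.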

Release cycle
- From WS124 (March 2004) to WS150 (October 2005): 26 releases.
- All but 2 were on schedule. The late ones were due to Sanger-wide systems problems associated with the move to the new building.
- After WS134 we changed (with SAB approval) to a three-weekly cycle.
- If releases were on time, why change?
  - Increases in data meant a gradual increase in build time.
  - Many releases were “just in time”.
  - Time pressure meant that fixes weren't being made properly.
  - Reduced staff meant that less development was being done.

Gene stats
[Table not recoverable; callout: more polyA / TSL features etc., and fixing BLAT errors]

Experimental Data Stats I
[Table not recoverable; callout: new data class]

Experimental Data Stats II
[Table not recoverable; callout: incorporation of genome-wide experiments]

Other classes of interest
[Table not recoverable; callout: InParanoid]

Staff Changes
Mary Ann Tuli and Gary Williams:
- Great improvement in documentation of procedures
- Gene structure curation
- Allele curation
- Genetic map functions in acedb
- Sequence feature annotation (polyA, TSL)
- Fresh view of methods for doing things
Also shown: Keith Bradnam, Chao-Kung Chen, Dan Lawson, Michael Han.

“Where is the new Gene model, Keith!?!”

The problem
- Worm genes first existed as Locus objects, e.g. dpy-1
- Then genes existed as Sequence objects, e.g. F31D4.3
- Some genes exist as both Locus and Sequence objects
- Gene names change… a lot!

The Plan
Replace separate Locus and Sequence objects with a single Gene object, e.g.:
Gene: WBGene0000001
- Sequence name: C09D8.1
- Main CGC name: ptp-3
- Other names: ptp-1, ypp-1, YPP/1
- CDS: C09D8.1a, C09D8.1b (ptp-3a, ptp-3b)

Linking to a gene
Paper [cgc4265], Antibody, Allele and RNAi result objects all link to the Gene object WBGene0000001 (names C09D8.1, ptp-3, ptp-1, ypp-1, YPP/1; CDSs C09D8.1a, C09D8.1b, i.e. ptp-3a, ptp-3b). The diagram also shows a further CDS, C09D8.1c, and the name abc-1.
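
The Gene model can be pictured as one object that owns every name, while other objects (papers, alleles, RNAi results) link to its stable ID rather than to a name. The Python sketch below is illustrative only. The class and field names are hypothetical, not the acedb schema, and the rename to abc-1 is a made-up example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Gene:
    gene_id: str                     # stable identifier; never reused
    sequence_name: str               # e.g. "C09D8.1"
    cgc_name: Optional[str] = None   # main CGC name, e.g. "ptp-3"
    other_names: List[str] = field(default_factory=list)
    cds_names: List[str] = field(default_factory=list)


@dataclass
class Link:
    kind: str      # e.g. "Paper", "Allele", "RNAi" (hypothetical labels)
    name: str
    gene_id: str   # points at the stable ID, not at any of the gene's names


class GeneIndex:
    """Resolves any current or past name to the single Gene object."""

    def __init__(self) -> None:
        self.genes: Dict[str, Gene] = {}
        self.by_name: Dict[str, str] = {}  # name -> gene_id

    def add(self, gene: Gene) -> None:
        self.genes[gene.gene_id] = gene
        names = [gene.sequence_name, *gene.other_names, *gene.cds_names]
        if gene.cgc_name:
            names.append(gene.cgc_name)
        for name in names:
            self.by_name[name] = gene.gene_id

    def lookup(self, name: str) -> Gene:
        return self.genes[self.by_name[name]]

    def rename(self, gene_id: str, new_cgc_name: str) -> None:
        # The old CGC name becomes an "other name", so it still resolves.
        gene = self.genes[gene_id]
        if gene.cgc_name:
            gene.other_names.append(gene.cgc_name)
        gene.cgc_name = new_cgc_name
        self.by_name[new_cgc_name] = gene_id


index = GeneIndex()
index.add(Gene("WBGene0000001", "C09D8.1", "ptp-3",
               other_names=["ptp-1", "ypp-1", "YPP/1"],
               cds_names=["C09D8.1a", "C09D8.1b"]))
paper = Link("Paper", "[cgc4265]", "WBGene0000001")

index.rename("WBGene0000001", "abc-1")           # hypothetical rename
print(index.lookup("ptp-3").cgc_name)            # abc-1
print(index.genes[paper.gene_id].sequence_name)  # C09D8.1
```

Because the paper's link holds only the ID, the rename does not touch it.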

Progress!
- The (no longer new) Gene model is in place.
- All genes now have Gene IDs.
- Gene history tracking information (merges, splits, etc.) is stored.
- The next part of the plan was to have a central database serving IDs.

Working version
- Sanger “single sign-on”
- User-specific operations
- Operation selection
- Not just WBGene IDs: also Variation, RNAi and Person IDs
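
A central ID server of the kind described above might look like the toy sketch below. Everything in it is an assumption for illustration: the class and method names, the per-user permission sets standing in for single sign-on, and the ID format (a prefix plus an 8-digit zero-padded number). It shows one counter per class, operations gated per user, and a history log of merges and splits.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IdServer:
    """Toy central ID service (hypothetical): per-class counters, per-user
    operations and a history log. Prefixes and zero padding are assumptions."""

    def __init__(self, prefixes: Dict[str, str], permissions: Dict[str, Set[str]]) -> None:
        self.prefixes = prefixes              # class name -> ID prefix
        self.permissions = permissions        # user -> operations that user may run
        self.counters: Dict[str, int] = defaultdict(int)
        self.id_class: Dict[str, str] = {}    # every ID ever issued -> its class
        self.live: Set[str] = set()           # IDs that have not been merged away
        self.history: List[Tuple[str, str, Tuple[str, ...]]] = []  # (user, op, IDs)

    def _check(self, user: str, op: str) -> None:
        if op not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not perform {op!r}")

    def _allocate(self, cls: str) -> str:
        self.counters[cls] += 1
        new_id = f"{self.prefixes[cls]}{self.counters[cls]:08d}"
        self.id_class[new_id] = cls
        self.live.add(new_id)
        return new_id

    def new(self, user: str, cls: str) -> str:
        self._check(user, "new")
        new_id = self._allocate(cls)
        self.history.append((user, "new", (new_id,)))
        return new_id

    def merge(self, user: str, keep: str, kill: str) -> None:
        # The killed ID is retired, never reused; the history records where it went.
        self._check(user, "merge")
        if keep == kill or keep not in self.live or kill not in self.live:
            raise ValueError("need two different live IDs")
        self.live.discard(kill)
        self.history.append((user, "merge", (keep, kill)))

    def split(self, user: str, parent: str) -> str:
        # A split keeps the parent ID and issues a new ID of the same class.
        self._check(user, "split")
        if parent not in self.live:
            raise ValueError("parent ID must be live")
        child = self._allocate(self.id_class[parent])
        self.history.append((user, "split", (parent, child)))
        return child


server = IdServer(
    prefixes={"Gene": "WBGene", "Variation": "WBVar", "RNAi": "WBRNAi", "Person": "WBPerson"},
    permissions={"curator": {"new", "merge", "split"}, "viewer": set()},
)
g1 = server.new("curator", "Gene")
g2 = server.new("curator", "Gene")
v1 = server.new("curator", "Variation")
server.merge("curator", keep=g1, kill=g2)
g3 = server.split("curator", g1)
```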

Variation Model
- Objects involved: Locus, SNPs, classical genes, gene clusters, Allele, deletions, Transposon_insertions
- Lots of shared data structures (tags), e.g. mapping data, names, connections to CDSs
- A single Variation class: greater code efficiency and manageability for both the build and the web site, and easier to search
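
The following minimal sketch shows why one Variation class can simplify code. Every kind of variation shares the same fields, so a single search function covers them all. The field names, the `kinds` set and the example variation and CDS names are hypothetical, not the real Variation model tags.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Variation:
    """One class in place of separate SNP / Allele / Deletion / Transposon_insertion
    classes. Field names are illustrative assumptions, not the WormBase schema."""
    name: str
    kinds: Set[str]                                 # e.g. {"Allele", "Deletion"}
    other_names: List[str] = field(default_factory=list)
    map_position: Optional[Tuple[str, int]] = None  # (chromosome, coordinate)
    affects_cds: List[str] = field(default_factory=list)


def search(variations, kind=None, cds=None, chromosome=None):
    """Filter on the shared fields; the same code serves every kind of variation."""
    hits = []
    for v in variations:
        if kind is not None and kind not in v.kinds:
            continue
        if cds is not None and cds not in v.affects_cds:
            continue
        if chromosome is not None and (v.map_position is None or v.map_position[0] != chromosome):
            continue
        hits.append(v.name)
    return hits


# Hypothetical example data.
variations = [
    Variation("snp_example", {"SNP"}, map_position=("II", 1000), affects_cds=["CDS_A"]),
    Variation("allele_example", {"Allele", "Deletion"}, map_position=("X", 5000),
              affects_cds=["CDS_A", "CDS_B"]),
    Variation("tn_example", {"Transposon_insertion"}, map_position=("II", 8000)),
]
print(search(variations, cds="CDS_A"))
print(search(variations, chromosome="II"))
```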

Imminent arrivals and the future
- InterPro
- Refined mapping
- Moving the build machine
- New nematodes
- New data types

InterPro
- Useful data, used in many other resources, so a good point of reference for non-worm specialists.
- We previously got ours from UniProt, or ad hoc from St. Louis.
- Many databases are covered by InterPro: PROSITE, PRINTS, Pfam, SMART, PIRSF, etc.
- The usual way of searching for database hits is InterProScan (iprscan), but this is incompatible with the Sanger farm.
- Instead we run each database search individually, using the existing architecture from BLAST etc., and store the results.
- We then merge hits with the same InterPro ID.

Merging hits from databases
[Diagram not recoverable: member-database hits on a protein]
Results are similar, but not identical, to iprscan.

InterPro hits per protein
[Chart not recoverable]
15 proteins have >100 domains (max. 186).
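
The merging step described in the InterPro slides can be sketched as follows. The sketch assumes each member-database hit has been stored as (protein, database, signature, start, end) and that a signature-to-InterPro mapping is available. Overlapping hits on the same protein that share an InterPro ID are joined into one region, and the merged regions are then counted per protein. This interval-union rule is an assumed simplification, and all accessions below are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    protein: str
    database: str
    signature: str
    start: int
    end: int


Region = Tuple[int, int, Set[str]]  # (start, end, databases supporting it)


def merge_hits(hits: List[Hit], signature_to_interpro: Dict[str, str]) -> Dict[Tuple[str, str], List[Region]]:
    grouped = defaultdict(list)
    for h in hits:
        ipr = signature_to_interpro.get(h.signature)
        if ipr is None:
            continue  # assumption: signatures not integrated into InterPro are skipped
        grouped[(h.protein, ipr)].append((h.start, h.end, h.database))
    merged = {}
    for key, spans in grouped.items():
        spans.sort()
        regions: List[Region] = []
        for start, end, db in spans:
            if regions and start <= regions[-1][1]:   # overlaps the previous region
                s, e, dbs = regions[-1]
                regions[-1] = (s, max(e, end), dbs | {db})
            else:
                regions.append((start, end, {db}))
        merged[key] = regions
    return merged


def domains_per_protein(merged) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)
    for (protein, _ipr), regions in merged.items():
        counts[protein] += len(regions)
    return dict(counts)


# Placeholder accessions; not real Pfam/SMART/PROSITE/InterPro entries.
hits = [
    Hit("PROT1", "Pfam", "PF_A", 10, 120),
    Hit("PROT1", "SMART", "SM_A", 15, 125),
    Hit("PROT1", "Pfam", "PF_A", 300, 400),
    Hit("PROT1", "PROSITE", "PS_B", 50, 60),
]
mapping = {"PF_A": "IPR_A", "SM_A": "IPR_A", "PS_B": "IPR_B"}
merged = merge_hits(hits, mapping)
print(merged[("PROT1", "IPR_A")])
print(domains_per_protein(merged))
```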

Improved Mapping of Variations to Genes
We can describe much more accurately how a mutation affects a gene:
- donor and acceptor splice sites
- introns / exons
- motifs such as polyA sites and TSLs
For coding changes, we can also give the amino acid differences.

Example: sra-9
The predicted SNP snp_AH6 changes codon ttc to tta (F to L).
- Currently its only connection is to the Gene.
- In future it will specify that the SNP is in coding sequence and that it causes a specified amino acid change.
- These effects are described by tags in the database, so they are searchable.
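
The kind of classification described above can be sketched for the simplest case. The sketch assumes a forward-strand transcript whose exons are all coding, a single-base substitution, and splice donor and acceptor sites taken as the first and last two intron bases. These simplifications, the function name and the toy sequence are assumptions, not the WormBase mapping code. The toy data is built so that one substitution reproduces the ttc to tta (F to L) change from the sra-9 example.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(genome: str, exons, pos: int, alt: str) -> str:
    """Classify a substitution at 1-based `pos` against sorted, all-coding,
    forward-strand exons given as 1-based inclusive (start, end) pairs."""
    cds_offset = 0  # coding bases in the exons before the current one
    for n, (start, end) in enumerate(exons):
        if start <= pos <= end:
            cds = "".join(genome[s - 1:e] for s, e in exons)  # spliced CDS
            i = cds_offset + (pos - start)                    # 0-based CDS index
            codon_start = i - i % 3
            ref_codon = cds[codon_start:codon_start + 3]
            alt_cds = cds[:i] + alt + cds[i + 1:]
            alt_codon = alt_cds[codon_start:codon_start + 3]
            return (f"coding {ref_codon}>{alt_codon} "
                    f"{CODON_TABLE[ref_codon]}>{CODON_TABLE[alt_codon]}")
        cds_offset += end - start + 1
        if n + 1 < len(exons):
            intron_start, intron_end = end + 1, exons[n + 1][0] - 1
            if intron_start <= pos <= intron_end:
                if pos <= intron_start + 1:
                    return "splice_donor"
                if pos >= intron_end - 1:
                    return "splice_acceptor"
                return "intron"
    return "outside_cds"


# Toy transcript (hypothetical): exon 1, intron, exon 2.
GENOME = "ATGTTC" + "GTAAGTTCAG" + "AAATAA"
EXONS = [(1, 6), (17, 22)]
print(classify_snp(GENOME, EXONS, 6, "A"))   # coding TTC>TTA F>L
print(classify_snp(GENOME, EXONS, 8, "A"))   # splice_donor
```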

Implementation
- One table per chromosome (I, II, III, IV, V, X), so all can be loaded together
- GFF data: exons, introns, transcripts, SNPs, alleles, etc.
- All chromosomes can be run in parallel on cbi1 (3 x 2 CPUs)
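
The following sketch illustrates the per-chromosome fan-out. Each chromosome's features and variations go to an independent task, and the results are concatenated. ThreadPoolExecutor keeps the example self-contained. Because of Python's GIL, threads would not give true CPU parallelism here; a real run would use separate processes or farm jobs. The feature tuples, the names and the six-worker default (6 chromosomes, 3 x 2 CPUs on cbi1) are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int, str]  # (type, start, end, name) from one chromosome's GFF
Variant = Tuple[str, int]            # (variation name, position)
Row = Tuple[str, str, str, str]      # (chromosome, variation, feature type, feature name)


def map_chromosome(chrom: str, features: List[Feature], variants: List[Variant]) -> List[Row]:
    """Report every feature that contains each variation on one chromosome."""
    rows = []
    for var_name, pos in variants:
        for ftype, start, end, fname in features:
            if start <= pos <= end:
                rows.append((chrom, var_name, ftype, fname))
    return rows


def map_all(features_by_chrom: Dict[str, List[Feature]],
            variants_by_chrom: Dict[str, List[Variant]], workers: int = 6) -> List[Row]:
    chroms = sorted(features_by_chrom)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_chromosome, c, features_by_chrom[c],
                               variants_by_chrom.get(c, [])) for c in chroms]
        # Collect results in chromosome order, independent of completion order.
        return [row for f in futures for row in f.result()]


# Hypothetical data for two chromosomes.
features_by_chrom = {
    "I": [("exon", 100, 200, "GENE_A.1 exon 1"),
          ("intron", 201, 300, "GENE_A.1 intron 1"),
          ("transcript", 100, 500, "GENE_A.1")],
    "X": [("exon", 50, 80, "GENE_B.1 exon 1")],
}
variants_by_chrom = {"I": [("var1", 150), ("var2", 250)], "X": [("var3", 60), ("var4", 90)]}
rows = map_all(features_by_chrom, variants_by_chrom)
for row in rows:
    print(row)
```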

Death of wormsrv2
- 5 years ago the Sanger network was bad, so we bought a shiny, fast new computer.
- It has become too slow, and its isolation is a pain.
- Now the Sanger network is good!
- Move to the informatics cluster: fast and parallel.
- This means modifying the majority of the code base.

New nematodes
C. briggsae is the forerunner for the new nematode genomes:
- semi-curated gene set (brigpep2)
- protein annotation (Pfam, TMHMM, SignalP)
- ortholog assignment (InParanoid, Erich Sonnhammer)
- BLASTP, BLASTX
- WABA (Jim Kent's genome alignment tool)
We intend to do all of this for each of the new genomes. It is mostly done for C. remanei.

New Data Types
Any new data type impacts the build:
- new model development
- scripts to integrate and check the data
E.g. mass spec data: we have been in contact with Gennifer Merrihew.

