Pfam, DAS and the future Rob Finn DAS Workshop 2009.

1 Pfam, DAS and the future Rob Finn DAS Workshop 2009

2 What is Pfam? –Protein family and domain database Complete and accurate classification of protein space Each family represented by alignments and profile HMMs –Two Distinct Parts Pfam-A - high quality, curated, annotated Pfam-B - low quality, automated, unannotated –Additional Features Active sites, coiled-coils, low complexity, transmembrane regions

3 Sequence Features Client

4 Motivation –Include Other annotations Identify where we are missing domains –Reduce data duplication –Enrich single protein data in Pfam –Allow tailored views

5 Updates from DAS registry Tailored Features Views

6 Features Request List of sources

7 DAS Alignments The Next Step…. –Multiple Sequence Alignments –PREFIX/das/alignment?query=ID DAS Client DAS Alignment Server

8 Dealing with large alignments –PREFIX/das/alignment?query=ID[&subject=ID[RANGE]] and/or [&rows=START-END] DAS Client X DAS Alignment Server DAS Alignments

9 Dealing with large alignments –PREFIX/das/alignment?query=ID[&rows=START-END] DAS Client DAS Alignment Server DAS Align Feature Server DAS Alignments
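
The request pattern on the two slides above can be sketched in Python. This is a minimal illustration, not the Pfam client: the function name `build_alignment_url`, the example host, and the example accession are hypothetical, and the exact format of RANGE is not given on the slides, so `subject` is passed through verbatim.

```python
from urllib.parse import urlencode


def build_alignment_url(prefix, query, subject=None, rows=None):
    """Build a PREFIX/das/alignment?query=ID[&subject=...][&rows=START-END] URL.

    `subject` is used as given (the slides do not define the RANGE syntax).
    `rows` is an optional (start, end) tuple rendered as START-END.
    """
    params = [("query", query)]
    if subject is not None:
        params.append(("subject", subject))
    if rows is not None:
        start, end = rows
        if end < start:
            raise ValueError("rows must satisfy START <= END")
        params.append(("rows", f"{start}-{end}"))
    # urlencode percent-escapes any characters that are unsafe in a query string
    return f"{prefix.rstrip('/')}/das/alignment?{urlencode(params)}"


# Hypothetical host and alignment ID, for illustration only
print(build_alignment_url("http://example.org", "PF00001", rows=(1, 100)))
```

Keeping `subject` and `rows` optional mirrors the bracketed, optional parts of the request shown on the slides: a client that wants the whole alignment sends only `query`.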

10 In Practice –Pfam alignments vary in size 2 - 80,000+ sequences Paging Essential –Simple DAS alignment client HTML, AJAX Pfam Alignments
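
Since alignments range from 2 to more than 80,000 sequences, a client has to split an alignment into row windows that fit the `rows=START-END` parameter. The sketch below shows one way to compute those windows; the helper name `row_pages`, the page size, and the assumption that rows are numbered from 1 with inclusive ends are all illustrative and not taken from the slides.

```python
def row_pages(total_rows, page_size):
    """Yield (start, end) row windows covering 1..total_rows.

    Assumes 1-based, inclusive row numbering (not confirmed by the slides).
    """
    if total_rows < 0 or page_size < 1:
        raise ValueError("total_rows must be >= 0 and page_size >= 1")
    start = 1
    while start <= total_rows:
        # The last page may be shorter than page_size
        end = min(start + page_size - 1, total_rows)
        yield start, end
        start = end + 1


# Each window can be formatted as a rows=START-END value
for start, end in row_pages(250, 100):
    print(f"rows={start}-{end}")
```

Each request then transfers at most `page_size` rows, so the cost of displaying a page does not depend on the total size of the alignment.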

11 Future Directions More alignment sources are on their way! –Develop standalone, generic application –Paging replaced by ‘Live Grid’ Issues –Genomic alignments! –Layering on features

12 HMMER3 Faster and more sensitive version of underlying software –Make use of new features?
Query Length | Pfam (140 X 11000)
20 | 0.02
400 | 0.41
35000 | 35.93
Real time DAS searches!

13 Hot Alignments Can we scale efficiently?

14 Bringing in other datasets Pfam –NCBI NR (GenPept) –Metagenomics COSMIC - Catalogue Of Somatic Mutations In Cancer

15 COSMIC Data Sources Advantages Prolong life of data Maintain integrity Genes continually updated Scientists explore data Ability to combine data sets Features Manual Curation Map reference sequence Standards Mutation naming Tumour sample Phenotype Scientific Literature Cancer Genome Project Systematic Screens COSMIC

16 COSMIC/Pfam/UniProt Prototyped on 60 ‘classic’ proteins Automated update when COSMIC or UniProt is released

17 Linking COSMIC/Pfam/Spice Linking and State Maintenance

18 Acknowledgements Pfam –Prasad Gunasekaran –John Tate –Alex Bateman –Penny Coggill –Jaina Mistry COSMIC –Jon Teague –COSMIC team… Questions?

