Published by Gerard Oliver. Modified over 8 years ago.
1
Pfam, DAS and the future
Rob Finn
DAS Workshop 2009
2
What is Pfam?
–Protein families/domain database
  Complete and accurate classification of protein space
  Each family represented by alignments and profile HMMs
–Two Distinct Parts
  Pfam-A - high quality, curated, annotated
  Pfam-B - low quality, automated, unannotated
–Additional Features
  Active site, coiled-coils, low complexity, transmembrane regions
3
Sequence Features Client
4
Motivation
–Include other annotations
  Identify where we are missing domains
–Reduce data duplication
–Enrich single protein data in Pfam
–Allow tailored views
5
Updates from DAS registry Tailored Features Views
6
Features Request List of sources
7
DAS Alignments
The Next Step…
–Multiple Sequence Alignments
–PREFIX/das/alignment?query=ID
[Diagram: DAS Client, DAS Alignment Server]
8
DAS Alignments
Dealing with large alignments
–PREFIX/das/alignment?query=ID[&subject=ID[RANGE]] and/or [&rows=START-END]
[Diagram: DAS Client X DAS Alignment Server]
9
DAS Alignments
Dealing with large alignments
–PREFIX/das/alignment?query=ID[&rows=START-END]
[Diagram: DAS Client, DAS Alignment Server, DAS Align Feature Server]
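The request patterns on the slides above can be sketched as a small URL builder. This is a minimal sketch, not the Pfam implementation: the function name `build_alignment_url`, the example prefix and the example ID are hypothetical, and the RANGE part of the subject parameter is left out because the slides do not give its format.

```python
from urllib.parse import urlencode


def build_alignment_url(prefix, query, subject=None, rows=None):
    """Build a request of the form PREFIX/das/alignment?query=ID[&subject=ID][&rows=START-END].

    rows is an optional (start, end) tuple of row numbers.
    """
    params = [("query", query)]
    if subject is not None:
        params.append(("subject", subject))
    if rows is not None:
        start, end = rows
        if start > end:
            raise ValueError("rows start must not exceed rows end")
        params.append(("rows", f"{start}-{end}"))
    # urlencode percent-escapes any characters that are unsafe in a query string.
    return f"{prefix.rstrip('/')}/das/alignment?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical prefix and ID, for illustration only.
    print(build_alignment_url("http://example.org", "PF00001", rows=(1, 100)))
```

Keeping `subject` and `rows` optional mirrors the bracketed parts of the request pattern: a client can ask for the whole alignment, or only a window of rows when the alignment is large.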
10
Pfam Alignments
In Practice
–Pfam alignments vary in size
  2 - 80,000+ sequences
  Paging essential
–Simple DAS alignment client
  HTML, AJAX
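Paging a large alignment amounts to splitting its rows into consecutive START-END windows and issuing one request per window. A minimal sketch, assuming 1-based inclusive row numbers and a page size of 100 (neither is stated on the slide); `page_ranges` is a hypothetical helper name.

```python
def page_ranges(total_rows, page_size):
    """Yield 1-based inclusive (start, end) row windows that cover total_rows."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(1, total_rows + 1, page_size):
        # The last window is clipped to the final row of the alignment.
        yield start, min(start + page_size - 1, total_rows)


if __name__ == "__main__":
    # Each window maps onto the rows=START-END parameter of an alignment request.
    for start, end in page_ranges(250, 100):
        print(f"rows={start}-{end}")
```

With these assumptions, an alignment of 80,000 sequences would be fetched as 800 separate windows instead of one very large response.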
11
Future Directions
More alignment sources are on their way!
–Develop standalone, generic application
–Paging replaced by ‘Live Grid’
Issues
–Genomics alignments!
–Layering on features
12
HMMER3
Faster and more sensitive version of underlying software
–Make use of new features?
Query Length    Pfam (140 X 11000)
20              0.02
400             0.41
35000           35.93
Real time DAS searches!
13
Hot Alignments Can we scale efficiently?
14
Bringing in other datasets
Pfam
–NCBI NR (GenPept)
–Metagenomics
COSMIC - Catalogue Of Somatic Mutations In Cancer
15
COSMIC
Data Sources
–Scientific Literature
–Cancer Genome Project Systematic Screens
Advantages
–Prolong life of data
–Maintain integrity
–Genes continually updated
–Scientists explore data
–Ability to combine data sets
Features
–Manual Curation
–Map reference sequence
–Standards
  Mutation naming
  Tumour sample
  Phenotype
16
COSMIC/Pfam/UniProt
Prototyped on 60 ‘classic’ proteins
Automated update when COSMIC or UniProt is released
17
Linking COSMIC/Pfam/Spice Linking and State Maintenance
18
Acknowledgements
Pfam
–Prasad Gunasekaran
–John Tate
–Alex Bateman
–Penny Coggill
–Jaina Mistry
COSMIC
–Jon Teague
–COSMIC team…
Questions?