Pfam, DAS and the future Rob Finn DAS Workshop 2009
What is Pfam?
–Protein families/domain database
  Complete and accurate classification of protein space
  Each family represented by alignments and profile HMMs
–Two Distinct Parts
  Pfam-A - high quality, curated, annotated
  Pfam-B - low quality, automated, unannotated
–Additional Features
  Active site, coiled-coils, low complexity, transmembrane regions
Sequence Features Client
Motivation
–Include other annotations
  Identify where we are missing domains
–Reduce data duplication
–Enrich single protein data in Pfam
–Allow tailored views
Updates from DAS registry Tailored Features Views
Features Request List of sources
DAS Alignments The Next Step…. –Multiple Sequence Alignments –PREFIX/das/alignment?query=ID DAS Client DAS Alignment Server
Dealing with large alignments –PREFIX/das/alignment?query=ID[&subject=ID[RANGE]] and/or [&rows=START-END] DAS Client X DAS Alignment Server DAS Alignments
Dealing with large alignments –PREFIX/das/alignment?query=ID[&rows=START-END] DAS Client DAS Alignment Server DAS Align Feature Server DAS Alignments
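A minimal sketch of how a client might assemble the alignment request shown on these slides. Only the URL shape (PREFIX/das/alignment with query, subject and rows) comes from the slides; the function name `alignment_url`, the example host and the accession are hypothetical, and `subject` is passed through as an opaque string because the slides do not spell out the RANGE syntax.

```python
from urllib.parse import urlencode


def alignment_url(prefix, query, rows=None, subject=None):
    """Build a DAS alignment request URL of the form PREFIX/das/alignment?query=ID.

    rows    -- optional (start, end) tuple, sent as rows=START-END
    subject -- optional opaque string, sent as subject=...
    (Hypothetical helper; parameter names mirror the slides.)
    """
    params = [("query", query)]
    if subject is not None:
        params.append(("subject", subject))
    if rows is not None:
        start, end = rows
        params.append(("rows", f"{start}-{end}"))
    # urlencode percent-escapes any reserved characters in the values
    return f"{prefix.rstrip('/')}/das/alignment?" + urlencode(params)


# Example with a made-up server and accession
url = alignment_url("http://example.org", "PF00001", rows=(1, 50))
```

Requesting a block of rows rather than the whole alignment keeps each response small, which is the point of the rows parameter above.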
In Practice
–Pfam alignments vary in size ,000+ sequences
  Paging Essential
–Simple DAS alignment client
  HTML, AJAX
Pfam Alignments
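Paging over a large alignment amounts to splitting the row count into START-END blocks. The sketch below is one way to do that; the function name `row_pages`, the 1-based inclusive numbering and the page size are assumptions, not details taken from the Pfam client.

```python
def row_pages(total_rows, page_size):
    """Yield (start, end) row ranges covering 1..total_rows.

    Ranges are 1-based and inclusive (an assumption; the slides do not
    state the indexing convention). The last page may be shorter.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(1, total_rows + 1, page_size):
        yield start, min(start + page_size - 1, total_rows)


# Example: a 120-row alignment viewed 50 rows at a time
pages = list(row_pages(120, 50))
```

Each tuple can be formatted as rows=START-END in a request, so the client only ever holds one page of the alignment at a time.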
Future Directions
More alignment sources are on their way!
–Develop standalone, generic application
–Paging replaced by ‘Live Grid’
Issues
–Genomic alignments!
–Layering on features
HMMER3 Faster and more sensitive version of underlying software –Make use of new features? Query Length Pfam (140 X 11000) Real time DAS searches!
Hot Alignments Can we scale efficiently?
Bringing in other datasets
Pfam
–NCBI NR (GenPept)
–Metagenomics
COSMIC - Catalogue Of Somatic Mutations In Cancer
COSMIC
–Data Sources
  Scientific Literature
  Cancer Genome Project Systematic Screens
–Features
  Manual Curation
  Map reference sequence
  Standards: Mutation naming, Tumour sample, Phenotype
–Advantages
  Prolong life of data
  Maintain integrity
  Genes continually updated
  Scientists explore data
  Ability to combine data sets
COSMIC/Pfam/Uniprot Prototyped on 60 ‘classic’ Proteins Automated update when COSMIC or Uniprot released
Linking COSMIC/Pfam/Spice Linking and State Maintenance
Acknowledgements
Pfam
–Prasad Gunasekaran
–John Tate
–Alex Bateman
–Penny Coggill
–Jaina Mistry
COSMIC
–Jon Teague
–Cosmic team……
Questions?