e-Science Tools for the Genomic Scale Characterisation of Bacterial Secreted Proteins. Tracy Craddock, Phillip Lord, Colin Harwood and Anil Wipat, Newcastle University
Outline Computational challenges of bioinformatics Secretion in Bacillus Classification and analysis workflows Results and discussion
Computational Challenges of Bioinformatics: New requirements from bioinformatics. 3 major problems: heterogeneity, distribution, autonomy. Experiments: series of workflows
myGrid and Taverna. Scufl: Simple Conceptual Unified Flow Language. Taverna: writing, running workflows & examining results. SOAPLAB: makes applications available. Freefluo: workflow engine to run workflows. [Diagram: Freefluo, SOAPLAB, Web Service, Any Application, Web Service e.g. DDBJ BLAST]
Microbase Grid-based system for microbial genome comparison and analysis Information repository (and execution environment) Pre-computed data
Secretion in Bacillus: Predict characteristics & behaviour of bacteria. Identify secreted proteins. Bacillus species show diverse behaviour: soil inhabitants, harmful bacteria
Importance of Secretion Mechanism of interaction with environment Reveal capabilities of an organism Pathogens are of great interest
Secretory Proteins [Figure: cell locations (cytoplasm, medium, membrane, cell wall) and protein classes (signal peptide, lipoprotein, cell wall binding, transmembrane, LPXTG)]
Bioinformatic Tools [Figure: cell locations (cytoplasm, medium, membrane, cell wall) and protein classes (signal peptide, lipoprotein, cell wall binding, transmembrane, LPXTG). Tools: SignalP, TMHMM, tmap, MEMSAT, LipoP, ps_scan]
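To make the LPXTG class concrete, here is a minimal sketch of a motif scan in Python. It is a stand-in for a pattern-scanning step, not the deck's actual tool invocation. The function name `find_lpxtg`, the `tail` parameter, and the choice to search only the C-terminal end are assumptions made for illustration.

```python
import re

# LPXTG sorting motif: Leu-Pro-(any residue)-Thr-Gly.
LPXTG = re.compile(r"LP.TG")

def find_lpxtg(sequence, tail=50):
    """Return 0-based start positions of LPXTG matches in the last `tail` residues.

    Restricting the search to the C-terminal tail is an assumption of this
    sketch, not a rule taken from the slides.
    """
    seq = sequence.upper()
    offset = max(0, len(seq) - tail)
    return [offset + m.start() for m in LPXTG.finditer(seq[offset:])]

# Made-up sequence with one motif (LPETG) near the C-terminus.
example = "MKKT" + "A" * 60 + "LPETGEEN" + "A" * 20
hits = find_lpxtg(example)
```

A real pipeline would normally add further checks, such as a downstream hydrophobic region, which this sketch leaves out.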
Classification Workflow
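A classification workflow of this kind combines the per-tool predictions into one class per protein. The sketch below shows one way to do that. The rule order, the `Predictions` fields, and the class labels are all hypothetical; the slides do not state the decision rules actually used.

```python
from dataclasses import dataclass

@dataclass
class Predictions:
    """Hypothetical summary of tool outputs for one protein."""
    signal_peptide: bool      # e.g. from a signal peptide predictor
    lipoprotein: bool         # e.g. from a lipoprotein predictor
    tm_helices: int           # number of predicted transmembrane helices
    lpxtg: bool               # LPXTG motif found
    cell_wall_binding: bool   # cell wall binding domain found

def classify(p):
    """Assign one class using an illustrative, assumed rule order."""
    if not p.signal_peptide and not p.lipoprotein and p.tm_helices == 0:
        return "cytoplasm"
    if p.lipoprotein:
        return "lipoprotein"
    if p.lpxtg:
        return "LPXTG"
    if p.cell_wall_binding:
        return "cell wall binding"
    if p.tm_helices > 0:
        return "transmembrane"
    return "medium"

label = classify(Predictions(True, False, 0, False, False))
```

Keeping the rules in a single function makes the precedence between overlapping predictions explicit and easy to change.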
Process of Analysis Putative secreted proteins Protein families Functional classification Relations
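The step from putative secreted proteins to protein families can be pictured as grouping proteins linked by pairwise similarity. The sketch below uses single-linkage grouping with a union-find structure. This is a swapped-in technique for illustration only, since the slides do not name the clustering method. The function `group_families` and the protein identifiers are hypothetical.

```python
def group_families(proteins, similar_pairs):
    """Group proteins into families by single linkage over similar pairs."""
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two families

    families = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())

# Hypothetical identifiers and similarity links.
proteins = ["P1", "P2", "P3", "P4", "P5"]
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
families = group_families(proteins, pairs)
```

Single linkage can chain loosely related proteins into one family, so a production analysis would likely use a more robust clustering method.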
Analysis Workflow
Architecture Custom-designed database Provenance tracking Analysis – computationally intensive Architecture differs from other systems
Web Portal
Classification Results
Functions of the Clusters Number of families
Biologist’s Outlook Results available for subsequent analysis Data and results are of great interest
eScientist’s Outlook: Microbase simplified data analysis, but … Autonomy: most services were originally provided by external parties. Licensing: limits exposure of services. Distribution: difficulty came from the relatively large datasets
Future Enhancements Use notification to automatically analyse recently annotated genomes Migrate workflows to a remote enclosed environment?
Acknowledgments: Phillip Lord, Colin Harwood, Anil Wipat. myGrid: Carole Goble, Tom Oinn … and the rest of the myGrid team. Microbase: Yudong Sun, Anil Wipat, Matthew Pocock, Pete A. Lee, Paul Watson, Keith Flanagan, James T. Worthington