Published by Adam Snow. Modified over 8 years ago.
1
Mapping of Scientific Workflow within the e-Protein Project to Distributed Resources
London e-Science Centre
Department of Computing, Imperial College London
2
Introduction
- Proteins
- Background to e-Protein project
- Project Workflow
- ICENI
- Demo
3
What are Proteins?
4
Proteins
Proteins are necklaces of smaller subunits called amino acids.
They are the basis of how biology gets things done:
- Give structure to our hair, skin and bones
- Act as hormones and enzymes
- Act as antibodies in support of the immune system
e.g. Gastrin -> stomach -> causes HCl production
For this reason, scientists have sequenced the human genome: the DNA code which specifies the sequence of amino acids along the protein "necklace".
5
Proteins
- Knowing the sequence tells us little about what the protein does.
- To carry out their function, proteins must "fold" and form the correct structure.
- Incorrect folding can lead to diseases such as Alzheimer's and cancer.
- By studying protein structure we can better understand the nature of disease and design more effective drugs.
6
Proteins
- Genome projects have yielded huge numbers of protein sequences.
- Understanding protein structure requires huge amounts of computing power and expertise.
- No single UK group can achieve this single-handed.
- Sharing computing resources across different sites provides an obvious solution.
7
e-Protein
- Funded by the BBSRC through its e-Science programme.
- The objective is to provide a structure-based annotation of the proteins in the major genomes by linking resources at three sites: EBI, Imperial and UCL.
- Each of the three sites has a local database providing its contribution to the protein annotation.
- Each site uses a different strategy to assign protein structures to the sequences.
- The results are compared to identify problems.
8
e-Protein
9
At Imperial there is a MySQL relational database called 3D-GENOMICS.
The pipeline focuses on proteins, on which several steps of analysis are performed using various applications:
- Identification of transmembrane (TM) regions
- Coiled coils
- Prosite patterns
- Secondary structure prediction
Structural information is assigned via homology (using BLAST and PSI-BLAST).
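The idea of running several analysis steps over one sequence can be sketched in a few lines of Python. This is only an illustration, not the 3D-GENOMICS code: the function names, the annotation dictionaries and the `run_pipeline` helper are all assumptions. The first step scans for one real Prosite pattern (PS00001, N-glycosylation site, `N-{P}-[ST]-{P}`) translated to a regular expression; the second is a deliberately crude stand-in for TM-region detection that simply flags runs of hydrophobic residues.

```python
import re
from typing import Callable, Dict, List

Annotation = Dict[str, object]
Step = Callable[[str], List[Annotation]]


def prosite_n_glycosylation(seq: str) -> List[Annotation]:
    """Scan for Prosite PS00001 (N-{P}-[ST]-{P}) written as a regex.

    A lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(r"(?=(N[^P][ST][^P]))")
    return [
        {"step": "prosite", "start": m.start(), "match": m.group(1)}
        for m in pattern.finditer(seq)
    ]


def hydrophobic_stretch(seq: str, window: int = 8) -> List[Annotation]:
    """Toy stand-in for TM prediction (assumption, not a real predictor):
    report every window made up entirely of hydrophobic residues."""
    hydrophobic = set("AILMFVW")
    hits = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        if all(residue in hydrophobic for residue in chunk):
            hits.append({"step": "tm_toy", "start": i, "match": chunk})
    return hits


def run_pipeline(seq: str, steps: List[Step]) -> List[Annotation]:
    """Apply each analysis step to the sequence and pool the annotations."""
    annotations: List[Annotation] = []
    for step in steps:
        annotations.extend(step(seq))
    return annotations
```

Calling `run_pipeline("MKNGSA", [prosite_n_glycosylation, hydrophobic_stretch])` would return a single annotation for the `NGSA` site. In a real pipeline each step would be a separate application and the pooled annotations would be written to the database.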
10
3D-Genomics
11
ICENI
- Capturing this workflow and mapping its components to distributed resources is the priority for the Imperial side of the project.
- ICENI provides a mechanism for creating and managing computational grids.
- It uses a component programming model to describe grid applications.
- It allows scientists to define the required workflow in a graphical environment by dragging and dropping components.
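A workflow built from connected components can be thought of as a dependency graph, and any engine has to run a component only after the components feeding it have finished. The sketch below shows that ordering with Kahn's algorithm in plain Python. It does not use or imitate the ICENI API: the `execution_order` function and the component names and links in the example are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return components in an order that respects their dependencies.

    depends_on maps a component to the components whose output it needs.
    Raises ValueError if the workflow contains a cycle.
    """
    # Copy the input and make sure every mentioned component has an entry.
    remaining = {comp: set(deps) for comp, deps in depends_on.items()}
    for deps in depends_on.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    # Components with no unmet dependencies can run straight away.
    ready = deque(sorted(c for c, deps in remaining.items() if not deps))
    order: List[str] = []
    while ready:
        done = ready.popleft()
        order.append(done)
        for other in sorted(remaining):
            if done in remaining[other]:
                remaining[other].discard(done)
                if not remaining[other]:
                    ready.append(other)

    if len(order) != len(remaining):
        raise ValueError("workflow contains a cycle")
    return order


# Hypothetical workflow: names and links are illustrative only.
workflow = {
    "psi_blast": {"read_sequence"},
    "secondary_structure": {"read_sequence"},
    "assign_structure": {"psi_blast"},
    "store_annotation": {"assign_structure", "secondary_structure"},
}
```

With this example, `execution_order(workflow)` starts with `read_sequence` and ends with `store_annotation`. Components that do not depend on each other, such as `psi_blast` and `secondary_structure` here, are the ones that could be sent to different resources at the same time.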
12
ICENI Middleware
- Has a rich metadata structure
- Allows the current state of the resources to be captured
- Allows the different library versions and application programs to be represented
- Defined in an XML schema
- Two main services provided by ICENI:
  - Launching Framework
  - Scheduling Framework
13
Launching/Scheduling
- A scheduler is responsible for deciding where components will run.
- A launcher is responsible for starting components on resources.
- Grid container: responsible for starting each of the components.
- Applications within the Imperial College e-Protein pipeline are wrapped as binary components.
- This provides the necessary metadata required to schedule and launch the application.
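The split between scheduling (deciding where) and launching (starting) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Resource` and `Component` classes, the "most free CPUs" placement rule and the resource and software names are not taken from ICENI, whose schedulers and metadata are richer than this.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Resource:
    name: str
    software: FrozenSet[str]   # metadata: applications installed on the resource
    free_cpus: int             # metadata: current state of the resource


@dataclass(frozen=True)
class Component:
    name: str
    requires: FrozenSet[str]   # metadata: software the component needs
    cpus: int = 1


def schedule(components: List[Component], resources: List[Resource]) -> Dict[str, str]:
    """Decide where each component runs: pick the suitable resource
    with the most free CPUs (an illustrative policy only)."""
    free = {r.name: r.free_cpus for r in resources}
    placement: Dict[str, str] = {}
    for comp in components:
        candidates = [
            r for r in resources
            if comp.requires <= r.software and free[r.name] >= comp.cpus
        ]
        if not candidates:
            raise RuntimeError(f"no resource can run {comp.name}")
        best = max(candidates, key=lambda r: free[r.name])
        free[best.name] -= comp.cpus
        placement[comp.name] = best.name
    return placement


def launch(placement: Dict[str, str]) -> List[str]:
    """Stand-in for a launcher: describe what would be started where."""
    return [f"start {comp} on {res}" for comp, res in placement.items()]


# Hypothetical resources and components.
resources = [
    Resource("cluster-a", frozenset({"blast"}), free_cpus=2),
    Resource("cluster-b", frozenset({"blast", "ss_predictor"}), free_cpus=4),
]
components = [
    Component("psi_blast", frozenset({"blast"})),
    Component("secondary_structure", frozenset({"ss_predictor"})),
]
```

Here `schedule(components, resources)` places both components on `cluster-b`: it has the most free CPUs for `psi_blast` and is the only resource offering `ss_predictor`. Keeping `launch` separate from `schedule` mirrors the division of responsibilities described on the slide.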
14
Binary Component
- Binary executable the component represents
- JDML file describes:
  - Application execution
  - Arguments taken
- Capable of taking a number of input/output data from other components.
NB in the e-protein workflow
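As a rough picture of what wrapping an application as a binary component involves, the sketch below pairs an executable with an argument template and named input/output ports. It is not JDML and does not reproduce ICENI's component format: the `BinaryComponent` class, the `{port}` placeholder syntax, and the `run_analysis` executable with its flags are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BinaryComponent:
    executable: str                                      # program the component represents
    arguments: List[str] = field(default_factory=list)   # may contain {port} placeholders
    inputs: List[str] = field(default_factory=list)      # data received from other components
    outputs: List[str] = field(default_factory=list)     # data passed on to other components

    def command_line(self, bindings: Dict[str, str]) -> List[str]:
        """Fill the argument template with concrete file names for each port."""
        missing = [p for p in self.inputs + self.outputs if p not in bindings]
        if missing:
            raise KeyError(f"unbound ports: {missing}")
        return [self.executable] + [arg.format(**bindings) for arg in self.arguments]


# Hypothetical component: executable name and flags are made up.
component = BinaryComponent(
    executable="run_analysis",
    arguments=["--in", "{query}", "--out", "{hits}"],
    inputs=["query"],
    outputs=["hits"],
)
```

Calling `component.command_line({"query": "seq.fasta", "hits": "hits.txt"})` yields the argument list a launcher could pass to the operating system. The point of the description is that the scheduler and launcher can work from metadata alone, without knowing anything about the application itself.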
15
Acknowledgements
- Director: Professor John Darlington
- Technical Director: Dr Steven Newhouse
- Research Staff:
  - Anthony Mayer, Nathalie Furmento
  - Stephen McGough, William Lee
  - Kieran Flemming, Oliver Jevons
- Contact:
  - http://www.lesc.ic.ac.uk/
  - e-mail: lesc@ic.ac.uk