Download presentation
Presentation is loading. Please wait.
Published byDustin Bryan Modified over 5 years ago
1
Grid computing Assaf Gottlieb Tel-Aviv University
Grid computing Assaf Gottlieb Tel-Aviv University EGEE is a project funded by the European Union under contract IST
2
Outline What is a grid & e-Research The EGEE grid and BioMed VO
Joining the grid EGEE tutorial
3
The Grid Metaphor Mobile Access G R I Supercomputer, PC-Cluster D M L
W A Visualising Workstation Mobile Access Supercomputer, PC-Cluster Data-storage, Sensors, Experiments Internet, networks EGEE tutorial
4
What is Grid computing? (2/2)
The term “Grid” has become popular! Sometimes in Industry : “Grids” = clusters Motivations: better use of resources; scope for commercial services Also used to refer to the harvesting of donated, unused compute cycles Climateprediction.net) These are e-Infrastructure but are not “grids” from the e-Research viewpoint! EGEE tutorial
5
Typical grid Grid middleware runs on each shared resource to provide Data services Computation services Single sign-on Virtual Organisation: People in different organisations seeking to cooperate and share resources across their organisational boundaries Virtual organisations negotiate with sites to agree access to resources INTERNET EGEE tutorial
6
Outline What is a grid & e-Research The EGEE grid and BioMed VO
Joining the grid EGEE tutorial
7
EGEE is … EU-funded project that has established the largest multi-VO production grid in the world Project leader : CERN 245 sites 45 countries >60 VOs 35K CPUs 147PB Storage Avg 30K jobs/day Data transfers >1.5 GB/s Applet at EGEE tutorial
8
Main components Access service How users logon to a Grid
Resource Broker (RB): Service that matches the user’s requirements with the available resources on a Grid Information System: Characteristics and status of resources Computing Element (CE): A batch queue on a site’s computers where the user’s job is executed Storage Element (SE): provides (large-scale) storage for files EGEE tutorial
9
Grid Applications Medical/Healthcare (imaging, diagnosis and treatment ) Bioinformatics (study of the human genome and proteome to understand genetic diseases) Nanotechnology (design of new materials from the molecular scale) Engineering (design optimization, simulation, failure analysis and remote Instrument access and control) Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems) EGEE tutorial
10
The BioMed VO at a glance (1/2)
140 Computing Elements (~20,000 CPUs) 131 Storage Elements (~5 PB disks) in 12 countries EGEE tutorial
11
Data Challenge example: WISDOM-Wide In Silico Docking On Malaria
~300 million people worldwide are affected 1-1.5 million people die every year Widely spread Caused by protozoan parasites of the genus Plasmodium Malaria EGEE tutorial
12
There is a real need for new drugs to fight malaria
Drug resistance has emerged for all classes of antimalarials except artemisinins. Resistance to chloroquine, the cheapest and the most used drug, is spreading in almost all the endemic countries. There is even the threat of resistance to artemisinin too, as it is already observed in murine Plasmodium yoelii The available drugs focus on a limited number of biological targets => cross-resistance to antimalarials There is a consensus that substantial scientific effort is needed to identify new targets for antimalarials EGEE tutorial
13
Identification of new malarial targets
With the advent of the plasmodium genome, many targets came into light The potential antimalarial drug targets are broadly classified into three categories, and each category has many individual targets. Targets involved in human hemoglobin degradation (proteases) Targets involved in parasite metabolism (Folate, phospholipid… ) Targets engaged in parasite membrane transport and signalling (choline carrier etc). WISDOM focused on hemoglobin metabolism and especially on Plasmepsin II and Plasmepsin IV EGEE tutorial
14
Docking Dataflow placing the ligand Scoring ligand data base junk hit
Structure optimization Ranking Scoring crystal structure Protein surface Water molecule Ligand EGEE tutorial
15
Filtering process 50 compounds to be tested in experimental lab
500, 000 chemical compounds Sorting based on scoring in different parameter sets; Consensus scoring 1000 compounds selected Based on key interactions Catalaytic and flap residues 500 compounds Key interactions, binding modes, descriptors, knowledge of active site MD=Molecular Dynamics 100 compounds MD 50 compounds to be tested in experimental lab EGEE tutorial
16
High Throughput Virtual Docking
Millions of chemical compounds available in laboratories High Throughput Screening 1-10$/compound, nearly impossible Molecular docking (FlexX, AutoDock) ~80 CPU years, 1 TB data Chemical compounds (ZINC): Chembridge – 500,000 Drug like – 500,000 Data challenge on EGEE ~6 weeks on ~1700 computers Targets (PDB): Plasmepsin II (1lee,1lf2,1lf3) Plasmepsin IV (1ls5) Hits screening using assays performed on living cells Leads Clinical testing Drug EGEE tutorial
17
Number of Docked Ligands (millions) vs. Time (days)
1 2 3 4 5 6 1: Intensive submission of FlexX jobs with Chembridge ligands base 2: Resubmission 3: Intensive submission of FlexX jobs with drug like ligands base 4: Resubmission 5: Intensive submission of Autodock jobs with Chembridge ligands base 6: Resubmission EGEE tutorial
18
Number of Running and Waiting Jobs vs. Time
3 5 1 2 4 6 EGEE tutorial
19
WISDOM statistics (1/2) Docking conditions Computing statistics
Metrics Value (Total) Value (FlexX) Value (Autodock) Number of targets 11 8 10 Number of parameters 6 4 2 Number of docked ligands 999810 499918 Computing statistics Metrics Value (Total) Value (FlexX) Value (Autodock) Effective duration 37 days 22 days 15 days Effective CPU time 67.15 years 24.77 years 42.38 years Number of Computing Elements 58 56 57 Total number of jobs 72751 41520 31231 Max running jobs at the same time 1758 1643 EGEE tutorial
20
WISDOM statistics (2/2) Data statistics Metrics Value (Total) Value (FlexX) Value (Autodock) Cumulated number of docked ligands (millions) 41.27 31.41 9.87 Total saved data (GB) 946.82 506.05 440.77 Total transferred data (GB) Similar data challenge for H5N1 Neuraminidase (avian flu) is in progress EGEE tutorial
21
WISDOM- Total Amount of CPU Provided by EGEE Federation
South Eastern Europe, 10% South Western Europe, 12% Italy, 16% France, 18% UK, 29% Northern Europe, 7% Central Europe, 4% Asia Pacific, 2% Germany Switzerland, 1% Russia, 1% EGEE tutorial
22
Outline What is a grid & e-Research The EGEE grid and BioMed VO
Joining the grid EGEE tutorial
23
Basic security concepts
Authentication Verify the identity of the peer Authorization Map an entity to some set of privileges Confidentiality Encrypt the message so that only the recipient can understand it Integrity Ensure that the message has not be altered in the transmission Non-repudiation Impossibility of denying the authenticity of a digital signature Accounting What did you do, when did you do it and where did you do it from? EGEE tutorial
24
Public Key Infrastructure
Provides authentication, integrity, confidentiality, non-repudiation Asymmetric encryption Digital signatures A hash derived from the message and encrypted with the signer’s private key Signature checked decrypting with the signer’s public key Allows key exchange in an insecure medium using a trust model Keys trusted only if signed by a trusted third party (Certification Authority) A CA certifies that a key belongs to a given principal Certificate Public key + information about the principal + CA signature X.509 format most used PKI used by SSL, PGP, GSI, WS security, S/MIME, etc. Encrypted text Private Key Public Key Clear text message EGEE tutorial
25
X.509: content of the Certificate
An X.509 Certificate contains: owner’s public key; identity of the owner; info on the CA; time of validity; digital signature of the CA Public key Subject:C=CH, O=CERN, OU=GRID, CN=Name Surname 8968 Issuer: C=CH, O=CERN, OU=GRID, CN=CERN CA Expiration date: Dec 26 08:08: GMT CA Digital signature EGEE tutorial
26
User Responsibilities
Keep your private key secure. Do not loan your certificate to anyone. Report to your local/regional contact if your certificate has been compromised. Do not launch a delegation service for longer than your current task needs. If your certificate or delegated service is used by someone other than you, it cannot be proven that it was not you. IT IS YOUR PASSPORT AND CREDIT CARD EGEE tutorial
27
Globus Grid Security Infrastructure (GSI)
de facto standard for Grid middleware Based on PKI Implements some important features Single sign-on: no need to give one’s password every time Delegation: a service can act on behalf of a person Mutual authentication: both sides must authenticate to the other Introduces proxy certificates Short-lived certificates including their private key and signed with the user’s certificate EGEE tutorial
28
Virtual Organizations and authorization
Users MUST belong to a Virtual Organization Sets of users belonging to a collaboration Each VO user has the same access privileges to Grid resources List of supported VOs: VOs maintain a list of their members Sites decide which VOs to accept A list of supported VOs can be found here: EGEE tutorial
29
Summary EGEE hold the largest production grid in the world to-date.
Academic users can join a VO without cost. In order to use the grid a user must have A valid certificate, given by the CA Join a VO. Each action on the grid requires a valid Proxy, generated from your certificate. EGEE tutorial
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.