Enabling Grids for E-sciencE (INFSO-RI)
Introduction to Grid Computing and EGEE
Fabio Scibilia, INFN Catania, Catania
Enabling Grids for E-sciencE (INFSO-RI) – EGEE Tutorial, Roma
Fundamentals of Grid Computing
Grid Idea by a Simple Analogy
Power stations dispersed everywhere produce electrical power.
The produced power is distributed over a power network.
A consumer who wants access to that power comes to an agreement with the electricity company.
The electricity company provides a new socket into which the user can plug.
Now the user is able to access the power grid.
The user:
–Does not need to know anything about what lies beyond the socket
–Can draw as much power as he/she wants, according to the agreement
The electricity company:
–Can change its production technologies at any moment
–Manages the power network as it sees fit
–Defines the terms and conditions of the agreement
In the Same Way...
Computing farms produce computing power.
The computing power is made available over the Internet.
A user who needs intensive computational power comes to an agreement with a provider that offers grid services.
The provider supplies grid facilities, giving the user access to its grid resources and the proper tools.
Now the user accesses grid facilities as a grid user.
The user:
–Does not need to know what lies beyond his/her user interface
–Can access a massive amount of computational power through a simple terminal
The provider:
–Can extend its grid facilities at any moment
–Manages the architecture of the grid
–Defines policies and rules for accessing grid resources
What Is Grid Computing?
The Grid Computing paradigm is an emerging way of thinking about distributed environments as a global-scale infrastructure to:
–Share data
–Distribute computation
–Coordinate work
–Access remote instrumentation
Why Computing Grids Now?
Because the computational power needed by many applications is becoming huge
–Thousands of CPUs working at the same time on the same task
Because the amount of data requires massive and complex distributed storage systems
–From hundreds of gigabytes to petabytes (10^15 bytes) produced by the same application
To make cooperation easier between people and resources belonging to different organizations
–People from several organizations working together to achieve a common goal
To access particular instrumentation that is not easily reachable in any other way
–Because it cannot be moved or replicated, or doing so would be too expensive
Because it is the next step in the evolution of distributed computation
–To create a marketplace of computational power and storage over the Internet
Who Is Interested in Grids?
Research communities, to obtain important results from experiments that involve very many people and massive amounts of resources
Enterprises, which can run huge computations without extending their current IT infrastructure
Businesses, which can provide computational power and data storage under contract or for rent
Properties of Grids
Transparency
–The complexity of the Grid architecture is hidden from the end user
–The user must be able to use a Grid as if it were a single virtual supercomputer
–Resources must be accessible regardless of their location
Openness
–Each subcomponent of the Grid is accessible independently of the other components
Heterogeneity
–Grids are composed of many different kinds of resources
Scalability
–Resources can be added to and removed from the Grid dynamically
Fault Tolerance
–Grids must keep working even if a component fails or a system crashes
Concurrency
–Different processes on different nodes must be able to work at the same time
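One common way to obtain the fault tolerance listed above is to resubmit a failed task to another node. The following is a minimal sketch, assuming a task is a callable that raises `RuntimeError` when the node it runs on fails; `run_with_failover` and `max_attempts` are hypothetical names, not part of any grid middleware.

```python
from __future__ import annotations

from typing import Callable


def run_with_failover(
    task: Callable[[str], str], nodes: list[str], max_attempts: int = 3
) -> tuple[str, str]:
    """Try the task on successive nodes until one succeeds; return (node, result)."""
    errors: list[str] = []
    # Only the first max_attempts nodes are tried, in the given order.
    for node in nodes[:max_attempts]:
        try:
            return node, task(node)
        except RuntimeError as exc:  # this sketch treats RuntimeError as a node failure
            errors.append(f"{node}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))
```

A crash on one node is thus hidden from the caller as long as some other node can complete the task.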
Challenging Issues in Grids (i)
Security
–Authentication and authorization of users
–Confidentiality and non-repudiation
Information Services
–To discover and monitor Grid resources
–To check the health status of resources
–As a basis for decision-making processes
File Management
–Creation, modification and deletion of files
–Replication of files to improve access performance
–Ability to access files without moving them to where the code runs
Administration
–Systems to administer Grid resources while respecting local administration policies
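File replication and location-independent access are often organised around a catalogue that maps a logical file name to the sites holding physical copies. The sketch below assumes that model; `ReplicaCatalogue`, the `lfn:` naming and the per-site cost table are hypothetical illustrations, not the API of any EGEE data-management service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical catalogue mapping a logical file name (LFN) to replica sites."""

    replicas: dict[str, set[str]] = field(default_factory=dict)

    def register(self, lfn: str, site: str) -> None:
        # Adding a replica does not change the logical name the user sees.
        self.replicas.setdefault(lfn, set()).add(site)

    def unregister(self, lfn: str, site: str) -> None:
        self.replicas.get(lfn, set()).discard(site)

    def best_replica(self, lfn: str, cost: dict[str, float]) -> str:
        # Choose the replica with the lowest access cost; sites missing from
        # the cost table are treated as infinitely expensive.
        sites = self.replicas.get(lfn)
        if not sites:
            raise KeyError(f"no replica registered for {lfn}")
        return min(sorted(sites), key=lambda s: cost.get(s, float("inf")))
```

Sorting the sites before taking the minimum makes the choice deterministic when two replicas have the same cost.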
Challenging Issues in Grids (ii)
Resource Brokering
–To schedule tasks across different resources
–To make optimal or suboptimal decisions
–To reserve (in the future) resources and network bandwidth
Naming Services
–To name resources unambiguously within the Grid
Friendly User Interfaces
–Because most Grid users have nothing to do with computer science (physicists, chemists...)
–Graphical User Interfaces (GUIs)
–Grid Portals (very similar to classical Web Portals)
–Command Line Interfaces (CLIs) for experts
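A resource broker typically keeps only the resources that satisfy a job's requirements and then ranks the survivors. The following is a minimal sketch of that filter-then-rank matchmaking, assuming hypothetical `ComputingResource` and `Job` records; the split loosely mirrors the `Requirements` and `Rank` attributes of gLite JDL job descriptions, but the code is not the gLite broker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingResource:
    name: str
    free_cpus: int
    supported_vos: frozenset[str]
    max_wall_minutes: int


@dataclass(frozen=True)
class Job:
    vo: str
    cpus: int
    wall_minutes: int


def match(job: Job, resources: list[ComputingResource]) -> ComputingResource | None:
    """Return the best resource for the job, or None if nothing matches."""
    # Requirements: the resource must accept the job's VO and have room for it.
    candidates = [
        r
        for r in resources
        if job.vo in r.supported_vos
        and r.free_cpus >= job.cpus
        and r.max_wall_minutes >= job.wall_minutes
    ]
    if not candidates:
        return None
    # Rank: prefer the resource with the most free CPUs (greedy, possibly suboptimal).
    return max(candidates, key=lambda r: r.free_cpus)
```

Ranking on a single attribute keeps the decision cheap; a real broker could weigh queue length, data locality or reservations instead.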
Virtual Organizations (VOs)
A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal
To use Grid facilities, every user MUST join a Virtual Organization as a member
Each person or resource can be a member of several VOs at the same time
Each VO can contain people or resources belonging to different administrative domains
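The membership rules above (every user must belong to a VO; people and resources may belong to several VOs) can be illustrated with a small registry. This is a hypothetical sketch: `VORegistry` and the rule "a user may use a resource if they share at least one VO" are simplifying assumptions. In EGEE, VO membership was managed by a dedicated service (VOMS), which this code does not model.

```python
from __future__ import annotations

from collections import defaultdict


class VORegistry:
    """Hypothetical registry of Virtual Organization membership."""

    def __init__(self) -> None:
        # VO name -> set of members (users or resources, identified by a string).
        self._members: dict[str, set[str]] = defaultdict(set)

    def join(self, vo: str, member: str) -> None:
        self._members[vo].add(member)

    def vos_of(self, member: str) -> set[str]:
        # A member may belong to several VOs at the same time.
        return {vo for vo, members in self._members.items() if member in members}

    def may_use(self, user: str, resource: str) -> bool:
        # A user in no VO shares nothing with any resource, so access is denied.
        return bool(self.vos_of(user) & self.vos_of(resource))
```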
Virtual Laboratory
A new way of cooperating on experiments
A platform that allows scientists to work together in the same “Virtual” Laboratory
Closely related to Grids and Virtual Organizations
Globus Alliance
The Globus Alliance
–Is a community of people and organizations involved in the design and development of Grid technologies
–University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.
The Globus Toolkit (GT)
–Is a de facto standard
–Is a bag of services
–Is at its fourth release (GT4)
–Now adopts Web Services interfaces
The Global Grid Forum
–Is a forum of grid researchers
–Works to define standards and protocols for grid technologies
–Is divided into Working Groups (WGs)
Globus Services
Hourglass Reference Model
Layers, from bottom to top: Fabric, Connectivity, Resource, Collective, Application
Fabric
–Manages resources locally
Connectivity
–Network communications (IP, DNS, etc.)
–Security: authentication, authorization, certification
–Single Sign-On
Resource
–Allocation, reservation and monitoring of resources
–Data access and transport
–Gathering of information on resources
Collective
–View of services as collections
–Discovery and allocation
–Data replicas and catalogues
–Workflow management
Application
–User applications
–Tools and interfaces
An Example: The Project
–Searches for Extra-Terrestrial Intelligence (SETI)
–Collects samples of microwave signals coming from the Universe through a telescope
–Schedules tasks spread over Grid nodes to analyse these samples
–Uses desktop computers as Grid nodes
–Worker nodes are dynamically added to and removed from the grid
–The owner of each desktop machine decides how to contribute to the project by offering its computational power
To contribute to the project
–Download and install the client
–Your machine will work as a Grid node when it is idle (in place of your screensaver)
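The scheduling behaviour described above (work handed out only to idle desktops, nodes joining and leaving at will) can be sketched as a simple server-side work pool. This is a minimal illustration, not the project's actual client/server protocol; `WorkPool`, `request_work`, `report_result` and `leave` are hypothetical names.

```python
from __future__ import annotations

from collections import deque


class WorkPool:
    """Hypothetical server-side pool of work units for desktop volunteers."""

    def __init__(self, units: list[str]) -> None:
        self.pending: deque[str] = deque(units)
        self.assigned: dict[str, str] = {}  # node -> work unit in progress
        self.done: list[str] = []

    def request_work(self, node: str, idle: bool) -> str | None:
        # A node only receives work while its owner is not using it,
        # and holds at most one unit at a time.
        if not idle or node in self.assigned or not self.pending:
            return None
        unit = self.pending.popleft()
        self.assigned[node] = unit
        return unit

    def report_result(self, node: str) -> None:
        self.done.append(self.assigned.pop(node))

    def leave(self, node: str) -> None:
        # Nodes may disappear at any time; an unfinished unit goes back
        # to the front of the queue so another node can pick it up.
        unit = self.assigned.pop(node, None)
        if unit is not None:
            self.pending.appendleft(unit)
```

Re-queuing abandoned units is one simple way to tolerate volunteers that vanish mid-task without losing work.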
Application Areas (i)
Physical Science Applications
–GriPhyN
–Particle Physics DataGrid (PPDG)
–GridPP
–AstroGrid
Life Science Applications
–Protein Data Bank (PDB)
–Biomedical Informatics Research Network (BIRN)
–Telemicroscopy
–myGrid
Application Areas (ii)
Engineering-Oriented Applications
–NASA Information Power Grid (IPG)
–Grid Enabled Optimization and Design Search for Engineering (GEODISE)
Commercial Applications
–Butterfly Grid
–EverQuest
E-Utility
–ClimatePrediction experiment
EGEE Project
EGEE Partners
–CERN
–Central Europe, including Austria, Czech Republic, Hungary, Poland, Slovakia and Slovenia
–France
–Germany and Switzerland
–Ireland and the United Kingdom
–Italy
–Northern Europe, including Belgium, Denmark, Finland, the Netherlands, Norway and Sweden
–Russia
–South-East Europe, including Bulgaria, Cyprus, Greece, Israel and Romania
–South-West Europe, including Portugal and Spain
–NRENs (National Research and Education Networks)
–United States
The Largest e-Infrastructure: EGEE
Objectives
–A consistent, robust and secure service grid infrastructure
–Improving and maintaining the middleware
–Attracting new resources and users from industry as well as science
Structure
–71 leading institutions in 27 countries, federated in regional Grids
–Leveraging national and regional grid activities worldwide
–Funded by the EU with ~32 M Euros for the first 2 years, starting 1 April 2004
EGEE Activities
–48% service activities (Grid Operations, Support and Management, Network Resource Provision)
–24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
–28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)
The emphasis in EGEE is on operating a production grid and supporting its end users
EGEE
Enabling Grids for E-sciencE (EGEE) in Europe
–Funded by the European Union (EU)
–Involves 26 countries and more than 70 institutions
EGEE infrastructure
–Runs over the GÉANT European communication network
–Uses the LHC Computing Grid (LCG) middleware
–Is moving towards complete adoption of the new gLite middleware
Middleware evolution (figure): LCG-1 and LCG-2 (Globus 2 based), followed by gLite-1 and gLite-2 (Web services based)
Large Hadron Collider
A particle accelerator built near Geneva
The largest scientific instrument ever built
Data is collected at a few points along the LHC and distributed across many computing sites
Figure labels: Mont Blanc (4810 m), Downtown Geneva
The LHC Experiments
Large Hadron Collider (LHC):
–Four experiments: ALICE, ATLAS, CMS, LHCb
–27 km tunnel
–Start-up in 2007
The LHC Experiments
Figure panels: ATLAS, CMS, LHCb
–~10-15 petabytes/year
–~10^8 events/year
–~10^3 batch and interactive users
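As a rough sanity check, assume 1 PB = 10^15 bytes (as on the earlier slide) and that the yearly volume is spread evenly over a full 365-day year (about 3.15 × 10^7 s), which real accelerator operation is not. The quoted volume then corresponds to an average sustained rate of a few hundred megabytes per second:

```latex
\[
\frac{10 \times 10^{15}\,\text{B}}{3.15 \times 10^{7}\,\text{s}} \approx 3.2 \times 10^{8}\,\text{B/s} \approx 320\,\text{MB/s},
\qquad
\frac{15 \times 10^{15}\,\text{B}}{3.15 \times 10^{7}\,\text{s}} \approx 4.8 \times 10^{8}\,\text{B/s} \approx 480\,\text{MB/s}.
\]
```

Because data taking is concentrated in running periods, peak rates are higher than this yearly average.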
Grid Monitoring
Operation of the production service: real-time display of grid operations
Accounting information
Selection of monitoring tools:
–GIIS Monitor + Monitor Graphs
–Sites Functional Tests
–GOC Data Base
–Scheduled Downtimes
–Live Job Monitor
–GridIce (VO + Fabric View)
–Certificate Lifetime Monitor
BioMed Overview
Infrastructure
–~3,000 CPUs
–~12 TB of disk
–In 9 countries
>50 users in 7 countries working with 12 applications
18 research labs
Figures: number of jobs per month; map of 15 resource centres with 17 CEs (Computing Elements) and 16 SEs (Storage Elements), including PADOVA and BARI
Biomed Virtual Organisation
–~70 users, 9 countries
–>12 applications (medical image processing, bioinformatics)
–~3,000 CPUs, ~12 TB disk space
–~100 CPU-years, ~500K jobs in the last 6 months
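The usage figures can be cross-checked with simple arithmetic, assuming the ~100 CPU-years and ~500K jobs refer to the same 6-month window and a 365-day year (8760 hours). On these assumptions the VO kept about 200 CPUs busy on average, and the average job was fairly short:

```latex
\[
\frac{100\ \text{CPU-years}}{0.5\ \text{years}} = 200\ \text{CPUs busy on average},
\qquad
\frac{200}{3000} \approx 7\%,
\]
\[
\frac{100 \times 8760\ \text{CPU-hours}}{5 \times 10^{5}\ \text{jobs}} \approx 1.75\ \text{CPU-hours per job}.
\]
```

Since all inputs are approximate ("~"), these results only indicate orders of magnitude.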
Bioinformatics Grid
Protein Sequence Analysis
–Gridified version of the NPSA web portal, which offers protein databases and sequence analysis algorithms to bioinformaticians (3,000 hits per day)
–Needs large databases and a large number of short jobs
–Objective: increased computing power
–Status: 9 bioinformatics software packages gridified
–Grid added value: open to a wider community, with larger bioinformatics computations
xmipp_MLrefine
–3D structure analysis of macromolecules from (very noisy) electron microscopy images
–Maximum likelihood approach to find the optimal model
–Objective: study molecular interactions and chemical properties
–Status: algorithm being optimised and ported to 3D
–Grid added value: independent jobs computed in parallel on different resources
Contacts
–EGEE Website
–How to join
–How to test
–EGEE Project Office