EGEE-II INFSO-RI
Enabling Grids for E-sciencE

EGEE: A Large-scale Production Grid Infrastructure
Erwin Laure, EGEE Technical Director
ISSGC06, July 16-28, 2006, Ischia, Italy

Lost in Definitions?
Defining the “Grid”:
  – Access to (high performance) computing power
  – Distributed parallel computing
  – Improved resource utilization through resource sharing
  – Increased storage provision
  – Controlled access to distributed storage
  – Interconnection of arbitrary resources (sensors, instruments, …)
  – Collaboration between users/resources
  – Higher abstraction layer above network services
  – Corresponding security
  – …

Defining the Grid
A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.
This interconnection of users, resources, and services for jointly addressing dedicated tasks is called a virtual organization.
Comparison between Grids and Networks:
  – Networks realize message exchange between endpoints
  – Grids realize services for the users: higher level of abstraction

The EGEE Project
Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)”
EGEE
  – 1 April 2004 to 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
EGEE-II
  – 1 April 2006 to 31 March 2008
  – Expanded consortium: 91 partners

EGEE Infrastructure
[Figure: map of countries participating in EGEE]
Scale (June 2006):
  – ~ 200 sites in 40 countries
  – ~ CPUs
  – > 10 PB storage
  – > jobs per day
  – > 60 Virtual Organizations

EGEE Infrastructures
Production service
  – Scaling up the infrastructure with resource centres around the globe
  – Stable, well-supported infrastructure, running only well-tested and reliable middleware
Pre-production service
  – Run in parallel with the production service (restricted number of sites)
  – First deployment of new versions of the gLite middleware
  – Test-bed for applications and other external functionality
T-Infrastructure (Training & Education)
  – Complete suite of Grid elements and applications (Testbed, CA, VO, monitoring, support, …)
  – Everyone can register and use GILDA for training and testing
  – 20 sites on 3 continents

EGEE Operations Process
Geographically distributed responsibility for operations:
  – There is no “central” operation
  – Regional Operation Centers
      Responsible for resource centers in their region
  – Tools are developed/hosted at different sites:
      GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon)
Grid operator on duty
  – 6 teams working in weekly rotation
      CERN, IN2P3, INFN, UK/I, Ru, Taipei
  – Crucial in improving site stability and management
  – Expanding to all ROCs in EGEE-II
Operations coordination
  – Weekly operations meetings
  – Regular ROC managers meetings
  – Series of EGEE Operations Workshops
      Nov 04, May 05, Sep 05, June 06
Procedures described in Operations Manual
  – Introducing new sites
  – Site downtime scheduling
  – Suspending a site
  – Escalation procedures; etc.
Highlights:
  – Distributed operation
  – Evolving and maturing procedures
  – Procedures being introduced into and shared with the related infrastructure projects

Production Grid Middleware
Key factors in EGEE Grid Middleware Development:
1. Strict software process
     Use industry standard software engineering methods
     – Software configuration management, version control, defect tracking, automatic build system, …
2. Conservative approach in what software to use
     Avoid “cutting-edge” software
     – Deployment on over 100 sites cannot assume a homogeneous environment: middleware needs to work with many underlying software flavors
     Avoid evolving standards
     – Evolving standards change quickly (and sometimes significantly, cf. OGSI vs. WSRF): impossible to keep pace on > 100 sites
Long (and tedious) path from prototypes to production

EGEE Middleware: gLite
Exploit experience & existing components
  – VDT (Condor, Globus)
  – EDG/LCG
  – AliEn
  – …
Develop a lightweight stack of EGEE generic middleware
  – Dynamic deployment
  – Pluggable components
Focus is on re-engineering and hardening
March 4, 2006: gLite 3.0
[Figure: timeline of LCG-2 and gLite (prototyping, product) leading to gLite 3.0 in 2006]

Developing gLite 3.0: now available on production infrastructure
After gLite 3.0:
  – Continuous release of single components
      As needed by users and as made available by developers
  – Major releases provide a “check-point”
      In general coinciding with major application challenges
Continuing development to
  – Bring components not yet included in release to maturity
  – Improve functionality
  – Increase robustness
  – Increase usability
  – Improve the compliance with international standards

Grid Interoperability
Leading role in building world-wide grids
Incubator for new Grid projects world-wide
Interoperation efforts
  – Bilateral: EGEE/OSG, EGEE/NDGF, EGEE/NAREGI
  – Multilateral: Grid Interoperability Now (GIN)
Experiences and requirements fed back into standardization process (GGF, now OGF)
Strengthening contacts with industry

Building Software for the Grid
[Figure: layered software stack.
  Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
  Middleware: Globus GT4, Condor, APST (divided into Upper Middleware & Tools and Lower Middleware)
  Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime
  Other labels: VPN, SSH, Bonds
  Image courtesy IBM]
Slide courtesy David Abramson

Middleware structure
Higher-Level Grid Services may or may not be used by the applications
  – should help them but not be mandatory
Foundation Grid Middleware is deployed on the infrastructure
  – should not assume the use of Higher-Level Grid Services
  – must be complete and robust
  – should allow interoperation with other major grid infrastructures

gLite Grid Middleware Services
Overview paper

Job submission
[Figure: job submission flow.
  Components: User Interface, Resource Broker, Information System, File and Replica Catalogs, Authorization Service, Site X (Computing Element, Storage Element)
  Arrow labels: submit, query, retrieve, publish state, update credential, discover services]

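The roles named on this slide can be illustrated with a toy model in Python. This is a sketch only, not the gLite API: the class names, the fields, and the ranking rule (pick the matching Computing Element with the most free CPUs) are assumptions made for illustration. Only the roles come from the slide: sites publish their state to the Information System, the user submits a job, and the Resource Broker queries the Information System to choose a site.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical record that a site publishes to the Information System."""
    site: str
    free_cpus: int
    supported_vos: set = field(default_factory=set)


class InformationSystem:
    """Toy registry: sites publish their state, the broker queries it."""

    def __init__(self):
        self._state = {}

    def publish(self, ce):
        self._state[ce.site] = ce

    def query(self, vo):
        # Return only resources that serve this VO and have capacity left.
        return [ce for ce in self._state.values()
                if vo in ce.supported_vos and ce.free_cpus > 0]


class ResourceBroker:
    """Toy broker: chooses the matching CE with the most free CPUs (assumed rule)."""

    def __init__(self, info_system):
        self.info_system = info_system

    def submit(self, job_name, vo):
        candidates = self.info_system.query(vo)
        if not candidates:
            raise RuntimeError(f"no resource available for VO {vo!r}")
        best = max(candidates, key=lambda ce: ce.free_cpus)
        best.free_cpus -= 1  # the job now occupies one CPU
        return best.site


info = InformationSystem()
info.publish(ComputingElement("site-x", free_cpus=4, supported_vos={"biomed"}))
info.publish(ComputingElement("site-y", free_cpus=9, supported_vos={"atlas", "biomed"}))

broker = ResourceBroker(info)
chosen_site = broker.submit("docking-run", vo="biomed")
print(chosen_site)
```

The model leaves out the Storage Element, the file and replica catalogs, and the Authorization Service; a real broker would also take data location and user credentials into account.
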
gLite Software Process
[Figure: release flow through the project activities.
  JRA1 Development: Software, Error Fixing
  SA3 Integration: Deployment Packages, Integration Tests, Installation Guide, Release Notes, etc.
  SA3 Testing & Certification: Functional Tests, Testbed Deployment
  SA1 Pre-Production: Scalability Tests, Pre-Production Deployment
  SA1 Production Infrastructure: Release
  Arrow labels: Pass, Fail, Problem, Serious problem, Directives]

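The pass/fail loop in this process can be sketched as a small staged pipeline. The stage order below is an assumption read off the figure labels, and the rule that any failure returns the candidate to development for error fixing is a simplification; the function and variable names are hypothetical.

```python
# Stage order is an assumption based on the labels in the figure.
STAGES = [
    "JRA1 development",
    "SA3 integration",
    "SA3 testing & certification",
    "SA1 pre-production",
    "SA1 production",
]


def run_release(checks):
    """Walk a release candidate through the stages.

    `checks` maps a stage name to a function returning True (pass) or
    False (fail). Stages without a check are treated as passing.
    A failure sends the candidate back to development (simplified rule).
    Returns the list of stages visited, in order.
    """
    visited = []
    for stage in STAGES:
        visited.append(stage)
        check = checks.get(stage, lambda: True)
        if not check():
            visited.append(STAGES[0])  # back to JRA1 for error fixing
            break
    return visited


# A candidate that fails certification never reaches pre-production.
history = run_release({"SA3 testing & certification": lambda: False})
print(history)
```
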
EGEE Applications
>20 applications
  – Astronomy
  – Biomedicine
  – Computational Chemistry
  – Earth Sciences
  – Financial Simulation
  – Fusion
  – Geo-Physics
  – High Energy Physics
Further applications in evaluation
Applications now moving from testing to routine and daily usage

High Energy Physics
Large Hadron Collider (LHC):
  – One of the most powerful instruments ever built to investigate matter
  – 4 Experiments: ALICE, ATLAS, CMS, LHCb
  – 27 km circumference tunnel
  – Due to start up in 2007
[Figure labels: Mont Blanc (4810 m), Downtown Geneva]

Accelerating and colliding particles

The LHC Accelerator
The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

LHC DATA
The 40 million collisions per second are reduced by online computers that select a few hundred “good” events per second
These events are recorded on disk and magnetic tape at 100-1,000 megabytes per second
~15 petabytes per year for all four experiments

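As a rough consistency check, the yearly volume can be converted into an average rate. The sketch below assumes decimal units (1 PB = 10^15 bytes) and spreads the data evenly over a 365-day calendar year, which the accelerator does not actually do, so the result is only an order-of-magnitude figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s
PETABYTE = 10**15   # decimal units assumed
MEGABYTE = 10**6


def average_rate_mb_per_s(petabytes_per_year):
    """Average rate in MB/s if the yearly volume were spread evenly over a year."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / MEGABYTE


rate = average_rate_mb_per_s(15)
print(f"{rate:.0f} MB/s")  # about 476 MB/s
```

The result falls inside the 100-1,000 MB/s recording range quoted on the slide.
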
Data Handling and Computation for Physics Analysis
[Figure: data flow diagram.
  Phases: simulation, reconstruction, analysis
  Elements: detector, event filter (selection & reconstruction), raw data, event reprocessing, event simulation, processed data, event summary data, batch physics analysis, analysis objects (extracted by physics topic), interactive physics analysis]

LCG depends on two major science grid infrastructures:
  – EGEE: Enabling Grids for E-sciencE
  – OSG: US Open Science Grid

Example: HEP
LHC data and service challenges
  – Preparing for LHC start-up in 2007
  – Ensure key services & infrastructure are in place
  – Emphasis on providing a service
Computing needs of experiments
  – E.g. LHCb: ~700 CPU years in 2005 on the EGEE infrastructure
  – E.g. ATLAS: over 10,000 jobs per day
Massive data transfers > 1.5 GB/s
[Figure panels: ATLAS, LHCb, ATLAS]

Example: Addressing emerging diseases
Emerging diseases know no frontiers. Time is a critical factor.
Avian influenza: human casualties
International collaboration is required for:
  – Early detection
  – Epidemiological watch
  – Prevention
  – Search for new drugs
  – Search for vaccines

WISDOM, the first step
WISDOM focuses on drug discovery for neglected and emerging diseases.
  – Summer 2005: World-wide In Silico Docking On Malaria
      46 million ligands docked in 6 weeks
      ~1 million virtual ligands selected
      1 TB of data produced
      1000 computers in 15 countries
      Equivalent to 80 CPU years
  – Spring 2006: drug design against H5N1 neuraminidase involved in virus propagation
      impact of selected point mutations on the efficiency of existing drugs
      identification of new potential drugs acting on mutated N1
[Figure labels: N1, H5]

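The malaria figures can be cross-checked with simple arithmetic. The sketch below assumes a 365-day year and one CPU per computer; neither is stated on the slide, so the results are indicative only.

```python
SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365  # assumption

cpu_years = 80            # from the slide
dockings = 46_000_000     # from the slide
campaign_days = 6 * 7     # 6 weeks, from the slide
computers = 1000          # from the slide; one CPU each is an assumption

cpu_seconds = cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY
seconds_per_docking = cpu_seconds / dockings
avg_busy_cpus = cpu_years * DAYS_PER_YEAR / campaign_days

print(f"{seconds_per_docking:.0f} CPU seconds per docking")  # about 55
print(f"{avg_busy_cpus:.0f} CPUs busy on average")           # about 695
print(f"{avg_busy_cpus / computers:.0%} of the 1000 computers")
```

Under these assumptions, roughly 700 of the 1000 computers were busy at any time, so the 80 CPU years and the 6-week duration are mutually consistent.
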
Challenges for high throughput virtual docking
300,000 chemical compounds: ZINC & chemical combinatorial library
Target (PDB): Neuraminidase (8 structures)
Millions of chemical compounds available in laboratories
High Throughput Screening: $2/compound, nearly impossible
Molecular docking (Autodock): ~100 CPU years, 600 GB data
Data challenge on EGEE, Auvergrid, TWGrid: ~6 weeks on ~2000 computers
In vitro screening of 100 hits
Hits sorting and refining

Example: Pharmacokinetics
A lesion is detected in an MRI study of a patient: start with virtual biopsy
  – The process requires obtaining a sequence of MRI volumetric images.
  – Different images are obtained in different breath-holds.
  – Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to different breath-holds.
The total computational cost of a clinical trial of 20 patients is around 100 CPU days.

Example: Determining earthquake mechanisms
Seismic software application determines epicentre, magnitude, mechanism
Analysis of Indonesian earthquake (28 March 2005)
  – Seismic data within 12 hours after the earthquake
  – Solution found within 30 hours after earthquake occurred
      10 times faster on the Grid than on local computers
  – Results
      Not an aftershock of December 2004 earthquake
      Different location (different part of fault line further south)
      Different mechanism
Rapid analysis of earthquakes important for relief efforts
[Figure panels: Peru, June 23, 2001, Mw=8.4; Sumatra, March 28, 2005, Mw=8.5]

Flood forecasting problem
Many kinds of data
  – Meteorological, hydrological, hydraulic
  – Generated by simulations or obtained from sensors
  – Permanent or periodically updated
  – Publicly available or with restricted access

ITU-BR system for RRC 2006
ITU-BR developed a system for RRC 2006
  – Run compatibility and complementary analysis
  – 84 PCs executing 168 parallel tasks
  – Compatibility analysis < 4h
  Great success!
ITU-BR wanted to be sure and do even better
  – Provide more CPU power
  – Reduce risks by providing a supplementary system
  – Gain experience on how to access large and reliable computing resources ‘on demand’
EGEE used a subset of its Grid for RRC 2006
  – Over 400 PCs
  – Compatibility analysis < 1h

The Future of Grids
Increasing the number of infrastructure users by increasing awareness
  – Dissemination and outreach
  – Training and education
Increasing the number of applications by improving application support and middleware functionality
  – Improved usability through high level grid middleware extensions
Increasing the grid infrastructure
  – Incubating related projects
  – Ensuring interoperability between projects
Protecting user investments
  – Towards a sustainable grid infrastructure

User Information & Support
More than 170 training events and summer schools across many countries
  – >3000 people trained
      induction; application developer; advanced; retreats
  – Material archive online with ~250 presentations
Public and technical websites
Dissemination material constantly evolving to expand information and keep it up to date
4 conferences organized (~ Pisa)
Next conference: September 2006 in Geneva
~600 participants

Industry and EGEE-II
Industry Task Force
  – Group of industry partners in the project
  – Links related industry projects (NESSI, BEinGRID, …)
  – Works with EGEE’s Technical Coordination Group
Collaboration with CERN openlab project
  – IT industry partnerships for hardware and software development
EGEE Business Associates (EBA)
  – Companies sponsoring work on joint-interest subjects
Industry Forum
  – Led by Industry to improve Grid take-up in Industry
  – Organises industry events and disseminates grid information
      e.g. this Wednesday here at the school

[Figure repeated: the “Building Software for the Grid” layer diagram, with “???” marked on it. Slide courtesy David Abramson]

Portals on EGEE
  – P-Grade
  – Genius

Example: Biomedicine
Parallel simulation of blood flow on the Grid
Online visualization of simulation results on the desktop
Interactive steering of simulation
Grid is “invisible”
Cooperation with the University of Amsterdam

Example: Flooding Crisis Support
Simulation of flooding on the Grid
Online visualization of simulation results in the CAVE
Interactive steering of simulation
Grid is “invisible”
Cooperation with the Slovak Academy of Sciences

Scientific Visualization
Use your favourite device to connect to the Grid: Sony PSP (PlayStation Portable)

Not only portals
Portals are a good way to bring computing power to end-users
  – In most cases domain specific
Application programmers (and portal programmers) need more powerful interfaces
  – Workflow engines
  – Higher level programming abstractions (SAGA, DRMAA, …)
  – Programming environments (gEclipse)
  – Compilers?
  – …

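To make the first of these interfaces concrete, the core idea of a workflow engine (run each task once everything it depends on has finished) can be sketched with Python's standard `graphlib` module. This is a local, sequential toy, not any of the tools named above: a grid workflow engine would dispatch tasks to remote resources and run independent ones in parallel. The task names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after all tasks it depends on.

    tasks: name -> callable taking a dict of earlier results.
    dependencies: name -> set of task names that must finish first.
    """
    results = {}
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() yields every task after its predecessors.
    for name in TopologicalSorter(graph).static_order():
        results[name] = tasks[name](results)
    return results


tasks = {
    "fetch": lambda r: [3, 1, 2],
    "sort": lambda r: sorted(r["fetch"]),
    "total": lambda r: sum(r["fetch"]),
    "report": lambda r: f"max={r['sort'][-1]} total={r['total']}",
}
dependencies = {
    "sort": {"fetch"},
    "total": {"fetch"},
    "report": {"sort", "total"},
}

results = run_workflow(tasks, dependencies)
print(results["report"])  # max=3 total=6
```

Here "sort" and "total" depend only on "fetch", so an engine with access to several resources could run them at the same time.
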
Projects related to EGEE
EUGRID

Related Infrastructures
GIN

Sustainability: Beyond EGEE-II
Need to prepare for permanent Grid infrastructure
  – Maintain Europe’s leading position in global science Grids
  – Ensure a reliable and adaptive support for all sciences
  – Independent of project funding cycles
  – Modelled on success of GÉANT
      Infrastructure managed centrally in collaboration with national bodies (in EGEE-II: JRUs)

Grids in Europe
Great investment in developing Grid technology
Sample of National Grid projects:
  – Austrian Grid Initiative
  – DutchGrid
  – France: Grid’5000
  – Germany: D-Grid; Unicore
  – Greece: HellasGrid
  – Grid Ireland
  – Italy: INFNGrid; GRID.IT
  – NorduGrid
  – Swiss Grid
  – UK e-Science: National Grid Service; OMII; GridPP
EGEE provides framework for national, regional and thematic Grids

Evolution
[Figure: project timeline EDG, EGEE, EGEE-II, EGEE-III, European e-Infrastructure Coordination; usage stages Testbeds, Routine Usage, Utility Service]

Summary
Grids represent a powerful new tool for science
Today we have a window of opportunity to move grids from research prototypes to permanent production systems (as networks did a few years ago)
EGEE offers …
  – … a mechanism for linking together people, resources and data of many scientific communities
  – … a basic set of middleware for gridifying applications, with documentation, training and support
  – … regular forums for linking with grid experts, other communities and industry

Summary
Success will lead to the adoption of grids as the main computing infrastructure for science
If we succeed then the potential return to international scientific communities will be enormous and open the path for commercial and industrial applications

EGEE’06 Conference
EGEE’06: Capitalising on e-infrastructures
  – Demos
  – Related Projects
  – Industry
  – International community (UN organisations in Geneva etc.)
September 2006, Geneva, Switzerland