INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org SA1 Ian Bird SA1 Activity Leader CERN IT Department EGEE Final Review 23rd – 24th May 2006.

Presentation transcript:

INFSO-RI Enabling Grids for E-sciencE
SA1
Ian Bird, SA1 Activity Leader, CERN IT Department
EGEE Final Review, 23rd – 24th May 2006

Outline
• Recommendations from intermediate focused review
• Highlights of last 3 months of the project
• Summary of SA1 achievements and open issues

SA1 Achievements
• Scale of the infrastructure
  – Has grown steadily during the project
  – Now slowed – expansion with related projects
• Sustained real production use of the infrastructure
  – Which is supported by the operations teams
• Maturing but evolving operations procedures
  – Dealing with all aspects of operations
• User support
  – GGUS is becoming the central coordination point, use is growing
• Middleware distribution
  – Now clear how to evolve the production service
  – Convergence between existing LCG-2.x and gLite-1.x
• Progress on interoperability and interoperation
  – With OSG significant progress, progress with ARC
  – Related projects

Recommendation 16 – i
“Plan the migration procedure of service support for gLite in full production service more clearly with precise dates and mandates for each site, and advertise to the users well in advance.”
& comment: “Pre-production service must not take on a life of its own…”
• Early set up of TCG:
  – Forum for agreeing schedules across the technical and application activities.
  – Schedule proposed and agreed for 2006 – see next slide

Recommendation 16 – ii: Deployment schedule for 2006
• Deliver and deploy LCG end January 2006
  – Bug fixes, patches, etc. accumulated since last major release in August.
    • Delivered on time and deployed
• Prepare gLite-3.0 for initial deployment in May
  – Convergence of LCG-2.x and gLite-1.x
  – Evolutionary from deployment point of view – will not be a big-bang change of production service
  – Schedule driven by LCG service challenges
• Foresee second major “release” on October/November timescale
  – Added functionality – driven by apps via TCG
• Quickfixes, security patches
  – May be produced at any time, deployed with agreement of TCG
• Client tools
  – May be updated more frequently, and can be deployed rapidly without need for major upgrades
• Other stand-alone services may be deployed centrally or at a few sites
  – To demonstrate functionality or provide new facilities
  – Usually need by-hand installation
In general we try to move away from big-bang releases:
• Focus on service/component upgrades where possible
• Check-point releases to consolidate changes and to provide new sites a starting point
• See this more like a Linux distribution – major releases with continual component updates, security patches, etc.
• Pre-production service – now integral part of the release process – should demonstrate new releases
• Continuous process of integration, certification, pre-production testing → eventual deployment

Recommendation 17 – i
“Help to establish exemplary procedures for interoperations of more divergent infrastructures and take the lead in such activities.”
• Several avenues
  – Collaborative activities – security and operational policy
  – Interoperability
  – Interoperation / shared operation – workshops
  – Other projects
• Joint collaborative activities:
  – Security – JSPG, MWSG, GridPMAs
  – Grid Interoperability Now (GIN) group – many projects
    • Very active in GGF17

Recommendation 17 – ii: Interoperability
• Several initiatives at various stages
• With OSG
  – Most advanced – cross job submission has been put in place for WLCG
    • Used in production by US-CMS for several months
  – EGEE Generic Info Provider installed on OSG site (now in VDT)
    • Allows all sites to be seen in info system
  – GStat and SFT can run on OSG sites
  – EGEE clients installed on OSG-LCG sites
  – Inversely – EGEE sites can run OSG jobs
  – All use SRM SEs
  – File catalogues are application choice – LFC widely used

Interoperability – cont.
• With ARC/NorduGrid
  – Strategies:
    1. Agree standard interfaces at site level & evolve services for these interfaces
    2. Present these interfaces at Grid boundary
       • Portal to forward and translate
    3. Deploy EGEE and ARC CE in parallel
       • Large sites for LCG
  – 1 is long-term goal; 2 is medium term solution
  – Several workshops to follow progress
  – Work on information system (GLUE)
  – EGEE → ARC submission works
• With NAREGI
  – First workshop in March
  – Several joint activities agreed; work just starting
    • Information system translators (GLUE ↔ CIM)
    • Data management tools – NAREGI will test EGEE LFC, FTS, DPM
    • Job management JDL ↔ JSDL etc.
    • Security

Recommendation 17 – iii: Operations (Interoperations)
• Joint operations:
  – WLCG is a strong driver – bring together EGEE and OSG grid operations
  – Extend ROC concept
    • Structures for routing tickets – prototype to be demonstrated in June
    • Use of GOC-DB for OSG sites
    • OSG sites join weekly operations meeting
    • Run SFTs on LCG production sites in OSG
    • Agreed ops VO for joint operations
  – Accounting – for LCG – use GGF usage record
• Related projects
  – EUMedGrid, BalticGrid, EELA, EUChinaGrid, SEE-Grid:
  – Implement EGEE operational concepts and procedures
• Operations workshops
  – Explicitly joint with OSG, ensure related projects attend

Recommendation 17 – iv
• Future:
  – Shared operations will be a reality – required for LCG
    • EGEE, OSG, ARC, NAREGI
  – EGEE-II
    • Explicit tasks on interoperability
    • ARC and UNICORE
  – Expectation is for coexisting campus, local, regional, national, international grid infrastructure
    • Coexistence, interoperability, interoperations, common policies will be a way of life
  – Long term sustainable infrastructure after EGEE-II will be built on this work

Recommendation 18
“Move away from present primary dependence on particular flavours of both processors and Linux and provide support for more heterogeneous resources, including supercomputers, to allow increased collaborative adoption at major computing centres.”
• Current porting status:
  – Several ports to other architectures: IA64, several Linux flavours. Available a few months after main release
  – Done by partners; outside of main build and integration system
• Future:
  – Important to have several important ports close to or part of main integration and testing
  – Include 64-bit cleanliness as part of build test – will flag as failure
  – Move to ETICS to provide distributed build system to support many platforms; helps tie porting partners into central process
    • Partner interested in a particular port can provide build and test hardware and ETICS can help integrate this into the process
  – TCG should agree a reasonable/realistic set of standard primary platforms to be provided as part of base release
    • E.g. SL4 + Debian on 32 and 64 bit
    • Other ports can be asynchronous and should be certified by partners providing resources
  – Supercomputers – should be supported by ports to relevant OS, MPI
    • Collaboration with DEISA in EGEE-II

SA1 Highlights

EGEE: Steady growth over the lifetime of the project
• > 180 sites, 40 countries
• > 24,000 processors, ~ 5 PB storage
[Figure: EGEE Grid Sites (sites, CPU)]

A global, federated e-Infrastructure
• EGEE infrastructure
  – ~ 200 sites in 39 countries
  – ~ CPUs
  – > 5 PB storage
  – > concurrent jobs per day
  – > 60 Virtual Organisations
• Related projects & collaborations are where the future expansion of resources will come from
[Figure labels: EUIndiaGrid, EUMedGrid, SEE-GRID, EELA, BalticGrid, EUChinaGrid, OSG, NAREGI]

Project        Anticipated resources (initial estimates)
Related infrastructure projects
  SEE-grid     6 countries, 17 sites, 150 cpu
  EELA         5 countries, 8 sites, 300 cpu
  EUMedGrid    6 countries
  BalticGrid   3 countries, few x 100 cpu
  EUChinaGrid  TBC
Collaborations
  OSG          30 sites, cpu
  ARC          15 sites, 5000 cpu
  DEISA        Supercomputing resources

Use of the infrastructure
• Sustained & regular workloads of >30K jobs/day, spread across full infrastructure
• Doubling/tripling in last 6 months – no effect on operations

Use of the infrastructure
• Massive data transfers > 1.5 GB/s
• Several applications now depend on EGEE as their primary computing resource
• Sustainability: usage can (and does) grow without need for additional operational effort

EGEE Operations Process
• Grid operator on duty
  – 6 teams working in weekly rotation
    • CERN, IN2P3, INFN, UK/I, Ru, Taipei
  – Crucial in improving site stability and management
  – Expanding to all ROCs in EGEE-II
• Operations coordination
  – Weekly operations meetings
  – Regular ROC managers meetings
  – Series of EGEE Operations Workshops
    • Nov 04, May 05, Sep 05, June 06
• Geographically distributed responsibility for operations:
  – There is no “central” operation
  – Tools are developed/hosted at different sites:
    • GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon)
• Procedures described in Operations Manual
  – Introducing new sites
  – Site downtime scheduling
  – Suspending a site
  – Escalation procedures
  – etc.
Highlights:
• Distributed operation
• Evolving and maturing procedures
• Procedures being introduced into and shared with the related infrastructure projects
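The weekly operator-on-duty rotation among six teams can be sketched in a few lines. This is only an illustration: the team names come from the slide, while the rotation order, the start date and the function name `on_duty` are assumptions and not the project's actual scheduling tool.

```python
from datetime import date

# Team names are taken from the slide; their order in the rotation is an assumption.
TEAMS = ["CERN", "IN2P3", "INFN", "UK/I", "Ru", "Taipei"]


def on_duty(day: date, start: date, teams=TEAMS) -> str:
    """Return the team on duty during the week that contains `day`.

    `start` is the (hypothetical) first day of the first team's shift;
    each shift lasts seven days and the list of teams repeats cyclically.
    """
    weeks_elapsed = (day - start).days // 7
    return teams[weeks_elapsed % len(teams)]


if __name__ == "__main__":
    rotation_start = date(2006, 1, 2)  # assumed start of the cycle (a Monday)
    print(on_duty(date(2006, 1, 9), rotation_start))
```

With six teams, each team is on duty once every six weeks, so adding further ROCs to the list lengthens the cycle without changing the logic.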

Site Functional Tests
• Site Functional Tests (SFT)
  – Framework to test (sample) services at all sites
  – Shows results matrix
  – Detailed test log available for troubleshooting and debugging
  – History of individual tests is kept
  – Can include VO-specific tests (e.g. sw environment)
  – Normally >80% of sites pass SFTs
    • NB of 180 sites, some are not well managed
• Very important in stabilising sites:
  – Apps use only good sites
  – Bad sites are automatically excluded
  – Sites work hard to fix problems
• Extending to service availability:
  – Measure availability by service, site, VO
  – Each service has associated service class defining required availability (Critical, highly available, etc.)
  – First approach to SLA
  – Use to:
    • generate alarms
    • generate trouble tickets
    • call out support staff
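The idea of excluding sites that fail their tests, and of comparing measured availability against a per-class requirement, can be illustrated with a minimal sketch. It is not the SFT framework itself: the data layout, the function names and the threshold values in `REQUIRED` are all assumptions made for the example (the slide names the service classes but gives no numbers).

```python
from collections import defaultdict

# Hypothetical required availability per service class (values are assumptions).
REQUIRED = {"critical": 0.99, "highly_available": 0.95}


def good_sites(latest):
    """latest: {site: {test_name: passed}} -> sorted list of sites passing every test."""
    return sorted(site for site, tests in latest.items() if all(tests.values()))


def availability(results):
    """results: iterable of (site, service, passed) -> {(site, service): fraction passed}."""
    counts = defaultdict(lambda: [0, 0])  # [passed, total]
    for site, service, passed in results:
        counts[(site, service)][0] += int(passed)
        counts[(site, service)][1] += 1
    return {key: ok / total for key, (ok, total) in counts.items()}


def alarms(avail, service_class):
    """Return (site, service) pairs whose availability is below the class requirement.

    service_class maps a service name to one of the keys of REQUIRED.
    """
    return sorted(
        key for key, value in avail.items()
        if value < REQUIRED[service_class[key[1]]]
    )
```

In this sketch an alarm is just an entry in a list; in a real deployment the same comparison could feed a ticketing system or a call-out procedure.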

Middleware Distributions and Stacks
• Terminology:
  – EGEE deploys a middleware distribution
    • Drawn from various middleware products, stacks, etc.
    • Do not confuse the distribution with development projects or with software packages
    • Count on 6 months from software developer “release” to production deployment
  – The EGEE distribution:
    • Current production version labelled: LCG
    • New production version labelled: gLite-3.0
    • Name change to hopefully reduce confusion
• EGEE distribution contents:
  – LCG-2.7.0:
    • VDT – packaging Globus 2.4, Condor, MyProxy
    • EDG workload management
    • LCG components:
      – BDII (info sys)
      – catalogue (LFC)
      – DPM, data management libraries and CLI tools
      – monitoring tools
    • gLite: R-GMA, VOMS, FTS
  – gLite-3.0:
    • Based on LCG-2.7.0, and
    • gLite workload management
    • Other gLite components (not in the distribution but provided as services):
      – AMGA, Hydra, Fireman
      – gLite-IO evolution

Process to deployment
[Diagram labels: Middleware providers (VDT/OSG, OMII-Europe, JRA1, …); Integration; Testing & Certification; Pre-production service; Production service; Support, analysis, debugging; Certification activities; activity labels SA3, SA3+SA1, SA1]

The Support Model: “Regional Support with Central Coordination”
• The ROCs, VOs and other project-wide groups such as the middleware groups (JRA), network groups (NA), service groups (SA) are connected via a central integration platform provided by GGUS.
• GGUS is now being used for all problem reporting:
  – Operational, deployment and user support
  – VOs are using it for their support system
  – The use is growing steadily
[Diagram labels: Central Application (GGUS); Interface; Webportal; TPM; Regional Support units (ROC 1, ROC…, ROC 10); User Support units (VO Support); Technical Support units (Deployment Support, Middleware Support, Network Support, Operations Support)]
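A toy model of “regional support with central coordination” is sketched below: a central platform looks up the support unit a ticket is addressed to and otherwise hands it to a central triage unit. The unit names follow the slide's diagram labels, but the grouping, the `route` function, the ticket fields and the use of TPM as the fallback are assumptions made for illustration, not a description of the GGUS implementation.

```python
# Hypothetical registry of support units, grouped as on the slide.
SUPPORT_UNITS = {
    "regional": ["ROC 1", "ROC 10"],
    "user": ["VO Support"],
    "technical": [
        "Deployment Support",
        "Middleware Support",
        "Network Support",
        "Operations Support",
    ],
}


def route(ticket: dict) -> tuple:
    """Return (group, unit) for a ticket.

    A ticket naming a known support unit goes straight to it; anything
    else is handed to the central TPM for triage (an assumed policy).
    """
    requested = ticket.get("unit")
    for group, members in SUPPORT_UNITS.items():
        if requested in members:
            return group, requested
    return "central", "TPM"


if __name__ == "__main__":
    print(route({"unit": "Middleware Support", "summary": "job submission fails"}))
    print(route({"summary": "unclassified problem"}))
```

Keeping the registry in one place mirrors the point of the slide: units stay distributed, while the lookup that connects them is central.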

Security & Policy
• Collaborative policy development
  – Many policy aspects are collaborative works; e.g.:
• Joint Security Policy Group
• Certification Authorities
  – EUGridPMA → IGTF, etc.
• Grid Acceptable Use Policy (AUP)
  – Common, general and simple AUP
  – For all VO members using many Grid infrastructures
    • EGEE, OSG, SEE-GRID, DEISA, national Grids…
• Incident Handling and Response
  – Defines basic communications paths
  – Defines requirements (MUSTs) for IR
  – Not to replace or interfere with local response plans
[Diagram labels: Security & Availability Policy; Usage Rules; Certification Authorities; Audit Requirements; Incident Response; User Registration & VO Management; Application Development & Network Admin Guide; VO Security]

SA1 goals for EGEE-II
• Key goal:
  – We have a large running production infrastructure; but EGEE-II MUST take what we have now and make it:
    • Reliable
      – Middleware components fail, error reporting is missing, …
      – There is an application responsibility here too – needs effort
      – … but! The service has been running non-stop for > 2 years
    • Robust
      – Must continue to address service aspects – move away from prototypes
    • Usable
      – It is still hard to use for many users; still too slow to introduce new VOs
    • Acceptable
      – It must be easy to deploy in a wide variety of environments and coexist with other grid infrastructures
    • Sustainable
      – The infrastructure must become sustainable for the long term

SA1 Outlook
• LHC VOs must achieve reliable production and analysis in 2006
  – Will be making significant use of resources
  – Applications must bring resources → show commitment
• Consolidate and improve existing services: focus on
  – Reliability, robustness, manageability, performance, scalability, etc.
  – Evolution or replacement of services driven by needs of application (or operations/security/manageability)
    • TCG has key role here
• Expand grid operations
  – Spread expertise to ROCs
  – Collaboration with OSG, A-P, etc. and related projects
  – Start to negotiate SLAs
  – Sustainability: processes evolving, spread of expertise and tasks
  – Resource sharing and negotiation – must become streamlined
    • Will need a mechanism for cost/credit for use of resources

Summary
• SA1 has built a large production grid infrastructure
• In constant and extensive daily production use
  – Several applications depend on it for resources
• Tools and processes are maturing and evolving
• Security and usage policies also evolving
• We have a basic set of middleware that addresses most requirements
• Production middleware is converged now
  – LCG-2 + gLite → gLite 3
• EGEE-II will focus on making this sustainable and really usable