Slide 1: EGI Resource Forum – UK Input
Prof. David Britton, GridPP Project Leader, University of Glasgow
EGI Technical Forum, 21st September 2011

Slide 2: RAL
STFC Facilities: Neutron and Muon Source; Synchrotron Radiation Source; Lasers; Space Science; Particle Physics; Information Technology and Data Storage; Wind Energy Research; Microstructures; Nuclear Physics; Radio Communications.
User communities:
–Particle Physics (LHCb, CMS, ATLAS, others)
–ISIS (neutron and muon source)
–Diamond Light Source
–British Atmospheric Data Centre
–EISCAT (radar research)
–National Earth Observation Data Centre
–World Data Centre
–Central Laser Facility
–National Crystallography Service
–Hinode (Solar-B)
–BBSRC (BITS)
–…
Key point: there are RAL communities who are engaged with 'the Grid' but not with EGI, e.g. the neutron and photon sources through PaNdata, EUDAT, etc. EGI could engage with them, possibly through us.

Slide 3: STFC e-Science Centre
Mission: "Develops and supports an integrated research computing infrastructure throughout STFC programmes, the research communities they support and the national science and engineering base."
Operates compute clusters and storage systems for diverse communities, including:
–STFC facilities such as the ISIS neutron facility and the lasers
–Diamond Light Source
–the LHC and other particle physics projects
–space science
–other national research communities

Slide 4: RAL Compute Facilities
–Tier-1: service for GridPP, WLCG and EGI; Grid access; EGI middleware; high throughput; 6000 cores.
–SCARF: service for STFC facilities and local users; classic access; commercial software; low-latency MPI; 4000 cores.
–NGS: service for national UK e-Science and many VOs; Grid access; VDT; low-latency MPI; licensed software; 288 cores.
–NSCCS: service for UK computational chemistry; classic access; commercial software; community-shared licences; community-specific software.
(The Grid-facing Tier-1 and NGS services form part of the UK NGI.)

Slide 5: National Grid Service
–The NGS provides the base services (such as the CA and VOMS) on which different communities (e.g. GridPP/WLCG) can then build discipline-specific infrastructures.
–Through the NGS, the UK is actively engaging with other communities via the ESFRI projects with UK involvement: CLARIN, ELIXIR, LifeWatch, SKA, DARIAH, the ESRF upgrade and HiPER. These are UK involvements, but they do not involve the UK Tier-1 or Tier-2s.
–In parallel, the NGS supports national and local activities (at both the project and institutional level), e.g. DIRAC and Digital Social Research.
–The NGS already has a business model, but it is tailored to individual communities.

Slide 6: The Bigger UK Picture
The Tier-2s are organisational units comprising about 19 sites. Some sites provide a share of a central service; most are dedicated facilities. Some sites focus on one VO; others support several.

Slide 7: RAL Tier-1 Usage
–Dominated by the LHC experiments.
–Only 4% non-LHC use, spread across a number of small VOs.
–Non-LHC use is not reaching its target fair-share (10%).
–Low demand (is usability a deterrent?).

Slide 8: UK Tier-2 Usage
Dominated by the LHC experiments, but non-LHC users do reach their 10% fair-share.

Slide 9: Types of New Users – (1)
Mature collaborations are hard to attract because they already have funding and their own infrastructure, which we could not replace without substantial new resources (manpower and/or hardware). E.g. at RAL there are other mature and/or large groups who use different infrastructures.
[Chart: RAL Storage Services users]

Slide 10: Types of New Users – (2)
New collaborations often arrive without (much) funding. Typically they are keen to engage in order to make opportunistic use of spare resources; e.g. the ILC in the UK, via GridPP, did an enormous amount of simulation work for its Letter of Intent without any explicit funding.
The GridPP model is that 10% of our resources are reserved for these "other" VOs, but the expectation is that, as they mature, they ask for explicit funding to bring resources to the Grid.
The important point is that much of the engagement with new groups happens via the Tier-2 sites, not the Tier-1.

Slide 11: Example: Glasgow
–Optics VO: we negotiated the donation of licences for the Lumerical commercial software for optical engineering.
–Solid State VO: medium-scale MPI, CASTEP.
–PaleoDB VO: data mining of a fossil database.
–EarthSci VO: medium-scale MPI, developmental geology.
It is key to demonstrate to these groups that the Grid will work for their domain/application without them needing to risk too much. Local expert support during the learning period is essential, which is why the Tier-2 sites at universities are a natural entry point for new VOs (the 19 universities cover far more fields than the Tier-1).

Slide 12: Observations: Support
Formal support structures are less effective than local contact during the formative stages of setting up a new VO. Tier-2 sites in the UK can provide this help to local users. GridPP has 0.5 FTE of effort that we embed in new collaborations for a period of 3 to 9 months to provide a single contact and dedicated help; this can be an alternative to local help.
Management and support workload scales:
–very weakly with capacity (e.g. number of worker nodes);
–weakly with the number of similar service nodes (e.g. number of CEs);
–strongly with the number of VOs, as each VO brings its own unique set of requirements, workflows, software products, contact procedures, education needs and prejudices;
–very strongly with the middleware stack and management structures.

Slide 13: Observations: Technical Challenges
No single solution meets all community requirements, e.g.:
–"high-throughput" vs "low-latency" CPU;
–large files vs small files;
–data processing vs data mining;
–requirements for commercial licensed products.
Global project choices drive the need for locally compatible solutions. Existing products are often unsatisfactory (hence the push for alternatives), and the adoption date fixes choices and forces lock-in; it is hard to move 10 PB of data to a new storage solution, etc.
Specific issues (expanded in the backup slides):
–User interface / job management (we are also recommending DIRAC).
–VO management: hard to change later, yet new users need a VO but have no tools.
–Data management: hard for new users.
–Software reliability: improving, but still an issue.
–Job reliability: new users expect better.

Slide 14: Requirements
The requirement is for better reliability, availability, serviceability and usability of the Grid middleware, so that multiple disciplines can be supported within the constraints of the existing manpower: in other words, faster, better and cheaper!
"Within the constraints of the existing manpower" we have optimised things for the LHC. Different disciplines will have different requirements, which can only add to the load.

Slide 15: Business Models
–We need to be careful about imposing a common model that may not work in some countries.
–Business models may expose currently hidden costs, with awkward consequences.
–We need to be sure that business models don't just create a lot of additional work for little gain.
–Business models may prevent new disciplines from joining (how do they argue for funding if they don't yet know whether the Grid is the solution?).
–Sustainability: the discussion should be about how we continue to carry out the EGI Global and International tasks when EU funding goes away. Discussing business models presupposes the answer to some extent.

Slide 16: Backup Slides

Slide: User Interface / Job Management
–Except for (some) particle physicists, most new user communities are not experienced in using the Grid.
–Many aspects of current Grid systems offer a set of facilities rather than an end-to-end product; e.g. pilot jobs are not part of the default install of the Grid middleware.
–We are recommending DIRAC for new VOs (a minimal submission sketch follows below).
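As an illustration of the end-to-end interface DIRAC offers a new community, the sketch below submits a trivial job through the DIRAC Python API. This is a minimal, hedged example rather than GridPP-endorsed tooling: it assumes a configured DIRAC client and a valid proxy for the user's VO, and the executable, job name and CPU-time value are placeholders (older DIRAC releases expose the submission call as dirac.submit() rather than dirac.submitJob()).

```python
# Minimal DIRAC submission sketch: assumes a configured DIRAC client
# installation and a valid VOMS proxy for the user's VO.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("hello-grid")                        # placeholder job name
job.setExecutable("/bin/echo", arguments="Hello from the Grid")
job.setCPUTime(300)                              # modest CPU-time request, in seconds

dirac = Dirac()
result = dirac.submitJob(job)                    # pilot jobs are handled internally by DIRAC
if result["OK"]:
    print("Submitted job with ID", result["Value"])
else:
    print("Submission failed:", result["Message"])
```

The point for a new VO is that the pilot-job machinery never appears in this code: DIRAC presents a single submission and monitoring interface on top of it.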

Slide: VO Management
–It is much easier to have a dedicated 'small users' VO: the management overhead is smaller and can be centralised.
–It is extremely difficult to move from one VO to another; it essentially means starting again, plus additional work.
–So we find that each project, even a small one, is probably better off with its own VO (a sketch of what VO membership looks like to a user follows below).
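Operationally, "being in a VO" shows up for an end user mainly as the VOMS proxy created before any Grid operation. The sketch below illustrates this with the standard voms-proxy-init and voms-proxy-info command-line tools; the VO name vo.example.org is hypothetical, and a real project would substitute its own VO (or the shared 'small users' VO described above).

```python
# Sketch: create and inspect a VOMS proxy for a (hypothetical) VO.
# Assumes the voms-proxy-* tools from the Grid middleware are on the PATH
# and that the user holds a valid Grid certificate.
import subprocess

VO = "vo.example.org"   # hypothetical VO name, used for illustration only

# Create a 24-hour proxy carrying the VO attributes (prompts for the certificate passphrase).
subprocess.run(["voms-proxy-init", "-voms", VO, "-valid", "24:00"], check=True)

# Show what the proxy contains: identity, VO attributes (FQANs) and remaining lifetime.
subprocess.run(["voms-proxy-info", "-all"], check=True)
```

Everything downstream (job submission, data access, accounting) keys off the VO attributes in this proxy, which is why moving a community to a different VO effectively means starting again.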

Slide: Data Management
–For users with existing data, it often takes some time to work out how best to handle and distribute it.
–Not all data is organised along lines that fit well with SRM.
–Even when it is, it is tricky for users to visualise why some ways of distributing data are better than others.
–The initial import of data is often hand-held by experts (a sketch of a typical first upload follows below).
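To make "hand-holding the initial import" concrete, the sketch below shows the kind of first upload an expert typically walks a new user through with the gLite-era lcg-utils tools: copy a local file to a storage element and register it in the file catalogue under a logical file name. The VO, storage element and LFN path are placeholders, not real endpoints.

```python
# Sketch: first upload of a local file to Grid storage using lcg-utils.
# The VO name, storage element and LFN below are placeholders.
import subprocess

VO = "vo.example.org"                                   # hypothetical VO
SE = "se.example-site.ac.uk"                            # hypothetical storage element
LOCAL = "file:///home/user/results.tar.gz"              # local file to import
LFN = "lfn:/grid/vo.example.org/user/results.tar.gz"    # logical name in the catalogue

# Copy the file to the storage element and register it in the file catalogue in one step.
subprocess.run(["lcg-cr", "--vo", VO, "-d", SE, "-l", LFN, LOCAL], check=True)

# List the replicas now known to the catalogue for this logical file.
subprocess.run(["lcg-lr", "--vo", VO, LFN], check=True)
```

Which storage elements to replicate to, and how to lay out the LFN namespace, are exactly the decisions new users find hard to visualise without expert help.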

Slide: Software Reliability
–The operations team tried to think of the last week in which they did not have to do some (non-routine) maintenance, fix or diagnosis; they couldn't recall one.
–Some of these interventions are hardware related, where more self-healing would help; others were purely software.
–Time spent fixing things is time not spent making things better.

Slide: Job Reliability
–The larger LHC VOs expect a certain level of job failures.
–The smaller the VO, the greater the impact of any failed job; i.e. the human impact is roughly O(log(failures)).
–Pilot-job frameworks eliminate some classes of problem.
–But there are too many failed jobs to investigate each one fully (a sketch of the kind of bulk status check this implies follows below).
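To show what "too many failures to investigate individually" implies in practice, the sketch below performs a bulk status query over a list of previously submitted DIRAC jobs and simply counts the failures. It is a hypothetical example using the DIRAC Python API, not part of any GridPP tooling; the job IDs are placeholders that would normally be recorded at submission time, and the exact structure of the returned dictionary may differ between DIRAC releases.

```python
# Sketch: bulk status check over previously submitted DIRAC jobs,
# counting failures instead of inspecting each job by hand.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac

job_ids = [1001, 1002, 1003]     # placeholder job IDs recorded at submission time

dirac = Dirac()
result = dirac.status(job_ids)   # returns a dict keyed by job ID with at least a 'Status' field
if result["OK"]:
    statuses = result["Value"]
    failed = [jid for jid, info in statuses.items() if info.get("Status") == "Failed"]
    print("%d of %d jobs failed" % (len(failed), len(job_ids)))
    print("Failed job IDs:", failed)
else:
    print("Status query failed:", result["Message"])
```

For an LHC VO this count is background noise; for a small VO every entry in the failed list matters, which is the O(log(failures)) point above.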