Massive Computing at CERN and lessons learnt


Massive Computing at CERN and lessons learnt
Bob Jones, CERN, Bob.Jones <at> CERN.ch

WLCG – what and why?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments.
Managed and operated by a worldwide collaboration between the experiments and the participating computer centres.
The resources are distributed – for funding and sociological reasons.
Our task is to make use of the resources available to us – no matter where they are located.
Ian Bird, CERN

What is WLCG today?
Collaboration: coordination, management and reporting; coordination of resources and funding; coordination with service and technology providers; common requirements; Memorandum of Understanding.
Framework: service management; service coordination; operational security; support processes and tools; common tools; monitoring and accounting; a world-wide trust federation for CAs and VOs; a complete policy framework.
Distributed computing services, running on the physical resources: CPU, disk, tape, networks.
Ian.Bird@cern.ch

WLCG data processing model
Tier-0 (CERN): data recording, initial data reconstruction, data distribution.
Tier-1 (11 centres): permanent storage, re-processing, analysis.
Tier-2 (~130 centres): simulation, end-user analysis.
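To make the data flow concrete, here is a minimal, purely illustrative Python sketch of the tier roles described above. It is not part of any WLCG software; all class and function names are hypothetical.

```python
# Illustrative sketch only: a toy model of the WLCG tier roles described above.
# Class and function names are hypothetical, not part of any WLCG software.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int                       # 0 = CERN, 1 = national centre, 2 = institute
    stored: list = field(default_factory=list)

def distribute(raw_dataset: str, tier0: Site, tier1s: list, tier2s: list) -> None:
    """Follow the slide's data flow: record at Tier-0, replicate to Tier-1s for
    permanent storage and re-processing, then serve Tier-2s for analysis."""
    tier0.stored.append(raw_dataset)                          # data recording + first reconstruction
    for t1 in tier1s:
        t1.stored.append(f"{raw_dataset} (custodial replica)")  # permanent storage
    for t2 in tier2s:
        t2.stored.append(f"{raw_dataset} (analysis subset)")    # end-user analysis / simulation input

if __name__ == "__main__":
    cern = Site("CERN", 0)
    t1s = [Site(n, 1) for n in ("Tier1-A", "Tier1-B", "Tier1-C")]
    t2s = [Site(n, 2) for n in ("Uni-X", "Uni-Y")]
    distribute("run-raw-data", cern, t1s, t2s)
    for s in [cern] + t1s + t2s:
        print(s.tier, s.name, s.stored)
```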

WLCG Collaboration status
Tier 0; 11 Tier 1s; 64 Tier 2 federations.
(Map of Tier-0 and Tier-1 sites: CERN; Ca-TRIUMF, US-BNL, US-FNAL, UK-RAL, De-FZK, Barcelona/PIC, Lyon/CCIN2P3, Amsterdam/NIKHEF-SARA, Taipei/ASGC, Bologna/CNAF, NDGF.)
Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.
26 June 2009, Ian Bird, CERN

Fibre cut during 2009: redundancy meant no interruption.
Ian Bird, CERN

Worldwide resources: >140 sites, ~250k CPU cores, ~100 PB disk.

Service quality: defined in the MoU
The MoU defines key performance and support metrics for Tier 1 and Tier 2 sites; reliabilities are an approximation for some of these, and there are also metrics on response times, resources, etc. The MoU has been an important tool in bringing services to an acceptable level.
Ian Bird, CERN
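As a rough illustration of how such metrics can be derived from periodic service tests, the sketch below computes an availability-style and a reliability-style figure. The formulas and names are assumptions for illustration only; the MoU's exact definitions are not reproduced here.

```python
# Illustrative sketch only: deriving reliability-style metrics from periodic service
# tests. Formulas and names are assumptions, not the MoU definitions.
from typing import List

def availability(test_results: List[bool]) -> float:
    """Fraction of test samples in which the site's services passed."""
    return sum(test_results) / len(test_results) if test_results else 0.0

def reliability(test_results: List[bool], scheduled_downtime: List[bool]) -> float:
    """Like availability, but samples taken during scheduled downtime are excluded."""
    kept = [ok for ok, down in zip(test_results, scheduled_downtime) if not down]
    return availability(kept)

# Example: 24 hourly tests, two failures, one of them during a scheduled intervention.
results   = [True] * 10 + [False, False] + [True] * 12
scheduled = [False] * 10 + [True, False] + [False] * 12
print(f"availability = {availability(results):.2%}, "
      f"reliability = {reliability(results, scheduled):.2%}")
```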

From testing to data
Independent experiment data challenges: e.g. DC04 (ALICE, CMS, LHCb) and DC2 (ATLAS) in 2004 saw the first full chain of the computing models running on grids.
Service Challenges, proposed in 2004, to demonstrate the service aspects: data transfers for weeks on end, data management, scaling of job workloads, security incidents ("fire drills"), interoperability, support processes.
2004 – SC1: basic transfer rates
2005 – SC2: basic transfer rates; SC3: sustained rates, data management, service reliability
2006 – SC4: nominal LHC rates, disk-tape tests, all Tier 1s, some Tier 2s
2007 onwards – focus on real and continuous production use of the service over several years (simulations since 2003, cosmic-ray data, etc.)
2008 – CCRC'08: readiness challenge, all experiments, ~full computing models
2009 – STEP'09: scale challenge, all experiments, full computing models, tape recall + analysis
2010 – LHC data taking
The data and service challenges exercised all aspects of the service – not just data transfers, but workloads, support structures, etc. Service testing and data challenges started many years before the accelerator started to produce data. This was important to ensure the functionality and quality of service needed to provide a continuous service. The users (the LHC experiments) were involved at all stages of implementation, deployment and testing.
Ian Bird, CERN

Large scale = long times
LHC, the experiments, and the computing have taken ~20 years to build and commission, and they will run for at least another 20 years. We must be able to rely on long-term infrastructures: global networking and strong, stable NGIs (or their evolution), which should eventually become self-sustaining. Long-term sustainability must come out of the current short-term project funding cycles.
LHC uses a continuously running production service that will be required for several decades. It must evolve with technology and be supported by several generations of developers and operational staff.
What is the origin-of-life community aiming for: a short-term objective or a long-term service? This will have an impact on the decisions to be made.
Ian Bird, CERN

Grids & HEP: a common history
CERN and the HEP community have been involved with grids from the beginning; grids were recognised as a key technology for implementing the LHC computing model. HEP work with the EC-funded EDG/EGEE projects in Europe and iVDGL/Grid3/OSG etc. in the US has been of clear mutual benefit: infrastructure development has been driven by HEP needs, and the robustness needed by WLCG is benefitting other communities. Technology has also transferred out of HEP: Ganga, AMGA, etc. are now used by many communities.
Ian Bird, CERN

European Grid Infrastructure
European Data Grid (EDG): exploring the concepts in a testbed.
Enabling Grids for E-sciencE (EGEE): moving from prototype to production.
European Grid Infrastructure (EGI): routine usage of a sustainable e-infrastructure.

European Grid Infrastructure (status April 2011, with yearly increase)
Disciplines: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, ...
13,319 end-users (+9%); 186 VOs (+6%); ~30 active VOs (constant).
Logical CPUs (cores): 207,200 in EGI (+8%), 308,500 in total; 90 MPI sites.
101 PB disk; 80 PB tape.
25.7 million jobs/month, i.e. 933,000 jobs/day (+91%); non-HEP users account for ~3.3 million jobs/month.
320 sites (+1.4%); 58 countries (+11.5%).
Notes: figures are from deliverable D4.2; increases are computed against the April 2010 figures unless specified otherwise. For the number of users, the April 2010 reference value is taken from the EGEE deliverable SA1.2.2, where the number of reported users was well above the 10,000 quoted at the time. The jobs/month increase is computed using the average monthly job count from May 2009 to April 2010.
EGI – The First Year

Grids, clouds, supercomputers, etc.
Grids: collaborative environment; distributed resources (for political/sociological reasons); commodity hardware; (HEP) data management; complex interfaces (bug, not feature); communities expected to contribute resources.
Supercomputers: scarce; low-latency interconnects; applications are peer reviewed; parallel/coupled applications; there are also supercomputing grids (DEISA/PRACE, TeraGrid/XD).
Clouds: proprietary (in implementation); economies of scale in management; commodity hardware; pay-as-you-go usage model; details of the physical resources hidden; simple interfaces.
Volunteer computing: a simple mechanism to access millions of CPUs; difficult if (much) data is involved; control of the environment needs to be checked; community building, with people involved in science; potential for huge amounts of real work.
How to get access:
Grids: make a community-based request to EGI; this would probably need some contribution of resources from the community itself.
Supercomputers: request time from DEISA/PRACE.
Clouds: needs money, but some companies (e.g. Amazon) make initial donations of free time for well-justified scientific challenges.
Volunteer computing: contact the Citizen Cyberscience Centre, or make a request to the Google Exacycle grant programme: http://research.google.com/university/exacycle_program.html
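Purely as a summary of the comparison above, the following hypothetical helper restates the access heuristics in code. The function, thresholds and wording are illustrative assumptions, not official guidance from EGI, PRACE or any provider.

```python
# Hypothetical decision helper: restates the slide's comparison as code.
# Thresholds and categories are illustrative assumptions only.
def suggest_infrastructure(needs_low_latency_interconnect: bool,
                           data_volume_tb: float,
                           can_contribute_resources: bool,
                           has_budget: bool) -> str:
    if needs_low_latency_interconnect:
        # Tightly coupled parallel applications point to supercomputers (peer-reviewed access).
        return "supercomputer: request time from DEISA/PRACE"
    if data_volume_tb > 100 and can_contribute_resources:
        # Large data volumes plus a contributing community fit the grid model.
        return "grid: community-based request to EGI, contributing own resources"
    if has_budget:
        return "cloud: pay-as-you-go, simple interfaces"
    if data_volume_tb < 1:
        # Volunteer computing offers many CPUs but struggles with much data.
        return "volunteer computing: contact the Citizen Cyberscience Centre"
    return "re-examine the computing model: no option fits cleanly"

print(suggest_infrastructure(False, 500.0, True, False))   # -> grid
print(suggest_infrastructure(True, 10.0, False, False))    # -> supercomputer
```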

Collaboration with the general public: the Citizen Cyberscience Centre
Philosophy: promote web-based citizen participation in science projects as an appropriate low-cost technology for scientists in the developing world.
Partners: CERN, UN Institute for Training and Research, University of Geneva.
Sponsors: IBM, HP Labs, Shuttleworth Foundation.
Technology: open-source platforms for internet-based distributed collaboration.
Projects: Computing for Clean Water (optimizing nanotube-based water filters by large-scale simulation on volunteer PCs); AfricaMap (volunteer thinking to generate maps of regions of Africa from satellite images, with UNOSAT); LHC@home (a new volunteer project for public participation in LHC collision simulations, using VM technology).
Plans: training workshops in 2011 in India, China, Brazil and South Africa.
Frédéric Hemmer

Some more questions to be answered
Computing model: how many computing models exist in the community, and can they all use the same computing infrastructure?
Continuous load or periodic campaigns: how intensely and how frequently will the community use the computing infrastructure?
Manpower: do you have enough geeks to port the code and support it?
How committed is the community: are you prepared to contribute and share computing resources?
On the computing model: Kauffman's background note shows there are many theories about the origins of life. When these are implemented as computer applications, do they have different computing models? Are they compatible or different? What are their characteristics? Are they suitable for grids or supercomputers? After speaking to Wim Hordijk from Lausanne, at least one model does fit grid computing well.
Bob Jones – May 2011