The Challenges of Scientific Computing


The Challenges of Scientific Computing
Alberto Di Meglio, CERN openlab CTO
Huawei IT Leaders Forum, 18 September 2013, Amsterdam
DOI: 10.5281/zenodo.7116

Outline
What is CERN and how it works
Computing and data challenges in HEP
New requirements and future directions
Scientific collaborations

What is CERN and How Does it Work?

What is CERN?
European Organization for Nuclear Research, founded in 1954
~2300 staff, ~1050 other paid personnel, ~11000 users
Budget (2012): ~1100 MCHF
20 Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
Candidate for accession: Romania
Associate Members in the pre-stage to membership: Israel, Serbia, Cyprus
Applicant states: Slovenia, Turkey
Observers to Council: India, Japan, the Russian Federation, the United States of America, Turkey, the European Commission and UNESCO

What is the Universe made of?
What gives particles their masses?
How can gravity be integrated into a unified theory?
Why is there only matter and no anti-matter in the universe?
Are there more space-time dimensions than the 4 we know of?
What are dark energy and dark matter, which make up 95% of the universe?

The Large Hadron Collider (LHC)

LHC Facts
Biggest accelerator (largest machine) in the world
Fastest racetrack on Earth: protons circulate at 99.9999991% of the speed of light
Emptiest place in the solar system: pressure of 10⁻¹³ atm (10x less than on the Moon)
World's largest refrigerator: -271.3 °C (1.9 K)
Hottest spot in the galaxy: collision temperatures 100,000x hotter than the heart of the Sun (5.5 trillion K)
World's biggest and most sophisticated detectors
Most data of any scientific experiment: 20-30 PB per year (about 75 PB accumulated so far)

Collisions in the LHC

Computing and Data Challenges in HEP

The LHC Challenges
Signal/Noise: 10⁻¹³ (10⁻⁹ offline)
Data volume: high rate × large number of channels × 4 experiments → ~30 PB of new data each year
Compute power: event complexity × number of events × thousands of users → ~300k CPUs and ~170 PB of disk storage
Worldwide analysis & funding: computing is funded locally in major regions and countries, with efficient analysis everywhere
~1.5M jobs/day, 150k CPU-years/year → Grid technology
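As a rough sanity check on these figures, the short sketch below converts the numbers quoted on this slide into average utilisation and rates. The values are only the approximations given above, not official statistics.

```python
# Back-of-envelope check of the Grid-scale numbers quoted on the slide above.
# All inputs are the approximate figures from the slide, not official values.

SECONDS_PER_YEAR = 365 * 24 * 3600

new_data_pb_per_year = 30          # ~30 PB of new data each year
cpu_cores = 300_000                # ~300k CPU cores
cpu_years_per_year = 150_000       # ~150k CPU-years of work delivered per year
jobs_per_day = 1_500_000           # ~1.5M jobs/day

# Average utilisation of the installed cores.
print(f"average core utilisation: {cpu_years_per_year / cpu_cores:.0%}")

# Average CPU time per job if the delivered CPU-years are spread over all jobs.
cpu_seconds_per_year = cpu_years_per_year * SECONDS_PER_YEAR
jobs_per_year = jobs_per_day * 365
print(f"average CPU time per job: {cpu_seconds_per_year / jobs_per_year / 3600:.1f} h")

# Sustained rate at which new data is written, in GB/s.
print(f"average ingest rate: {new_data_pb_per_year * 1e6 / SECONDS_PER_YEAR:.1f} GB/s")
```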

Data Handling and Computation
Online triggers and filters → 100% (raw data, 6 GB/s)
Offline reconstruction (selection & reconstruction, event reprocessing) → processed data on active tapes, ~10% (event summary)
Offline simulation: event simulation
Offline analysis: batch physics analysis and interactive analysis → ~1%
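One way to read this chain is as successive reduction: each stage keeps a smaller fraction of the raw volume. The sketch below is purely illustrative, using only the fractions and the 6 GB/s figure from this slide; the stage names are simplifications, not CERN software.

```python
# Illustrative data-reduction chain using the fractions quoted on the slide.
# Stage names and fractions are for illustration only.

raw_rate_gb_per_s = 6.0   # raw data rate coming out of the online triggers and filters

stages = [
    ("raw data (after online trigger)", 1.00),
    ("event summary (reconstruction)", 0.10),
    ("analysis selections", 0.01),
]

for name, fraction in stages:
    print(f"{name:35s} {fraction:6.0%}  ~{raw_rate_gb_per_s * fraction:.2f} GB/s")
```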

What is this data?
Raw data: was a detector element hit? How much energy? What time?
Simulated data: simulate particle collisions (hard interaction) according to a candidate theory, then simulate the interaction of primary and secondary particles with the detector material; the outcome is the detailed response of the detector to a known "event"
Reconstructed data (derived from both of the above): momentum of tracks (4-vectors), origin, energy in clusters (jets), particle type, calibration information – the highest data complexity is here
Analysis data (derived from RECO data): user- or group-specific data abstractions or selections – science happens here
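To make the distinction between tiers concrete, here is a minimal sketch of what such records might look like. The field names are hypothetical simplifications, not any experiment's actual event data model.

```python
# Simplified, hypothetical event records for the data tiers described above.
from dataclasses import dataclass, field

@dataclass
class RawHit:
    detector_element: int   # which detector element was hit
    energy: float           # deposited energy
    time: float             # hit time

@dataclass
class RecoTrack:
    momentum: tuple          # 4-vector (E, px, py, pz)
    origin: tuple            # reconstructed vertex (x, y, z)
    particle_type: str       # e.g. "muon"

@dataclass
class AnalysisEvent:
    # A user/group-specific selection derived from reconstructed data.
    tracks: list = field(default_factory=list)
    passes_selection: bool = False
```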

Data Storage
CASTOR and EOS: storage systems developed at CERN
Both use the same commodity disk servers
CASTOR: RAID-1 (2 copies in the mirror)
EOS: JBOD with RAIN; replicas spread over different disk servers; tunable redundancy
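As a toy illustration of "replicas spread over different disk servers with tunable redundancy", the sketch below places each file on a configurable number of distinct servers. This shows the generic idea only; it is not the actual EOS placement algorithm.

```python
# Toy replica placement: each file gets `replicas` copies on distinct servers.
# Illustrates tunable redundancy; this is not the EOS algorithm.
import random

def place_replicas(filename: str, servers: list, replicas: int = 2) -> list:
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested redundancy")
    # Distinct servers, so no two copies of the file share a failure domain.
    return random.sample(servers, replicas)

servers = [f"diskserver{i:03d}" for i in range(12)]
print(place_replicas("/eos/example/run1234/events.root", servers, replicas=3))
```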

The Grid
Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (11 centres): permanent storage, re-processing, analysis
Tier-2 (~130 centres): simulation, end-user analysis
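The same tiered division of work, written out as a small illustrative configuration (the role strings simply restate the slide):

```python
# Illustrative summary of the tier roles listed above.
WLCG_TIERS = {
    "Tier-0": {"sites": 1,   "roles": ["data recording", "initial reconstruction", "data distribution"]},
    "Tier-1": {"sites": 11,  "roles": ["permanent storage", "re-processing", "analysis"]},
    "Tier-2": {"sites": 130, "roles": ["simulation", "end-user analysis"]},
}

for tier, info in WLCG_TIERS.items():
    print(f"{tier}: ~{info['sites']} site(s) – {', '.join(info['roles'])}")
```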

New Requirements and Future Directions (Big Data and Data Analytics)

LHC Schedule (2009 to ~2030, as foreseen in 2013)
First run (LHC startup, 900 GeV then 7 TeV): L = 6×10³³ cm⁻²s⁻¹, bunch spacing 50 ns
LS1, Phase-0 upgrade → second run at design energy, nominal luminosity: 14 TeV, L = 1×10³⁴ cm⁻²s⁻¹, bunch spacing 25 ns
LS2, Phase-1 upgrade → third run at design energy, design luminosity: 14 TeV, L = 2×10³⁴ cm⁻²s⁻¹, bunch spacing 25 ns
LS3, Phase-2 upgrade → HL-LHC (high luminosity): 14 TeV, L = 1×10³⁵ cm⁻²s⁻¹, bunch spacing 12.5 ns

Online DAQ and Triggers
Most events produced in the detectors describe known physics; only ~1 in 10¹³ events is considered interesting today, and discarded raw data is lost forever
Filtering is done with a two-level trigger: the first level in hardware (L1), the second in software (HLT)
L1 triggers use complex custom processors that are difficult to program, maintain and reconfigure
LS1, LS2 and LS3 require higher and higher data rates – can L1 run at the same rate as the LHC?
Idea: implement L1 in software, replacing hardware triggers with commodity processors/co-processors (GPGPUs, Xeon Phis?), which are easier to program, upgrade and maintain
Long term, merge L1 and HLT; this needs many fast, low-power, low-cost links
DAQ evolution: multi-Tbit/s transfers, close integration of network and CPUs, rack-scale architecture?
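As a cartoon of the software-trigger idea, the sketch below applies a cheap first-level cut followed by a more expensive high-level selection. The event fields and thresholds are invented for illustration and bear no relation to the real L1/HLT algorithms.

```python
# Cartoon two-stage software trigger; event fields and thresholds are invented.
def level1(event: dict) -> bool:
    # Cheap, fast cut: total deposited energy above a threshold.
    return event["total_energy"] > 20.0

def high_level_trigger(event: dict) -> bool:
    # More expensive selection: require at least two high-momentum tracks.
    return sum(1 for pt in event["track_pt"] if pt > 10.0) >= 2

def select(events):
    for ev in events:
        if level1(ev) and high_level_trigger(ev):
            yield ev   # only selected events are kept; the rest are lost forever

events = [
    {"total_energy": 55.0, "track_pt": [22.0, 13.5, 1.1]},
    {"total_energy": 8.0,  "track_pt": [2.0]},
]
print(list(select(events)))
```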

Multi-Core Platforms
A very high level of parallelism is available, both online and offline
Today parallelism is exploited by sending many independent jobs to separate physical or virtual nodes, with physical cores partitioned across virtual machines
Current software was not written to exploit multiple cores; this is a very important issue for detector simulation, a very CPU-intensive task
Goal: exploit multi-core platforms using vectorization techniques and build experience with data and task parallelism (e.g. Cilk Plus with GCC on Haswell)
This requires rewriting the software → Geant 5
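A minimal illustration of the data-parallelism idea, with NumPy standing in for the SIMD/vectorization techniques mentioned above (this is not CERN simulation code):

```python
# Same computation two ways: scalar loop vs vectorized over whole arrays.
# NumPy stands in here for generic SIMD/vectorization techniques.
import numpy as np

def transverse_momentum_loop(px, py):
    # One element at a time: the pattern typical of non-vectorized code.
    return [(x * x + y * y) ** 0.5 for x, y in zip(px, py)]

def transverse_momentum_vectorized(px, py):
    # Whole arrays at once: lets the library use the CPU's vector units.
    return np.hypot(px, py)

rng = np.random.default_rng(0)
px, py = rng.normal(size=1_000_000), rng.normal(size=1_000_000)
assert np.allclose(transverse_momentum_loop(px[:10], py[:10]),
                   transverse_momentum_vectorized(px[:10], py[:10]))
```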

Cloud Infrastructure
New CERN computer centre
Two 100 Gbit/s lines, one academic, one commercial
Additional capacity and redundancy
Remote management, dynamic provisioning
No increase in staff, decrease in budget

Cloud Storage
Data storage is critical: currently 75/100 PB on tape and 25/40 PB on disk, getting close to maximum capacity
Need to scale both predictably (planning) and dynamically (peak times)
Can industrial cloud storage deliver the required quality of service, maximizing reliability, performance and cost at the same time?
Partnership between CERN IT and Huawei (S3 storage appliance, 0.8 PB, 1 year); shared areas of investigation:
Reliability (100% despite disk failures)
Functional tests of the Amazon S3 protocol and interoperability with Grid production environments (tested OK)
Performance measurements with sparse data reads (comparable to existing LHC production systems)
Total Cost of Ownership (TCO): clear model, no hidden costs found
References used for comparison: open-source self-assembled solutions (based on OpenStack Swift) and traditional Intel-based file servers running Linux with locally attached disks
Storage options range from tapes and disks to flash/solid-state disks and premium filers or SANs, trading off latency, cost and reliability
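For context, the kind of S3-protocol functional test mentioned above reduces to plain PUT/GET calls against the appliance's endpoint. The sketch below is only that; the endpoint, bucket name and credentials are placeholders, and it is not the actual CERN test suite.

```python
# Sketch of an S3-protocol functional test: write an object, read back a byte range.
# Endpoint, bucket name and credentials are placeholders, not real values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.org",      # the appliance's S3 endpoint
    aws_access_key_id="PLACEHOLDER_KEY",
    aws_secret_access_key="PLACEHOLDER_SECRET",
)

s3.put_object(Bucket="test-bucket", Key="run1234/events.dat", Body=b"\x00" * 4096)

# Sparse read: fetch only a byte range, as an analysis job reading part of a file would.
resp = s3.get_object(Bucket="test-bucket", Key="run1234/events.dat",
                     Range="bytes=1024-2047")
print(len(resp["Body"].read()))   # 1024
```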

Data Analytics
Massive amounts of data come not only from LHC data processing but also from the LHC subsystems: magnets, cryogenics, control systems, electrical systems, logging data and alerts
How can the data be analyzed efficiently to find patterns or detect problems at an early stage?
Investigations: physics data analysis in databases exploiting columnar DBs (Oracle Exadata); Hadoop and MapReduce techniques; physics event indexing, filtering and searching; machine-learning techniques
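The MapReduce pattern mentioned above, in its smallest form: a map step emits key/value pairs (here, hypothetical alert counts per subsystem) and a reduce step aggregates them. This is a pure-Python illustration of the pattern, not Hadoop code, and the log lines are invented.

```python
# Minimal MapReduce-style aggregation: count alerts per subsystem.
# Pure-Python illustration of the pattern; the log lines are invented.
from collections import defaultdict

log_lines = [
    "2013-09-18T10:00:01 cryogenics ALERT temperature drift",
    "2013-09-18T10:00:05 magnets OK",
    "2013-09-18T10:01:12 cryogenics ALERT pressure spike",
]

def map_phase(line):
    # Emit (subsystem, 1) for every alert line.
    _, subsystem, status, *_ = line.split()
    if status == "ALERT":
        yield subsystem, 1

def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

pairs = (pair for line in log_lines for pair in map_phase(line))
print(reduce_phase(pairs))   # {'cryogenics': 2}
```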

Scientific Collaborations

International Scientific Collaborations
Many scientific projects are global collaborations of hundreds of partners
Efficient computing and data infrastructures have become critical as the quantity, variety and rates of data generation keep increasing
Funding does not scale in the same way → optimization and sharing of resources
Collaboration with commercial IT companies is increasingly important; the requirements are no longer unique to HEP
Example: CERN Biomedical Facility

Scientific Computing as a Service (SCaaS)
An on-demand computing and data analysis service
Unique IDs and formal relationships among digital objects (publications, datasets, software, people identities)
Reproducibility and preservation
Results attribution and recognition ("click 'n' cite")
Connects researchers and developers with applications, software repositories and registries, datasets and scientific results (publications, patents), supporting discovery, registration, authorship and citations on demand
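One way to picture "unique IDs and formal relationships among digital objects" is as typed links between identified records. The sketch below is a hypothetical illustration of that data model, not an existing service; all identifiers shown are placeholders.

```python
# Hypothetical illustration of linking digital objects by persistent identifiers.
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    identifier: str                 # e.g. a DOI or ORCID (placeholders below)
    kind: str                       # "publication", "dataset", "software", "person"
    relations: list = field(default_factory=list)   # (relation, target identifier)

    def link(self, relation: str, target: "DigitalObject"):
        self.relations.append((relation, target.identifier))

paper   = DigitalObject("doi:example/paper-001", "publication")
dataset = DigitalObject("doi:example/dataset-001", "dataset")
author  = DigitalObject("orcid:0000-0000-0000-0000", "person")

paper.link("uses", dataset)
paper.link("authored_by", author)
print(paper.relations)
```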

CERN openlab in a nutshell
A science–industry partnership to drive R&D and innovation, with over a decade of success
Evaluate state-of-the-art technologies in a challenging environment and improve them
Test in a research environment today what will be used in many business sectors tomorrow
Train the next generation of engineers/employees
Disseminate results and reach out to new audiences

The CERN openlab recipe: key ingredients
The extreme demands of CERN's scientific programme
The alignment of goals between partners
Trust
Young researchers with their talents, expertise and energy
An efficient and lightweight structure
Regular checkpoints/reviews
Outreach/communications/training

ICE-DIP: Intel-CERN European Doctorate Industrial Program
The project starts today: CERN will recruit 5 PhD students on 3-year fellowship contracts in autumn 2013
Each PhD student will be seconded to Intel for 18 months
They will work with the LHC experiments on future upgrade research themes: use of many-core processors for data acquisition, future optical interconnect technologies, reconfigurable logic, data acquisition networks
Associate partners: National University of Ireland Maynooth & Dublin City University (recruits will be enrolled in their PhD programmes), Xena Networks (SME, Denmark)
EC funding: ~€1.25 million over 4 years

Conclusions
CERN and the LHC programme were among the first to face "big data" challenges; solutions have been developed and important results obtained
These challenges are no longer unique, however, in either scientific or consumer applications
We need to exploit emerging technologies and share expertise with academia and commercial partners
The LHC schedule will keep CERN at the bleeding edge of technology, providing excellent opportunities for companies to test ideas and technologies ahead of the market

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses content provided by CERN and CERN openlab staff.