High Performance Computing & Society Ian Bird, CERN | 28-29th September 2013
High Performance Computing: What is it? Why does CERN need it? What use is it in the real world? September 12, 2013
What is High Performance Computing? Complex problems that require: huge volumes of data; speed (fast network); large scale; complex processes; sophisticated software.
What is High Performance Computing? BIG DATA: data volumes that present a challenge, different for different sciences. SUPERCOMPUTERS: very large, fast, (expensive!), world-class machines with hundreds of thousands of interconnected processors. GRIDS: supercomputers built using commodity components, distributed globally. CLOUD COMPUTING: large, centralised data centres, providing computing and services over the network. VOLUNTEER COMPUTING: huge scale using voluntarily contributed PCs.
[Figure: computing at CERN over time. Image labels: First computer (Ferranti Mercury); "Already lots of data"; IBM 370/; CDC; CERN CC; 1988; Cray X/MP; 1998 CC]
Computing in High-Energy Physics. Demanding science, demanding computing: large scale computing & storage. Innovation: the Web; Grid computing (LHC Computing Grid).
CERN Computer Centre. Main site: built in the 70s on the CERN site (Meyrin, Geneva); ~3000 m² (3 main machine rooms); 3.5 MW for equipment; estimated PUE ~1.6. New extension: located at Wigner (Budapest); ~1000 m²; 2.7 MW for equipment; connected to the Geneva computer centre with 2 x 100 Gb links (21 and 24 ms RTT).
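As a rough illustration of what the figures on this slide imply, the sketch below derives two quantities that are not on the slide itself: the total facility power implied by the estimated PUE, and the bandwidth-delay product of each Geneva-Wigner link. It assumes that "100 Gb" means 100 gigabits per second and that PUE has its usual definition (total facility power divided by IT equipment power). The function names are illustrative only.

```python
# Figures quoted on the slide.
MEYRIN_EQUIPMENT_POWER_MW = 3.5   # power available for equipment
MEYRIN_PUE = 1.6                  # estimated power usage effectiveness
LINK_GBIT_PER_S = 100             # assumption: "100 Gb" means 100 Gbit/s
LINK_RTTS_MS = (21, 24)           # round-trip times of the two links


def facility_power_mw(equipment_power_mw: float, pue: float) -> float:
    """Total facility power, using PUE = total power / equipment power."""
    return equipment_power_mw * pue


def bandwidth_delay_product_mb(gbit_per_s: float, rtt_ms: float) -> float:
    """Data in flight on a fully used link, in megabytes (1 MB = 10**6 bytes)."""
    bits_in_flight = gbit_per_s * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8 / 1e6


if __name__ == "__main__":
    total = facility_power_mw(MEYRIN_EQUIPMENT_POWER_MW, MEYRIN_PUE)
    print(f"Implied total power at Meyrin: {total:.1f} MW")
    for rtt in LINK_RTTS_MS:
        bdp = bandwidth_delay_product_mb(LINK_GBIT_PER_S, rtt)
        print(f"RTT {rtt} ms: about {bdp:.0f} MB in flight per link")
```

The bandwidth-delay product is the amount of data a sender must keep unacknowledged to fill the link, which is why long-distance 100 Gb links need large transfer buffers.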
Tools: LHC and Detectors. Exploration of a new energy frontier in p-p and Pb-Pb collisions. LHC ring: 27 km circumference. Detectors: ALICE, ATLAS, CMS, LHCb.
Data flow to permanent storage: 4-6 GB/sec. [Figure: data rates per flow; surviving labels: ~4 GB/sec, 1-2 GB/sec, and values in MB/sec]
LHC Data. LHC experiments generate 25 PB of data per year between them. The CERN scientific data archive is today 100 PB. This requires huge amounts of processing power, storage, and network bandwidth.
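To put the yearly volume in perspective, the following sketch converts 25 PB per year into a sustained average rate. It assumes decimal units (1 PB = 10**15 bytes) and a 365-day year, and it averages over the whole year, which is a simplification: the accelerator does not run continuously, so rates while taking data are higher than this average, in line with the 4-6 GB/sec data flow to permanent storage quoted earlier.

```python
BYTES_PER_PB = 10**15            # assumption: decimal petabytes
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_gb_per_s(pb_per_year: float) -> float:
    """Year-round average rate in GB/s for a given yearly volume in PB."""
    return pb_per_year * BYTES_PER_PB / SECONDS_PER_YEAR / 1e9


def years_to_accumulate(archive_pb: float, pb_per_year: float) -> float:
    """Years needed to reach an archive size at a constant yearly volume."""
    return archive_pb / pb_per_year


if __name__ == "__main__":
    print(f"Average rate: {average_rate_gb_per_s(25):.2f} GB/s")
    print(f"Years to add 100 PB at 25 PB/year: {years_to_accumulate(100, 25):.0f}")
```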
Just how much is that? LHC data: 100 petabytes (100 million gigabytes), equivalent to 21 million DVDs, 125 million CDs, or 10 billion MP3 songs. The LHC would need 30 million iPads to process all its data and 2 million iPads to store all the data. [Image label: "Stores 640 of these"]
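The comparisons on this slide can be sanity-checked by working out the capacity each one implies per item. The sketch below does that, assuming decimal units (1 PB = 10**15 bytes); the per-item sizes are derived values, not figures stated on the slide.

```python
ARCHIVE_BYTES = 100 * 10**15     # 100 PB, assuming decimal petabytes

# Item counts quoted on the slide.
EQUIVALENTS = {
    "DVD": 21_000_000,
    "CD": 125_000_000,
    "MP3 song": 10_000_000_000,
    "iPad (storage)": 2_000_000,
}


def implied_size_mb(item_count: int, total_bytes: int = ARCHIVE_BYTES) -> float:
    """Capacity per item, in megabytes, implied by an equivalence."""
    return total_bytes / item_count / 1e6


if __name__ == "__main__":
    for item, count in EQUIVALENTS.items():
        print(f"{item}: {implied_size_mb(count):,.0f} MB each")
```

The implied sizes (roughly 4.8 GB per DVD, 800 MB per CD, 10 MB per song, 50 GB per iPad) are all plausible for the media in question, so the slide's comparisons are mutually consistent.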
The Worldwide LHC Computing Grid (WLCG). An international collaboration to distribute and analyse LHC data. It integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists. Tier-0 (CERN): data recording, reconstruction and distribution. Tier-1: permanent storage, re-processing, analysis. Tier-2: simulation, end-user analysis. Scale: nearly 160 sites in 35 countries; ~350,000 cores; 200 PB of storage; > 2 million jobs/day; 10 Gb links.
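The tier structure can be pictured as a small lookup table. The sketch below is purely illustrative: the `Tier` class and `tiers_handling` helper are hypothetical names, not part of any WLCG software, and the roles are copied from the slide. It also converts the quoted job throughput into jobs per second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One WLCG tier and the roles the slide assigns to it (illustrative only)."""
    name: str
    roles: tuple


WLCG_TIERS = (
    Tier("Tier-0", ("data recording", "reconstruction", "distribution")),
    Tier("Tier-1", ("permanent storage", "re-processing", "analysis")),
    Tier("Tier-2", ("simulation", "end-user analysis")),
)

JOBS_PER_DAY = 2_000_000   # "> 2 million jobs/day" on the slide


def tiers_handling(keyword: str) -> list:
    """Names of the tiers with at least one role containing the keyword."""
    return [t.name for t in WLCG_TIERS if any(keyword in r for r in t.roles)]


def jobs_per_second(jobs_per_day: int = JOBS_PER_DAY) -> float:
    """Average job rate implied by a daily job count."""
    return jobs_per_day / 86_400


if __name__ == "__main__":
    print("Analysis runs at:", tiers_handling("analysis"))
    print(f"Average job rate: {jobs_per_second():.1f} jobs/s")
```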
Application of High Performance Computing. High Performance Computing is used across very many areas of science, engineering, medicine, … It addresses problems that can only be tackled by large scale simulations and computations, or by manipulation of massive data sets.
Such as… Materials and engineering: new materials; simulation in place of physical models. Energy: modelling of wind and wave energy. Health: gene sequencing; drug discovery; disease diagnostics; modelling the brain and organs. Environment: weather and climate modelling and prediction; Earth observation.
Genomics. Human Genome Project, 1989-2000: sequencing the human genome of 1 individual. Today: the same data volume is generated in 3 minutes in a current large scale centre. It is now practical to sequence entire populations of humans and other animals. This is only possible with High Performance computing and storage.
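The size of that change is easy to underestimate. The sketch below computes the ratio between the two durations, assuming the project ran for the full span of the years quoted (2000 minus 1989, i.e. 11 years) and using an average year of 365.25 days; both are simplifying assumptions.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60   # assumption: average year length

PROJECT_YEARS = 2000 - 1989           # assumption: full span of the quoted years
MINUTES_TODAY = 3                     # time quoted for a current large centre


def speedup(project_years: float, minutes_today: float) -> float:
    """How many times faster the same data volume is produced today."""
    return project_years * MINUTES_PER_YEAR / minutes_today


factor = speedup(PROJECT_YEARS, MINUTES_TODAY)

if __name__ == "__main__":
    print(f"Roughly {factor:,.0f} times faster")
```

Under these assumptions the ratio is a little under two million.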
3 big areas of impact for medicine: germ line (risk of disease); "precision" cancer medicine; pathogens and hospital-acquired infections.
Germ line impact. Everyone has a differential risk of disease, but the shift in risk is small. Perhaps 1 to 2% of people have a striking change in risk (>10-fold) for a serious disease, which is "actionable". This goes up to 3-4% if you count some less clinically worrying diseases. Examples: 1 in 500 people have HCM (hypertrophic cardiomyopathy); 1 in 500 people have FH (familial hypercholesterolaemia).
Precision cancer diagnosis. Cancer is a genomic disease. By sequencing a cancer you can understand its molecular form better. Particular molecular forms respond to particular drugs.
Pathogens. Sequencing provides a clear-cut diagnosis of pathogens. It can also be used to sequence environments (e.g. hospitals): immune systems for hospitals.
Designing better antibiotics. In this example, the antibiotic does not distinguish between the fungal and human cell membranes. Molecular dynamics models (which describe the behaviour of atoms, molecules, and their interactions) are used to determine better varieties of drugs that interact with the disease rather than human cells. Such calculations require large scale computing, in this case using a grid. [Images: lung cells attacked by a fungal cryptococcosis infection (CDC / wikicommons); 3-D model of the Amphotericin B molecule (wikicommons)]
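To give a flavour of what a molecular dynamics calculation does at its core, here is a minimal sketch: one particle moving in one dimension in a Lennard-Jones potential, advanced with the velocity-Verlet scheme. This is a toy model in reduced units chosen for illustration, not the model or software used in the Amphotericin B study; real drug simulations involve very many atoms and far more elaborate force fields, which is why they need a grid. All names and parameter values are assumptions.

```python
def lj_potential(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones potential energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force F = -dU/dr for the Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def simulate(r0: float = 1.5, v0: float = 0.0, mass: float = 1.0,
             dt: float = 0.001, steps: int = 5000) -> list:
    """Velocity-Verlet integration; returns the total energy after each step."""
    r, v = r0, v0
    f = lj_force(r)
    energies = []
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        r = r + dt * v_half                   # full-step position update
        f = lj_force(r)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass      # second half-step velocity update
        energies.append(0.5 * mass * v * v + lj_potential(r))
    return energies


if __name__ == "__main__":
    energies = simulate()
    start = lj_potential(1.5)
    drift = max(abs(e - start) for e in energies)
    print(f"Initial energy {start:.4f}, largest deviation {drift:.2e}")
```

Checking that the total energy stays close to its initial value is a common first test that the integrator and time step are reasonable.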
New materials. The simulation of nanostructured materials requires high performance computers and modern calculation methods, for example density functional theory and molecular dynamics. Complementing experiment and theory, simulations help to understand observed phenomena and even predict properties and scenarios in complex systems. Especially challenging is the quest for new materials and phenomena for which an experimental exploration without the knowledge from simulations would be prohibitive.
Pollution? Today's plastics are a serious problem and hazard. Polylactide (PLA) plastics are an alternative that uses cheap raw materials instead of oil, but they are expensive to produce. PLA production needs a catalyst for the reaction, and this must be cheap and non-toxic. Molecular simulation on a computing grid has been used to calculate the entire reaction mechanism.
Meteorological input data
Weather Satellites: METEOSAT, NOAA POES, FENGYUN 3, FENGYUN 1D, MODIS TERRA, MODIS AQUA, JPSS1, SUOMI NPP, METOP (EPS). Higher resolution; color; visible channel by night.
Numerical weather prediction. [Figure: observations feed weather forecasts and product generation; products are issued along a timeline: now, +1 hour, +2 hours, +3 hours, +4 hours, …]
Climate and weather forecasts. Past: climate monitoring. Future, by increasing range: nowcasting (< 2 hours); shortest range forecasts (2-12 hours); short range forecasts (12-72 hours); medium range forecasts (… hours); monthly forecasts; seasonal forecasts; decade forecasts; climate projections. Time axis labels: 100 years, 10 years, 1 year, 1 month, today.
DWD NWP model suite. Model output size per day: ~2.5 TBytes. ICON: grid spacing 20 km; […] * 60 grid points; forecast range 174 hours; 2 runs per day. COSMO-EU: grid spacing 7 km; 665 * 657 * 40 grid points; forecast range 78 hours; 4 runs per day. COSMO-DE: grid spacing 2.8 km; 421 * 461 * 50 grid points; forecast range 21 hours; 8 runs per day.
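The grid dimensions on this slide translate into the following totals. The sketch multiplies out the grid point counts for the two COSMO models and the number of forecast hours each model produces per day. The ICON grid dimensions are incomplete on the slide, so they are left as `None` rather than guessed; the `Model` class is a hypothetical container for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Model:
    """Figures for one model as quoted on the slide (illustrative container)."""
    name: str
    spacing_km: float
    grid: Optional[Tuple[int, int, int]]   # None where the slide is incomplete
    forecast_hours: int
    runs_per_day: int

    def grid_points(self) -> Optional[int]:
        """Total number of grid points, or None if the dimensions are unknown."""
        if self.grid is None:
            return None
        nx, ny, nz = self.grid
        return nx * ny * nz

    def forecast_hours_per_day(self) -> int:
        """Forecast hours computed per day across all runs."""
        return self.forecast_hours * self.runs_per_day


MODELS = (
    Model("ICON", 20.0, None, 174, 2),              # grid dimensions not recoverable
    Model("COSMO-EU", 7.0, (665, 657, 40), 78, 4),
    Model("COSMO-DE", 2.8, (421, 461, 50), 21, 8),
)

if __name__ == "__main__":
    for m in MODELS:
        points = m.grid_points()
        shown = f"{points:,}" if points is not None else "unknown"
        print(f"{m.name}: {shown} grid points, "
              f"{m.forecast_hours_per_day()} forecast hours per day")
```

The finer the grid spacing, the smaller the area a model can cover for a similar number of grid points, which is why the highest-resolution model also has the shortest forecast range.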
What are the challenges? Data + model products: short range forecasts; aviation briefings (annually); warnings (annually); expert reports (per annum). Public weather service: basic warnings and forecasts, climate information. Challenges: deliver the right data to the right people; efficient storage and access in time; new analysis tasks (earlier storm tracking, better climate analysis, optimization problems in aviation and energy); privacy and security issues.
Environmental Modeling. Bulgarian researchers have ported three applications to the grid: 1. Study the impact of climate change on air quality. 2. Model atmospheric composition. 3. Investigate emergency responses to the release of harmful substances into the atmosphere.
Use Case: ASTRA (Ancient instruments Sound/Timbre Reconstruction Application). Has recreated 4 instruments so far. Held concerts using these instruments.
Use Case: ASTRA. Some advantages of using the grid: it can meet high demands for network and computing resources; high reliability; it allows multi-disciplinary collaboration between researchers, musicians and historians; longevity: ASTRA running since …
Use Case: ITER. Investigating the viability of fusion as a power source. Modelling and simulating the reactor. Used 1 million CPU hours in the last 12 months.
Use Case: DECIDE (Diagnostic Enhancement of Confidence by an International Distributed Environment). Diagnostic tools for the medical community. Example: their Statistical Parametric Mapping application can help doctors to diagnose Alzheimer's disease in its early stages and track the progress of the symptoms over time.
Use Case: DECIDE. Some advantages of using the grid: a single European-wide master database of images stored on the grid for doctors to use; diagnostic tools can be set up with a dedicated grid infrastructure; customisable, with dedicated software to track progression of the disease over time; medical data can be shared securely.
Summary. High Performance Computing, in all of its forms, is a vital tool in many areas of our everyday lives. CERN and other sciences, by pushing the boundaries of what is possible in computing, help to drive this forward.