Intel Code Modernisation Project: Status and Plans


Intel Code Modernisation Project: Status and Plans
Openlab Technical Workshop, Federico Carminati, December 9, 2016

Estimates of resource needs for HL-LHC (Ian Bird)
Data:
- Raw: 50 PB in 2016 → 600 PB in 2027
- Derived (1 copy): 80 PB in 2016 → 900 PB in 2027
CPU: x60 with respect to 2016
- Technology improving at ~20%/year brings x6-10 in 10-11 years (1.2^10 ≈ 6, 1.2^11 ≈ 7.4)
A simple model based on today's computing models, but with the expected HL-LHC operating parameters (pile-up, trigger rates, etc.)
At least x10 above what is realistic to expect from technology at reasonably constant cost

Motivations (even if you are familiar with them)
The technology gains assumed above hold only if we are still on the rising part of the curves below.
[Figure: single-chip scaling trends, ~1970-2005, log scale from 10^-1 to 10^7, for transistors, clock frequency, power and instruction-level parallelism (ILP). Annotated "We used to be here" on the exponential rise and "We are now probably here" on the plateau of the clock, power and ILP curves.]

The ALFA project
An ALICE-FAIR software project aiming at massive data-volume reduction by (partial) online reconstruction and compression. Let's work together.

ALFA Framework
A data-flow based model that delivers:
- a transport layer (FairMQ, based on ZeroMQ and nanomsg; illustrated below);
- configuration tools;
- management and monitoring tools;
- unified access to configuration parameters and databases.
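FairMQ hides the messaging details behind device classes; as a rough illustration of the underlying data-flow idea, here is a minimal PUSH/PULL pair written directly against ZeroMQ (via cppzmq), one of the transports FairMQ builds on. The endpoint, the payload and the single-process setup are invented for the example.

```cpp
// Sketch of the data-flow idea underlying FairMQ, using raw cppzmq.
// Endpoint and payload are illustrative; real ALFA devices are separate
// processes configured externally.
#include <zmq.hpp>
#include <iostream>
#include <string>

int main() {
    zmq::context_t ctx{1};

    // "Sampler" end: pushes data downstream.
    zmq::socket_t sender{ctx, zmq::socket_type::push};
    sender.bind("tcp://*:5555");

    // "Processor" end: pulls data in (same process here only for brevity).
    zmq::socket_t receiver{ctx, zmq::socket_type::pull};
    receiver.connect("tcp://localhost:5555");

    std::string payload = "raw-event-0";
    sender.send(zmq::buffer(payload), zmq::send_flags::none);

    zmq::message_t msg;
    auto ok = receiver.recv(msg, zmq::recv_flags::none);
    if (ok)
        std::cout << "received: " << msg.to_string() << '\n';
}
```

In ALFA proper, each endpoint would be a separate device process managed by the deployment tools, so the topology can be rewired without touching the processing code.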

FairRoot / ALFA (https://fairroot.gsi.de)
Find the correct balance between reliability and performance.
DDS (Dynamic Deployment System, http://dds.gsi.de): a toolset that automates and simplifies the deployment of user-defined sets of processes and their dependencies on any available resources.
Each "task" is a separate process, which:
- can be multithreaded, SIMDized, etc.;
- can run on different hardware (CPU, GPGPU, Xeon Phi, etc.);
- can be written in any supported language (bindings exist for 30+ languages).
Different topologies of tasks can be adapted to the problem itself and to the hardware capabilities.

Xeon Phi support in ALFA/FairRoot
Ongoing work:
- make the Dynamic Deployment System (DDS) Xeon Phi aware;
- extend the topology properties in DDS to support Xeon Phi.

ALFA code modernization on KNL
We only got access to KNL last month (thanks to Fons and Omar). Work is ongoing to support ICC 17.0.0.

BioDynaMo: the Biology Dynamics Modeller
- A platform for high-performance computer simulations of biological dynamics
- Involves detailed physical interactions in biological tissue
- Highly optimised and parallelised code
- To be run both on HPC and on Cloud environments
Target scales:
- Cortical column: 10k neurons, brain cancer (multi-core)
- Cortical sheet: 10M neurons, epilepsy (HPC)
- Cortex: 100M-10bn neurons, schizophrenia (HPC on Cloud?)

From Cx3D to BioDynaMo
- Original Cx3D code in Java (20 kLOC)
- Ported to C++
- Scalar, serial optimisations
- Vectorisation (see the sketch after this list)
- Parallelisation
- Co-processor and GPU optimisations
- ROOT for I/O and graphics
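To give a flavour of the vectorisation step, here is a hedged sketch of the usual transformation: replacing one object per cell by structure-of-arrays (SoA) storage so the per-cell update loop vectorises. The field names and the growth model are invented for the example; this is not BioDynaMo's actual data model.

```cpp
// Illustrative AoS -> SoA rewrite of a per-cell update loop.
#include <cstddef>
#include <vector>

struct Cells {                    // structure-of-arrays: one array per field
    std::vector<double> x, y, z;  // positions
    std::vector<double> diameter;
};

void grow(Cells& cells, double rate, double dt) {
    const std::size_t n = cells.diameter.size();
#pragma omp simd                  // contiguous, independent iterations vectorise
    for (std::size_t i = 0; i < n; ++i)
        cells.diameter[i] += rate * dt;
}
```

With an array of per-cell structs, the same loop strides over interleaved fields and typically resists vectorisation; the SoA layout keeps each field contiguous in memory.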

An IPCC to modernize the ROOT Math and I/O libraries
Principal Investigator: Peter Elmer, Princeton University
The goal of the ROOT IPCC is to modernize and optimize certain critical libraries in the ROOT software framework for multi-core and many-core CPU architectures. ROOT is ubiquitous in particle and nuclear physics and is used by 20,000 scientists in 170 computer centers around the world. The focus of this IPCC is the optimization of the ROOT Math and I/O libraries. The work will prepare the way for CERN's planned upgrades to the Large Hadron Collider, which will take particle physics through the 2030s.

Current status: the project was funded six months ago. There were some delays in hiring, but we have converged very recently: we are hiring Vassil Vassilev at Princeton, to start on the IPCC project from January 2017. Vassil is well known to the ROOT team through his work on the Cling interpreter. He will continue to be based at CERN to facilitate collaboration. The proposal from earlier this year focused specifically on two categories of work: parallelization of ROOT I/O output (efficient data gathering), and vectorization/parallelization of the Math libraries (matrices, "matriplex" structures to group together large numbers of small matrices, etc.).
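To make the "matriplex" idea concrete: element (i,j) of N small matrices is stored contiguously, so one SIMD instruction operates on the same element of many matrices at once. A minimal sketch follows; the dimensions, names and use of OpenMP SIMD are illustrative and not the actual Matriplex implementation from the tracking project.

```cpp
// Sketch of a "matriplex" layout: N small DxD matrices stored so that
// element (i,j) of all N matrices is contiguous in memory.
#include <cstddef>

template <std::size_t D, std::size_t N>
struct Matriplex {
    // fArr[i][j][n] = element (i,j) of matrix n; the n-index is innermost,
    // so each SIMD lane handles one matrix.
    double fArr[D][D][N];
};

// C = A * B for all N matrices at once; the inner n-loop vectorises.
template <std::size_t D, std::size_t N>
void multiply(const Matriplex<D, N>& A, const Matriplex<D, N>& B,
              Matriplex<D, N>& C) {
    for (std::size_t i = 0; i < D; ++i)
        for (std::size_t j = 0; j < D; ++j) {
            for (std::size_t n = 0; n < N; ++n) C.fArr[i][j][n] = 0.0;
            for (std::size_t k = 0; k < D; ++k)
#pragma omp simd
                for (std::size_t n = 0; n < N; ++n)
                    C.fArr[i][j][n] += A.fArr[i][k][n] * B.fArr[k][j][n];
        }
}
```

The payoff is that track-fit matrices, individually far too small to vectorise, fill entire AVX-512 registers when processed as a plex.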

The ROOT IPCC will collaborate with:
- the ROOT developers at CERN and FNAL;
- the LHC experiments;
- DIANA/HEP (http://diana-hep.org/), a US NSF-funded project working on the performance and interoperability of ROOT;
- a US NSF-funded R&D project developing parallel charged-particle tracking algorithms (http://trackreco.github.io/);
- the other HEP-related IPCC projects.
At the beginning of January, we will draw up a combined work plan with the ROOT team (CERN, FNAL), the DIANA project, the TRACKRECO project and the other IPCC projects, taking into account the current status of the development.

Yellow lines are the trajectories of charged particles. These are reconstructed from individual sensor signals using an iterative technique based on a Kalman filter.
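For readers new to the technique: a Kalman filter alternates between predicting the track state at the next detector layer and updating it with the hit measured there. A deliberately minimal one-dimensional sketch; real track fits carry a multi-dimensional state vector and covariance matrix.

```cpp
// Minimal 1-D Kalman filter step, for illustration only.
struct State {
    double x; // estimated track parameter (e.g. position)
    double P; // variance of the estimate
};

// Predict: propagate the state to the next detector layer,
// inflating the uncertainty by process noise Q (e.g. scattering).
State predict(State s, double Q) {
    return {s.x, s.P + Q};
}

// Update: blend the prediction with a measured hit of variance R.
State update(State s, double hit, double R) {
    double K = s.P / (s.P + R);      // Kalman gain
    return {s.x + K * (hit - s.x),   // pull the estimate towards the hit
            (1.0 - K) * s.P};        // the uncertainty shrinks
}
```

Iterating predict/update over all layers, and repeating the fit with refined hit assignments, is what turns raw sensor signals into the yellow trajectories.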

GeantV – accelerating detector simulation
- Detailed simulation of subatomic particles in detectors, essential for data analysis
- The LHC experiments use more than 50% of their distributed Grid power for simulation
- A vector-oriented approach to harnessing new computing technology

Basket processing benchmarks on KNL (presented at ISC16)
- Simplified example (a concentric-cylinder tracker) in "basket mode", compared to the classical approach (ROOT geometry)
- SIMD vectorization enforced by the API (UME::SIMD backend for AVX-512)
- Scalability comparable between KNC and KNL for the ideal and basket versions (~100x)
- The GeantV approach gives excellent benefits with respect to the classical one (ROOT geometry)
Hardware: Intel® Xeon Phi™ CPU 7210 @ 1.30 GHz, 64 cores
[Diagram: baskets Basket0-Basket5 dispatched to a TopNavigator and per-layer LayerNavigator<0> … LayerNavigator<N> instances, down to an InnermostLayerNavigator.]
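"Basket mode" groups the tracks currently traversing the same geometry volume, so that the corresponding navigator can compute the geometry step for the whole batch in one vector loop. A schematic sketch follows; the data layout, the cylinder toy model and all names are invented for illustration.

```cpp
// Schematic of basket processing: tracks in the same layer are collected
// into a structure-of-arrays "basket" and the geometry step is computed
// for the whole batch at once. Illustration only, not GeantV code.
#include <cmath>
#include <cstddef>
#include <vector>

struct Basket {                // all tracks currently in one layer
    std::vector<double> x, y;  // positions (SoA)
    std::vector<double> step;  // output: distance to the layer boundary
};

// Group incoming tracks by layer index; assumes baskets.size() covers
// all layer indices that occur.
void dispatch(const std::vector<double>& xs, const std::vector<double>& ys,
              const std::vector<int>& layer, std::vector<Basket>& baskets) {
    for (std::size_t i = 0; i < xs.size(); ++i) {
        Basket& b = baskets[layer[i]];
        b.x.push_back(xs[i]);
        b.y.push_back(ys[i]);
        b.step.push_back(0.0);
    }
}

// Toy LayerNavigator: radial distance from each track to a concentric
// cylinder of radius R, vectorised over the whole basket.
void computeSteps(Basket& b, double R) {
    const std::size_t n = b.x.size();
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        double r = std::sqrt(b.x[i] * b.x[i] + b.y[i] * b.y[i]);
        b.step[i] = R - r;
    }
}
```

The classical approach instead navigates one track at a time through the full geometry, which leaves the SIMD units idle.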

Multi-propagator performance test (presented at SC16)
- Simplified calorimeter, tabulated physics (electromagnetic processes + various materials)
- Full track transport and basketization procedure
- Good scalability up to the number of physical cores; scalability improves further as the number of propagators increases
- T(AVX2) / T(AVX-512) ≈ 1.9!
- Handling clusters of threads (NUMA-aware in the future) with weak inter-communication (see the sketch below)
- Not final results; still fixing/optimizing
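A minimal sketch of the multi-propagator idea: several independent propagator instances, each owning its own thread team and its own state, so inter-propagator communication stays weak. All names, the counter and the absence of real transport are invented for illustration; this is not the GeantV scheduler.

```cpp
// Several independent propagators, each with a private thread team and
// private state; nothing mutable is shared across propagators.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Propagator {
    std::atomic<long> stepsDone{0};  // private counter, no cross-propagator sharing

    void run(int nThreads, long stepsPerThread) {
        std::vector<std::thread> team;
        for (int t = 0; t < nThreads; ++t)
            team.emplace_back([this, stepsPerThread] {
                for (long s = 0; s < stepsPerThread; ++s)     // stand-in for
                    stepsDone.fetch_add(1, std::memory_order_relaxed); // transport
            });
        for (auto& th : team) th.join();
    }
};

int main() {
    const int nPropagators = 4;  // e.g. one per NUMA domain, in the future
    std::vector<Propagator> props(nPropagators);
    std::vector<std::thread> drivers;
    for (auto& p : props)
        drivers.emplace_back([&p] { p.run(16, 1000000); });
    for (auto& d : drivers) d.join();
    for (auto& p : props) std::printf("%ld steps\n", p.stepsDone.load());
}
```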

GeantV & TBB Important for connecting with experiment parallel frameworks A task-oriented version of the static threads approach for GeantV First version to be improved, but performance already not bad Intel® Xeon Phi™ CPU 7210 @ 1.30GHz, 64 cores

The full prototype
- An exercise at the scale of the LHC experiments (CMS)
- Full geometry converted to VecGeom, plus a uniform magnetic field
- Tabulated physics, fixed 1 MeV energy threshold
- Full track transport and basketization procedure
- First results on scalability (compared to the classical single-threaded approach)
