ALICE Software Evolution. Predrag Buncic. GDB | September 11, 2013.



The ALICE software stack: ROOT + XRootD, AliRoot, AliEn + MonALISA

ROOT
OO C++ framework for data analysis and visualization
Persistent I/O for C++ objects, with support for schema evolution
All ALICE data is in ROOT format
For good (and bad), ALICE has always used ROOT directly, with no abstractions or interfaces in between
Always following the latest developments

ROOT Geometry Package (Andrei Gheata)

EVE - Event Display (Matevz Tadel)

XRootD
High-performance, scalable, fault-tolerant access to data repositories via a fast, low-latency protocol
Organized as a hierarchical namespace
Allows the deployment of data access clusters of virtually any size
Supports sophisticated features such as authentication/authorization, integration with other systems, and WAN data distribution
ALICE has been using XRootD for LAN and WAN data access via the ROOT and AliEn interfaces since 2007
Uses the ALICE token authorization plugin to access files on Grid SEs (A. Peters)

AliRoot
The ALICE software framework (written in C++), 15 years old and still growing
Built on top of ROOT (and using all ROOT features)
Supports our basic use cases: simulation, reconstruction, analysis
But also calibration, alignment, visualization, QA…
HLT reconstruction (different from offline)
3M SLOC, 180+ contributors

AliEn
AliEn is the ALICE Grid Environment (P. Saiz et al.), 12 years old and still going strong
A 3-layer system that leverages the deployed resources and services of the underlying WLCG infrastructures, including local variations such as EGI, NDGF and OSG
Interfaces to AliRoot via a ROOT plugin (TAliEn) that implements the AliEn API

MonALISA (Costin Grigoras)

MonALISA
4M parameters monitored and archived in a consolidated database for data mining
Much more than monitoring:
Data presentation and visualization tool
Complex Grid job workflow engine
Web UI (workflows, jobs, file catalog…)
Optimal file placement tool
Scheduled file transfer framework
Organized analysis management and steering framework

Why changes?
ALICE plans some serious detector upgrades:
Run 2
4-fold increase in instantaneous luminosity for Pb-Pb collisions
Consolidation of the readout electronics of TPC and TRD (readout rate x 2)
Run 3
Continuous-readout TPC, ITS upgrade
50 kHz Pb-Pb interaction rate (current rate x 100)
1.1 TB/s detector readout
This needs online reconstruction in order to better compress data for storage
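A quick back-of-envelope check of the Run 3 figures quoted above (50 kHz Pb-Pb interaction rate, 1.1 TB/s detector readout). The implied average event size and the implied present-day rate are inferences from these two numbers, not figures from the slide:

```python
# Run 3 figures from the slide above.
interaction_rate_hz = 50_000   # 50 kHz Pb-Pb interaction rate
readout_tb_per_s = 1.1         # 1.1 TB/s detector readout

# Implied average event size at readout (inference, not on the slide).
avg_event_size_mb = readout_tb_per_s * 1e6 / interaction_rate_hz  # TB/s -> MB
print(f"average event size at readout: ~{avg_event_size_mb:.0f} MB")  # ~22 MB

# The slide quotes "current rate x 100", i.e. a present-day rate of order:
print(f"implied current rate: ~{interaction_rate_hz / 100:.0f} Hz")   # ~500 Hz
```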

AliRoot
We must adapt to a changing environment and new technologies:
ROOT 6, C++11
Multi- and many-core CPUs
GPUs
We must improve the performance:
New algorithms
Memory issues
I/O is critical and needs a fresh look at the data model
We must converge on using the same framework in the Online and Offline environments

AliRoot roadmap (ALICE © | US Resource Review | 8-11 April 2013 | Predrag Buncic):
AliRoot 5.x (Run 1 through Run 2): evolution of the current framework, based on ROOT 5.x, with improved algorithms and procedures
AliRoot 6.x (Run 3 and beyond): a new, modern framework, based on ROOT 6.x and C++11, optimized for I/O and targeting FPGA, GPU, MIC…


Run2::Simulation
Migrate from G3 to G4:
G3 is no longer supported
G4 is x2 slower for the ALICE use case
Need to work with G4 experts on performance
Expect to profit from future G4 developments (multithreaded G4, G4 on GPU…)
Must work on fast (parameterized) simulation; basic support exists in the current framework
Make more use of embedding, event mixing…

Run3::O2
From detector readout to analysis, from DAQ and HLT to Offline: a new computing framework (O2)

Data Reduction
Reduce data from >1 TB/s to ~80 GB/s for storage
Only possible with online reconstruction
Local & global reconstruction on RORCs, FLPs and EPNs
Detector-specific algorithms
ALICE © | HL-LHC | 17 July 2013 | Predrag Buncic

Data Reduction

Data Format                                           | Reduction Factor | Event Size (MB)
Raw data                                              | -                | 700
FEE zero suppression                                  | 35               | 20
HLT clustering & compression                          | 5-7              | ~3
Remove clusters not associated to relevant tracks     | 2                | 1.5
Data format optimization                              | 2-3              | <1
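A sketch walking the reduction chain in the table above, starting from the 700 MB raw event. Using the midpoints of the quoted ranges (6 for "5-7", 2.5 for "2-3") is my assumption; the slide only gives ranges, and the ~80 GB/s storage headline corresponds to the less aggressive end of them plus headroom:

```python
# Data-reduction steps and factors from the table (range midpoints assumed).
steps = [
    ("FEE zero suppression", 35),
    ("HLT clustering & compression", 6),      # midpoint of 5-7
    ("Remove clusters off relevant tracks", 2),
    ("Data format optimization", 2.5),        # midpoint of 2-3
]

size_mb = 700.0  # raw event size
for name, factor in steps:
    size_mb /= factor
    print(f"{name:<38s} -> {size_mb:6.2f} MB/event")

# Cumulative reduction and resulting storage rate at the 50 kHz Run 3 rate.
total_factor = 700.0 / size_mb
rate_gb_s = size_mb * 50_000 / 1e3   # MB/event * Hz -> GB/s
print(f"total reduction ~x{total_factor:.0f}, storage rate ~{rate_gb_s:.0f} GB/s")
```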

Resource Estimates
Estimate for online systems, based on the current HLT:
~2500 cores distributed over 200 nodes
108 FPGAs for cluster finding (1 FPGA = 80 CPU cores)
64 GPGPUs for tracking (NVIDIA GTX480 + GTX580)
Scaling to the 50 kHz rate to estimate the requirement: ~ today's cores  HLT nodes in 2018
Additional processing power from FPGAs + GPGPUs
Estimate for offline processing power:
10^6 of today's cores required after the upgrade
Expected performance increase per node until 2018: factor 16
Additional gains from code optimization and use of the online farm for reconstruction
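The scaling logic above can be sketched numerically. Scaling the current ~2500-core HLT linearly with the x100 rate increase is my simplifying assumption (the slide's own intermediate numbers did not survive the transcript); the offline figure of 10^6 today's cores and the factor-16 node speed-up are from the slide:

```python
# Online: scale today's HLT linearly with the x100 rate increase (assumption).
hlt_cores_today = 2500
rate_scaling = 100
online_need = hlt_cores_today * rate_scaling
print(f"online need: ~{online_need:,} of today's cores")          # ~250,000

# Offline: 10^6 today's cores, discounted by the expected factor-16
# per-node performance gain by 2018.
offline_cores_today = 1_000_000
perf_gain_per_node = 16
offline_2018_equiv = offline_cores_today / perf_gain_per_node
print(f"offline need: ~{offline_2018_equiv:,.0f} 2018-core-equivalents")  # ~62,500
```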

Vision for 2018 (figure): an O2 Online-Offline facility (DAQ + HLT) with an online raw data store, performing reconstruction and calibration; T1/T2/T3 sites with a custodial data store, performing re-reconstruction, analysis and simulation; CAF On Demand (with a data cache) for analysis, running on the private CERN cloud and public cloud(s); AliEn spanning all resources.

Why change AliEn?
While the system currently fulfils all the needs of ALICE users for reconstruction, simulation and analysis, there are concerns about the scalability of the file catalog beyond Run 2
Need to address the use of emerging cloud, volunteer and opportunistic resources for ALICE
In general, there is no manpower for maintenance and continuous development of the current system
Adopt common solutions and tools where they exist for a given use case

CVMFS deployment for ALICE

CVMFS deployment
33 sites have installed CVMFS, 19 are pending, 1 is running in production

Timeline (Jan 2013 – Apr 2014, with milestones in Jun, July and Aug 2013 along the way):
Setup Stratum 0 for ALICE
Deploy ALICE S/W on CVMFS
Migrate ALICE repository to Stratum 1s
Test, test, test
Start deployment process on all ALICE sites
Deploy CVMFS repository but do not use it in production
Run AliEn from CVMFS on selected site(s)
Validate and evaluate stability and performance
Run AliEn from CVMFS on all sites

Use of opportunistic resources

bin]$ ./parrot_run -p " -r "alice.cern.ch:url= devwebfs.cern.ch/cvmfs/alice.cern.ch" /bin/bash
bin]$ . /cvmfs/alice.cern.ch/etc/login.sh
bin]$ time alienv setenv AliRoot/v5-03-Rev-19 -c aliroot -b
  W E L C O M E to R O O T
  Version 5.34/05, 14 February 2013
ROOT 5.34/05 (on linuxx8664gcc)
CINT/ROOT C/C++ Interpreter, version July 2, 2010
Type ? for help. Commands must be C++ statements. Enclose multiple statements between { }.

Opportunistic use of SC resources for Monte Carlo
Titan has 18,688 nodes (4 nodes per blade, 24 blades per cabinet), each containing a 16-core AMD Opteron 6274 CPU with 32 GiB of memory and an NVIDIA Tesla K20X GPU with 6 GiB of memory. There are a total of 299,008 processor cores, and a total of TiB of CPU and GPU RAM.
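The Titan totals above can be re-derived from the per-node figures. The RAM total went missing from the transcript, so the GiB/TiB numbers below are my derivation from the stated 32 GiB CPU + 6 GiB GPU per node, not a quoted figure:

```python
# Titan per-node figures from the text above.
nodes = 18_688
cores_per_node = 16
ram_gib_per_node = 32 + 6   # CPU RAM + GPU RAM

cores = nodes * cores_per_node
print(f"processor cores: {cores:,}")          # 299,008, as quoted

ram_gib = nodes * ram_gib_per_node
print(f"total CPU+GPU RAM: {ram_gib:,} GiB (~{ram_gib / 1024:.0f} TiB)")
```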

Interfacing with PanDA (ATLAS), via the AliEn CE / Job Agent:
Ongoing integration activity between CMS and ATLAS
BigPanDA: a DoE ASCR and HEP funded project, "Next Generation Workload Management and Analysis System for Big Data"
Aims at Leadership Class Facilities in the US (supercomputers)


Long Term Data Preservation

Clouds and virtualization strategy
Use the CernVM family of tools (CernVM, CVMFS…)
An enabling technology that opens the door to implementing different use cases:
Use of the HLT farm for Offline processing
Use of a VM as a user interface
On-demand QA cluster for release validation
Virtual Analysis Facilities
Volunteer computing
Long Term Data Preservation solution

Conclusions
The ALICE s/w framework will evolve to satisfy Run 2 needs
Run 3 will impose much higher requirements:
100x more events to handle
Convergence of Online and Offline
These require a complete change of the s/w framework
On the Grid side we will start evolving towards a "grid on the cloud" model, based on the CernVM family of tools
Looking for synergies with other experiments