Slide 1: ALICE Software Evolution
Predrag Buncic
GDB | September 11, 2013
Slide 3: The ALICE software stack (figure): AliEn + MonALISA on top of AliRoot, on top of ROOT + XRootD
Slide 4: ROOT
- OO C++ framework for data analysis and visualization
- Persistent I/O for C++ objects, with support for schema evolution (see the sketch below)
- All ALICE data is in ROOT format
- For better or worse, ALICE has always used ROOT directly, with no abstraction layers or interfaces in between
- Always following the latest ROOT developments
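To make the persistency point concrete, here is a minimal sketch of ROOT object I/O; the class and file names are illustrative, not from the talk, and in practice such a class needs a ROOT dictionary (e.g. generated via ACLiC):

```cpp
// Minimal sketch of ROOT object persistence (illustrative names).
#include "TFile.h"
#include "TTree.h"
#include "TObject.h"

// A user class made persistent through ROOT's dictionary machinery;
// the ClassDef version number is what drives schema evolution.
class Hit : public TObject {
public:
    Float_t x, y, z;
    Hit() : x(0), y(0), z(0) {}
    ClassDef(Hit, 1)
};

void write_hits() {
    TFile f("hits.root", "RECREATE");
    TTree tree("hits", "example hits");
    Hit* hit = new Hit();
    tree.Branch("hit", &hit);   // ROOT streams the object layout itself
    for (int i = 0; i < 100; ++i) {
        hit->x = i; hit->y = 2.f * i; hit->z = 3.f * i;
        tree.Fill();
    }
    tree.Write();   // the class version is stored alongside the data
}
```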
Slide 5: The ROOT Geometry Package (figure, credit: Andrei Gheata)
Slide 6: EVE, the Event Display (figure, credit: Matevz Tadel)
Slide 7: XRootD
- High-performance, scalable, fault-tolerant access to data repositories over a fast, low-latency protocol
- Organized as a hierarchical namespace
- Allows the deployment of data access clusters of virtually any size
- Supports sophisticated features such as authentication/authorization, integration with other systems, and WAN data distribution
- ALICE has been using XRootD for LAN and WAN data access, via the ROOT and AliEn interfaces, since 2007 (see the sketch below)
- Uses the ALICE token authorization plugin to access files on Grid SEs (A. Peters)
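As a sketch of what this access looks like from the ROOT side (the server name, file path and tree name below are illustrative, not a real ALICE location):

```cpp
// Sketch: remote read over the xrootd protocol from ROOT.
#include "TFile.h"
#include "TTree.h"

void read_remote() {
    // TFile::Open dispatches on the URL scheme; "root://" selects
    // the xrootd client plugin.
    TFile* f = TFile::Open("root://some.server.cern.ch//data/esd.root");
    if (!f || f->IsZombie()) return;   // handle connection failure
    TTree* t = dynamic_cast<TTree*>(f->Get("esdTree"));
    if (t) t->Print();
    f->Close();
}
```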
Slide 8: AliRoot
- The ALICE software framework (written in C++), 15 years old and still growing
- Built on top of ROOT (and using all ROOT features)
- Supports our basic use cases: simulation, reconstruction, analysis (sketched below)
- But also calibration, alignment, visualization, QA...
- HLT reconstruction (different from offline)
- 3M SLOC, 180+ contributors
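A steering macro for the basic use cases looks roughly like the sketch below; this is heavily simplified, and real macros set many configuration options (normally read from a Config.C) that are omitted here:

```cpp
// Rough sketch of driving AliRoot's basic use cases from a macro.
void sim_and_rec() {
    AliSimulation sim;      // event generation + detector transport
    sim.Run(10);            // simulate 10 events

    AliReconstruction rec;  // clustering, tracking, PID, ESD output
    rec.Run();              // reconstruct the events just produced
}
```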
Slide 9: AliEn
- AliEn is the ALICE Grid Environment (P. Saiz et al.), 12 years old and still going strong
- A 3-layer system that leverages the deployed resources of the underlying WLCG infrastructures and services, including local variations such as EGI, NDGF and OSG
- Interfaces to AliRoot via a ROOT plugin (TAliEn) that implements the AliEn API (see the sketch below)
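From a user's point of view, the plugin makes the Grid look like one more file source; a sketch (the catalogue path is illustrative):

```cpp
// Sketch: reaching AliEn through ROOT's plugin mechanism.
#include "TGrid.h"
#include "TFile.h"

void alien_access() {
    // The "alien" scheme makes ROOT load the TAliEn plugin,
    // which talks to the AliEn API services.
    TGrid::Connect("alien://");
    // Files in the AliEn catalogue are then opened like local ones.
    TFile* f = TFile::Open("alien:///alice/some/catalogue/path/file.root");
    if (f) f->ls();
}
```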
Slide 10: MonALISA (figure, credit: Costin Grigoras)
Slide 11: MonALISA
- 4M parameters monitored and archived in a consolidated database for data mining
- Much more than monitoring:
  - Data presentation and visualization tool
  - Complex Grid job workflow engine
  - Web UI (workflows, jobs, file catalogue...)
  - Optimal file placement tool
  - Scheduled file transfer framework
  - Organized analysis management and steering framework
Slide 12: Why change?
ALICE plans some serious detector upgrades:
- Run 2 (2015-2017)
  - 4-fold increase in instantaneous luminosity for Pb-Pb collisions
  - Consolidation of the readout electronics of the TPC and TRD (readout rate x 2)
- Run 3 (2019-2021)
  - Continuous-readout TPC, ITS upgrade
  - 50 kHz Pb-Pb interaction rate (100x the current rate)
  - 1.1 TB/s detector readout
  - Needs online reconstruction in order to better compress the data for storage
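As a sanity check (my arithmetic, not a figure from the slide), the quoted bandwidth and interaction rate imply an average readout size of roughly 22 MB per interaction:

$$ \frac{1.1\ \mathrm{TB/s}}{50\,000\ \mathrm{interactions/s}} \approx 22\ \mathrm{MB\ per\ interaction} $$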
Slide 13: AliRoot
- We must adapt to a changing environment and new technologies:
  - ROOT 6, C++11
  - Multi- and many-core CPUs
  - GPUs
- We must improve performance:
  - New algorithms
  - Memory issues
  - I/O is critical and needs a fresh look at the data model
- We must converge on using the same framework in the Online and Offline environments (a C++11 parallelism sketch follows below)
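A minimal sketch of the kind of C++11 event-level parallelism such a framework can exploit; processEvent is a hypothetical stand-in for per-event reconstruction work, and a real scheduler would of course use a thread pool rather than one task per event:

```cpp
// Sketch: C++11 task-based parallelism over events.
#include <future>
#include <vector>
#include <cmath>

// Hypothetical stand-in for the per-event workload.
double processEvent(int id) { return std::sqrt(static_cast<double>(id)); }

int main() {
    std::vector<std::future<double>> results;
    for (int ev = 0; ev < 1000; ++ev)   // one task per event
        results.push_back(std::async(std::launch::async, processEvent, ev));
    double sum = 0;
    for (auto& r : results) sum += r.get();   // gather the results
    return sum >= 0 ? 0 : 1;
}
```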
Slide 14: AliRoot roadmap (timeline figure, credit: C. Zampolli)
- Timeline: Run 1 (2010-2012), LS1 (2013-2014), Run 2 (2015-2017), LS2 (2018), Run 3 (2019-2021), LS3 (2022-2023), Run 4 (2024-26)
- AliRoot 5.x: evolution of the current framework; based on ROOT 5.x; improved algorithms and procedures
- AliRoot 6.x: new modern framework; based on ROOT 6.x and C++11; optimized for I/O; targets FPGA, GPU, MIC...
Slide 15: Run2::Simulation (section divider)
Slide 16: Run2::Simulation
- Migrate from G3 to G4 (see the transport-switch sketch below):
  - G3 is no longer supported
  - G4 is about 2x slower for the ALICE use case
  - Need to work with the G4 experts on performance
  - Expect to profit from future G4 developments: multithreaded G4, G4 on GPU...
- Must work on fast (parameterized) simulation; basic support exists in the current framework
- Make more use of embedding, event mixing...
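Because AliRoot drives transport through the Virtual Monte Carlo (TVirtualMC) interface, switching G3 for G4 is, in principle, a configuration choice; a sketch in the style of a Config.C, with the constructor arguments simplified and to be treated as illustrative:

```cpp
// Sketch: choosing the transport engine behind the VMC interface
// (arguments simplified; names and options are illustrative).
void Config() {
#ifdef USE_GEANT4
    // geant4_vmc: Geant4 behind TVirtualMC
    TG4RunConfiguration* cfg =
        new TG4RunConfiguration("geomRoot", "FTFP_BERT");
    new TGeant4("TGeant4", "Geant4 transport", cfg);
#else
    // geant3 VMC, using the ROOT (TGeo) geometry
    new TGeant3TGeo("C++ Interface to Geant3");
#endif
    // ...generator, detector and physics configuration follows...
}
```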
Slide 17: Run3::O2
From detector readout to analysis, and from DAQ and HLT to Offline: a new computing framework (O2)
Slide 18: Data Reduction
- Reduce the data from >1 TByte/s to ~80 GByte/s for storage
- Only possible with online reconstruction
- Local & global reconstruction on RORCs, FLPs and EPNs
- Detector-specific algorithms
19
Data Reduction 19 Data Format Data Reduction Factor Event Size (MByte) Raw Data1700 FEEZero Suppression3520 HLT Clustering & Compression5-7~3 Remove clusters not associated to relevant tracks 21.5 Data format optimization2-3<1 ALICE © | HL-LHC | 17 July 2013| Predrag Buncic
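The stages compose multiplicatively; a quick check of the table's bottom line (the midpoints chosen for the ranged factors are mine, not the slide's):

```cpp
// Sketch: chaining the per-stage reduction factors from the table.
#include <cstdio>

int main() {
    double size = 700.0;  // raw event size in MByte
    // zero suppression, HLT clustering (5-7 -> ~6),
    // cluster removal, format optimization (2-3 -> ~2.5)
    const double factors[] = {35.0, 6.0, 2.0, 2.5};
    for (double f : factors) size /= f;
    std::printf("final event size: ~%.2f MByte\n", size);  // ~0.67, i.e. <1
    return 0;
}
```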
Slide 20: Resource Estimates
- Estimate for the online systems, based on the current HLT:
  - ~2500 cores distributed over 200 nodes
  - 108 FPGAs for cluster finding (1 FPGA = 80 CPU cores)
  - 64 GPGPUs for tracking (NVIDIA GTX480 + GTX580)
- Scaling to the 50 kHz rate to estimate the requirement:
  - ~250,000 of today's cores
  - 1250-1500 HLT nodes in 2018 (the arithmetic is sketched below)
  - Additional processing power from FPGAs + GPGPUs
- Estimate for offline processing power:
  - 10^6 of today's cores required after the upgrade
  - Expected performance increase per node until 2018: factor 16
  - Additional gains from code optimization and from using the online farm for reconstruction
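The node estimate follows from the slide's own numbers; a sketch of the arithmetic:

```cpp
// Sketch: reproducing the 2018 HLT node estimate from the slide's figures.
#include <cstdio>

int main() {
    const double coresPerNode = 2500.0 / 200.0;  // current HLT: ~12.5
    const double coresNeeded  = 250000.0;        // today's cores at 50 kHz
    const double speedup2018  = 16.0;            // per-node gain until 2018
    const double nodes2018 = coresNeeded / (coresPerNode * speedup2018);
    std::printf("HLT nodes in 2018: ~%.0f\n", nodes2018);  // ~1250
    return 0;
}
```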
Slide 21: Vision for 2018 (architecture diagram)
- O2 Online-Offline facility (DAQ + HLT): online raw data store; reconstruction and calibration; re-reconstruction
- T1/T2/T3 grid sites, via AliEn: custodial data store; reconstruction, analysis and simulation
- CAF On Demand on the private CERN cloud, via AliEn: analysis
- Public cloud(s), via AliEn: simulation, with a data cache
Slide 22: Why change AliEn?
- While the system currently fulfills all the needs of ALICE users for reconstruction, simulation and analysis, there are concerns about the scalability of the file catalogue beyond Run 2
- Need to address the use of emerging cloud, volunteer and opportunistic resources for ALICE
- In general, there is no manpower for maintenance and continuous development of the current system
- Adopt common solutions and tools where they exist for a given use case
Slide 23: CVMFS deployment for ALICE (status figure)
Slide 24: CVMFS deployment
33 sites have installed CVMFS, 19 are pending, and 1 is running it in production.
Slide 25: Timeline (milestone dates on the slide: Jan 2013, Jun 2013, July 2013, Aug 2013, Apr 2014)
- Set up Stratum 0 for ALICE
- Deploy the ALICE S/W on CVMFS
- Migrate the ALICE repository to Stratum 1s
- Test, test, test
- Start the deployment process on all ALICE sites
- Deploy the CVMFS repository but do not use it in production
- Run AliEn from CVMFS on selected site(s)
- Validate and evaluate stability and performance
- Run AliEn from CVMFS on all sites (target: Apr 2014)
Slide 26: Use of opportunistic resources
A terminal session demonstrating access to the ALICE software repository through Parrot (from the cctools suite), which provides CVMFS access in user space, without a FUSE mount or root privileges:

[pbuncic@localhost bin]$ ./parrot_run -p "http://cernvm.lbl.gov:3128;DIRECT" -r "alice.cern.ch:url=http://cernvm-devwebfs.cern.ch/cvmfs/alice.cern.ch" /bin/bash
[pbuncic@localhost bin]$ . /cvmfs/alice.cern.ch/etc/login.sh
[pbuncic@localhost bin]$ time alienv setenv AliRoot/v5-03-Rev-19 -c aliroot -b

  W E L C O M E  to  R O O T
  Version 5.34/05, 14 February 2013
  http://root.cern.ch

ROOT 5.34/05 (tags/v5-34-05@48582, Feb 15 2013, 17:08:24 on linuxx8664gcc)
CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
Slide 27: Opportunistic use of supercomputer resources for Monte Carlo
Titan has 18,688 nodes (4 nodes per blade, 24 blades per cabinet), each containing a 16-core AMD Opteron 6274 CPU with 32 GiB of memory and an NVIDIA Tesla K20X GPU with 6 GiB of memory: a total of 299,008 processor cores and 693.6 TiB of combined CPU and GPU RAM.
Slide 28: Interfacing with PanDA (ATLAS)
(diagram: AliEn CE and Job Agent connecting to PanDA)
- Ongoing integration activity between CMS and ATLAS
- BigPanDA: a DoE ASCR and HEP funded project, "Next Generation Workload Management and Analysis System for Big Data"
- Aims at Leadership Class Facilities in the US (supercomputers)
Slide 32: Long Term Data Preservation (figure only)
Slide 33: Clouds and virtualization strategy
- Use the CernVM family of tools: CernVM, CVMFS...
- An enabling technology that opens the door to implementing different use cases:
  - Use of the HLT farm for offline processing
  - Use of a VM as a user interface
  - On-demand QA cluster for release validation
  - Virtual analysis facilities
  - Volunteer computing
  - Long-term data preservation solution
34
Conclusions ALICE s/w framework will evolve to satisfy Run2 needs Run3 will impose much higher requirements –100x more events to handle –Convergence of Online and Ofline –require complete change of s/w framework On the grid side we will start evolving to “grid on the cloud” model –Based on CernVM family of tools Looking for synergies with other experiments 34