Computing Strategy of the AMS-02 Experiment
V. Choutko¹, B. Shan², A. Egorov¹, A. Eline¹
¹ MIT, USA  ² Beihang University, China
CHEP 2015, Okinawa

AMS Experiment
[Detector schematic: TRD, TOF, Tracker, RICH, ECAL]
The Alpha Magnetic Spectrometer (AMS) is a high-energy physics experiment on board the ISS, featuring:
– Geometrical acceptance: 0.5 m²·sr
– Number of readout channels: ≈200K
– Main payload of Space Shuttle Endeavour's last launch (May 16, 2011)
– Installed on the ISS on May 19, 2011
– Running 24/7
– Over 60 billion events collected so far
– Maximum event rate: 2 kHz

AMS Data Flow
[Data-flow diagram: Oracle database (PDBR), web server, production server, AMS cluster (PBS), CERN lxbatch (LSF), production platform, remote computing facilities, validator, uploader, EOS/CASTOR storage; data, meta-data and control paths between CERN facilities and remote centers]
– Data are transferred via relay satellites to the Marshall Space Flight Center and then to CERN, nearly in real time, in the form of one-minute frames
– Frames are then assembled into runs (RAW): 1 run = ¼ orbit (~23 minutes)
– RAW files are reconstructed into ROOT files
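To make the frame-to-run step concrete, here is a minimal C++ sketch of grouping one-minute frames into runs of roughly a quarter orbit. Frame, Run, and the frames-per-run constant are hypothetical names for illustration, not the actual AMS data-handling code.

#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {                  // one minute of telemetry
    uint64_t utime;             // acquisition time (s)
    std::vector<uint8_t> data;  // raw payload
};

struct Run {                    // ~1/4 orbit of data, written out as a RAW file
    std::vector<Frame> frames;
};

std::vector<Run> BuildRuns(const std::vector<Frame>& frames,
                           std::size_t framesPerRun = 23) {  // ~23 one-minute frames per run
    std::vector<Run> runs;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (i % framesPerRun == 0) runs.emplace_back();      // start a new run
        runs.back().frames.push_back(frames[i]);
    }
    return runs;
}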

AMS Flight Data Reconstruction
First production:
– Runs 24/7 on freshly arrived data
– Initial data validation and indexing
– Produces Data Summary Files and Event Tags (ROOT) for fast event selection
– Usually available within 2 hours of flight data arrival
– Used to produce the calibrations for the second production, as well as for quick performance evaluation
Second production:
– Uses all the available calibrations, alignments, ancillary data from the ISS, and slow control data to produce a physics-analysis-ready set of reconstructed data
[Flow: RAW → ROOT → (calibrations / alignments / slow control DB) → ROOT, ready for physics analysis]
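The idea of fast selection via event tags can be illustrated with the sketch below: small tag records are scanned first, and the full event is read only for matches. The EventTag layout and field names are assumptions for this sketch, not the real AMS tag format (which is stored as ROOT objects).

#include <cstdint>
#include <vector>

struct EventTag {
    uint32_t run;       // run number
    uint32_t event;     // event number within the run
    uint32_t trigger;   // trigger/selection bitmask
    uint64_t offset;    // location of the full event in the data summary file
};

// Return offsets of events whose tag matches the requested trigger bits.
std::vector<uint64_t> SelectEvents(const std::vector<EventTag>& tags,
                                   uint32_t requiredBits) {
    std::vector<uint64_t> selected;
    for (const auto& t : tags)
        if ((t.trigger & requiredBits) == requiredBits)
            selected.push_back(t.offset);
    return selected;
}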

Parallelization of Reconstruction
Motivations:
– To shorten the elapsed time of data reconstruction
– To increase productivity on multi-core/hyper-threading-enabled processors
– To ease production job management
Tool: OpenMP
– No explicit thread programming
– Nearly no code change except for a few pragmas
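As an illustration of "nearly no code change except for a few pragmas", the sketch below parallelizes a generic event loop with a single OpenMP directive. Event, RecEvent, and ReconstructEvent are placeholders, not the actual AMS reconstruction code.

#include <vector>

struct Event    { /* raw detector data */ };
struct RecEvent { /* reconstructed quantities */ };

// Stub standing in for the CPU-heavy, per-event reconstruction.
RecEvent ReconstructEvent(const Event& /*e*/) { return RecEvent{}; }

void ReconstructAll(const std::vector<Event>& in, std::vector<RecEvent>& out) {
    out.resize(in.size());
    // Dynamic scheduling evens out the load, since events differ in size.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(in.size()); ++i)
        out[i] = ReconstructEvent(in[i]);
}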

Parallel Processing
[Diagram: initialization, per-thread event processing, termination]
– Event reading is protected with #pragma omp critical
– Static variables are declared threadprivate
– Output event map writes are protected with #pragma omp critical
– Histogram filling and large-memory routines use #pragma omp atomic
– Thread synchronization on database reads uses #pragma omp barrier
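A hedged C++ sketch of this pragma pattern follows: threadprivate statics, critical sections around shared output, atomic updates for histogram filling, and a barrier around the database read. workBuffer, histogram, LoadCalibrationFromDB, and ProcessOneEvent are placeholders for illustration, not AMS code.

#include <omp.h>
#include <cstdio>

static int workBuffer[1024];           // per-thread scratch space
#pragma omp threadprivate(workBuffer)  // each thread gets its own copy

static double histogram[100];          // shared histogram
static double dbCalibration[16];       // shared calibration constants

static void LoadCalibrationFromDB(double* calib) {      // placeholder DB read
    for (int i = 0; i < 16; ++i) calib[i] = 1.0;
}

static void ProcessOneEvent(int ievent, const double* calib, int* scratch,
                            int& bin, double& weight) {  // placeholder event work
    scratch[0] = ievent;
    bin    = ievent % 100;
    weight = calib[ievent % 16];
}

void ProcessEvents(int nEvents) {
    #pragma omp parallel
    {
        // Only one thread reads the database; the barrier makes the others
        // wait until the calibration constants are in place.
        #pragma omp master
        LoadCalibrationFromDB(dbCalibration);
        #pragma omp barrier

        #pragma omp for schedule(dynamic)
        for (int i = 0; i < nEvents; ++i) {
            int bin; double w;
            ProcessOneEvent(i, dbCalibration, workBuffer, bin, w);

            #pragma omp atomic               // histogram filling
            histogram[bin] += w;
        }

        #pragma omp critical                 // serialized output writing
        std::printf("thread %d finished its share\n", omp_get_thread_num());
    }
}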

Parallel Reconstruction Performance (Xeon E v3, 2.30 GHz)
[Performance plot]

Parallel Reconstruction Performance (Intel MIC, no software change)
[Performance plot]

Parallelization of Simulation Software
Motivations:
– To decrease the required memory per simulation job
– To shorten the elapsed running time, allowing faster Monte Carlo parameter/software tuning
Minimal software change: original Geant4 MT model + OpenMP (ICC 15)
– Thread ID, thread number, and core number are taken from Geant4 10.1 instead of from OpenMP
– The OpenMP barrier does not work in this setup, so barriers are implemented by hand
– Other OpenMP pragmas work (minimal changes to the AMS software)
– Geant4 MT and OpenMP threading co-exist
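Since the slide notes that the OpenMP barrier could not be used in the hybrid Geant4-MT + OpenMP setup, here is one possible hand-rolled replacement: a generic C++11 counting barrier. It is an illustration only, not the actual AMS implementation, and the class name is made up.

#include <condition_variable>
#include <mutex>

class ReusableBarrier {
public:
    explicit ReusableBarrier(int nThreads)
        : count_(0), total_(nThreads), generation_(0) {}

    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        int gen = generation_;
        if (++count_ == total_) {      // last thread to arrive
            count_ = 0;
            ++generation_;             // start a new cycle
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_, total_, generation_;
};

The generation counter makes the barrier reusable across successive synchronization points, so the same object can guard every database read in the event loop.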

Parallel Simulation Performance (Xeon E v3, 2.30 GHz)
[Performance plot]

Parallel Simulation Performance (Intel MIC, no software change)
[Performance plot]

Parallelization of Simulation Software: Memory Optimization
– G4AllocatorPool modification
– A garbage-collection mechanism is introduced
– Memory consumption decreased by a factor of 2 or more for long jobs (w.r.t. the original G4AllocatorPool)
– Overall memory consumption decreased by a factor of 5+ w.r.t. sequential simulation jobs
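The sketch below is a much-simplified fixed-size object pool with a garbage-collection pass that releases completely unused pages, to illustrate why such a mechanism shrinks long-running jobs. It is not the Geant4 G4AllocatorPool code; all names are illustrative.

#include <cstddef>
#include <list>
#include <vector>

class PooledAllocator {
    struct Page {
        std::vector<unsigned char> storage;   // raw memory for N objects
        std::vector<void*>         freeList;  // currently free slots
        std::size_t                used = 0;  // slots handed out
    };

public:
    PooledAllocator(std::size_t objSize, std::size_t objsPerPage = 1024)
        : objSize_(objSize), objsPerPage_(objsPerPage) {}

    void* Allocate() {
        for (auto& p : pages_)
            if (!p.freeList.empty()) return Take(p);
        pages_.emplace_back();                         // all pages full: add one
        Page& p = pages_.back();
        p.storage.resize(objSize_ * objsPerPage_);
        for (std::size_t i = 0; i < objsPerPage_; ++i)
            p.freeList.push_back(p.storage.data() + i * objSize_);
        return Take(p);
    }

    void Free(void* ptr) {
        for (auto& p : pages_)
            if (Owns(p, ptr)) { p.freeList.push_back(ptr); --p.used; return; }
    }

    // Garbage collection: drop pages in which every slot has been freed,
    // actually returning the memory instead of caching it forever.
    void Collect() {
        pages_.remove_if([](const Page& p) { return p.used == 0; });
    }

private:
    void* Take(Page& p) {
        void* ptr = p.freeList.back();
        p.freeList.pop_back();
        ++p.used;
        return ptr;
    }
    bool Owns(const Page& p, void* ptr) const {
        const unsigned char* c = static_cast<const unsigned char*>(ptr);
        return c >= p.storage.data() && c < p.storage.data() + p.storage.size();
    }

    std::size_t     objSize_, objsPerPage_;
    std::list<Page> pages_;
};

Calling Collect() periodically in a long simulation job is what bounds the resident memory: without it, a pool of this kind only ever grows.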

AMS-02 Computing Resources
– CNAF, INFN Bologna (1500 cores)
– CIEMAT, Madrid (200 cores)
– Geneva (3000 cores)
– NLAA, Beijing (1024 cores)
– SEU, Nanjing (2016 cores)
– Academia Sinica, Taipei (3000 cores)
– IN2P3, Lyon (500 cores)
– RWTH / Jülich (4000 cores)

Light-Weight Production Platform
Fully automated production cycle:
– Job acquisition, submission, monitoring, validation, transferring, and (optional) scratching
[State diagram: the platform cycles between "idle slots / empty pool", "idle slots / non-empty pool", and "fully loaded" states through Retrieve(), Submit(), Validate(), Update(), Reset(), and Scratch() calls, driven by their return codes]
Easy to deploy:
– Based on Perl/Python/sqlite3
Customizable:
– Batch system, storage, transferring, etc.
Running at:
– JUROPA, CNAF, IN2P3, NLAA
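A rough sketch of the automated cycle, written in C++ purely for illustration (the real platform is based on Perl/Python/sqlite3). The method names mirror the calls on the slide; the return-code convention and the stub bodies are assumptions, not documented behavior.

#include <iostream>

// Assumed convention from the slide's diagram: >0 work was done, 0 nothing
// to do, -1 pool is full. All bodies below are trivial stubs.
struct ProductionPlatform {
    int  Update()   { return 0; }  // refresh job states from the batch system
    int  Retrieve() { return 0; }  // acquire new work into the local pool
    int  Submit()   { return 0; }  // submit pooled jobs to idle batch slots
    int  Validate() { return 0; }  // check finished jobs, transfer their output
    void Scratch()  {}             // optionally remove local copies after transfer
    void Reset()    {}             // resubmit or recycle failed jobs
    bool HasIdleSlots() const { return false; }
};

// One pass of the fully automated cycle: monitoring, validation/transfer,
// then (if there are idle slots) submission, acquisition, or recycling.
void ProductionCycle(ProductionPlatform& p) {
    p.Update();
    if (p.Validate() > 0) p.Scratch();
    if (p.HasIdleSlots()) {
        if (p.Submit() == 0 && p.Retrieve() == 0)
            p.Reset();
    }
}

int main() {
    ProductionPlatform platform;
    for (int i = 0; i < 3; ++i)    // a real daemon would loop forever and sleep
        ProductionCycle(platform);
    std::cout << "cycle finished\n";
}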

Latest Full Data Reconstruction
– Performed in the second half of February at CERN, JUROPA, CNAF, and Academia Sinica, all of which provided us with exclusive resources
– 40 months of flight data
– Reconstruction completed in essentially two weeks
– Processing speed: 100 times the data collection speed
– Failing rate: …

Latest Full Data Reconstruction (cont.)
[Figure]

Summary
– Parallelization of the AMS reconstruction software increases the productivity of the reconstruction program by up to 45% on hyper-threading-enabled processors, and shortens per-run reconstruction time and calibration job execution time by a factor of 10.
– Parallelization of the AMS simulation software decreases the total memory consumption by up to a factor of 5, which makes it much easier to run long simulation jobs for light nuclei.
– With the production support software, the data reconstruction rate reached 3 data-collection months per production day.
