The CTA Computing Grid Project
Cecile Barbier, Nukri Komin, Sabine Elles, Giovanni Lamanna
LAPP, CNRS/IN2P3, Annecy-le-Vieux
Cecile Barbier - Nukri Komin, EGI User Forum, Vilnius

CTA-CG: CTA Computing Grid
- LAPP, Annecy: Giovanni Lamanna, Nukri Komin, Cecile Barbier, Sabine Elles
- LUPM, Montpellier: Georges Vasileiadis, Claudia Lavalley, Luisa Arrabito
- Goal: bring CTA onto the Grid

CTA-CG
Aim
- provide working environment, tools and services for all tasks assigned to the Data Management and Processing Centre: simulation, data processing, storage, offline analysis, user's interface
Test
- Grid computing
- software around Grid computing
- estimate computing needs and requirements
- requests at Lyon, close contact with DESY Zeuthen

Outline
- CTA and its Data Management and Processing Centre
- current activities: massive Monte Carlo simulations, preparation of a meta data base
- short-term plan: bring the user on the Grid and to the data
- ideas for future data management and analysis pipeline
Note: CTA is in its preparatory phase; what is shown here is mostly work in progress and ideas.

Current Cherenkov Telescopes

Cherenkov Telescope Array
- large array, 30 – 100 telescopes in 3 sizes
- Preparatory Phase: institutes, 22 countries

CTA Operational Data Flow
The CTA Observatory main logical units:
- Science Operation Centre: organisation of observations
- Array Operation Centre: the on-site service
- Science Data Centre: software development, data analysis, data reduction, data archiving, data dissemination to observers
Total expected data volume from CTA: 1 to 10 (?) PB per year (main data stream for permanent storage is of the order of 1 (10 ?) GB/s)
MC requirements: tens of CPU years, hundreds of TB
Existing ICT-based infrastructures, such as EGEE/EGI and GEANT, are potential solutions to provide the CTA observatory with best use of e-infrastructures.

CTA Virtual Grid Organisation
Benefits of the EGEE/EGI Grid
- institutes can easily provide computing power
- minimal manpower needed, usually sites already supporting LHC
- can be managed centrally (e.g. for massive simulations)
- distributed but transparent for all users (compare HESS)
CTA Virtual Organisation: vo.cta.in2p3.fr
- French name, but open to everyone (renaming almost impossible)
- VO manager: G. LAPP

CTA VO – computing
- 14 sites in 5 countries providing access to their computing resources
- 3 big sites: CC Lyon, DESY Zeuthen, Cyfronet (Poland)
- GRIF: several sites of various sizes in/around Paris
- many small sites (~100 CPUs)
- 30k logical CPUs, shared with other VOs
- ~1000 – 2000 CPUs for CTA at any time (based on experience)

CTA VO – computing
- load not smooth: only one simulation manager (Nukri Komin)
- LAPP, CCIN2P3 and DESY Zeuthen among the biggest contributors

CTA VO – storage
- each site providing several 100 GB up to 10 TB of local disk space
- massive storage (several 100 TB): CC Lyon (including tapes), DESY Zeuthen, Cyfronet
- massive storage for large temporary files
- simulations: corsika, will be kept for reprocessing
- corsika file size GB for proton showers

Grid Monte Carlo Production
- first massive use of Grid: MC simulations
- About good quality runs
- high requirements per run that only few grid sites can handle:
  - up to 4 GB RAM
  - 10 GB local scratch disk space
- many problems solved, next round much more efficient
- with automated MC simulation production using the EasiJob tool developed at LAPP

Automated Simulation Production
[Diagram: CTA VO Grid Operation Centre, Grid tools and software developed for CTACG]
- web interface (interface with the community): configuration, monitoring, access to data files
  - in: configure and start a task
  - out: browse results
- central data base
- Create the job: config files and scripts
- GANGA: job and grid control
- monitoring
- files on grid SEs

Automated Simulation Production
EasiJob: Easy Integrated Job Submission
- developed by S. Elles within the MUST framework
  - MUST = Mid-Range Data Storage and Computing Centre widely open to Grid Infrastructure, at LAPP Annecy and Savoie University
- more general than CTA: can be used for any software and every experiment
- based on GANGA (Gaudi/Athena aNd Grid Alliance)
  - Grid front-end in Python
  - developed for ATLAS and LHCb, used by many other experiments
- task configuration, job submission and monitoring, file bookkeeping

EasiJob – Task Configuration
- description of a task: set of parameters, with default values; define if browsable
- representation in data base: parameter keyword (#key1)
- job template: set of files (input sandbox); keywords will be replaced by data base values
- web interface example: corsika

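The keyword replacement described above could look roughly like the following. This is a minimal sketch, not EasiJob's actual code: the function name `render_template`, the parameter names and the template lines are hypothetical, and the only thing taken from the slide is the `#key1` keyword style.

```python
import re

def render_template(template: str, values: dict) -> str:
    """Replace each '#<keyword>' in a job template by its stored value.

    Only keywords present in `values` are touched; everything else,
    including ordinary '# comments', is left unchanged.
    """
    if not values:
        return template
    names = "|".join(re.escape(name) for name in values)
    # The trailing \b stops '#key1' from matching inside '#key10'.
    pattern = re.compile(r"#(" + names + r")\b")
    return pattern.sub(lambda m: str(values[m.group(1)]), template)

# Hypothetical task parameters, standing in for values read from the data base.
task_values = {"key1": 1000, "key10": "proton"}
template = "events=#key1\nprimary=#key10\n# plain comment, left untouched\n"
rendered = render_template(template, task_values)
print(rendered)
```

Restricting the pattern to known keywords is a design choice made here so that shell comments in the sandbox files survive the substitution.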
EasiJob – Job Classes
- configure site classes
- requirements are based on published parameters, e.g. GlueHostMainMemoryRAMSize > 2000
  - these requirements are interpreted differently at each site
- sites have different storage capacities
- creation of job classes: large jobs only on a subset of sites
- job/site matching currently semi-manual
  - close interaction with local admins (in particular Lyon and Zeuthen)
- web interface

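One way to picture a job class is as a memory threshold plus a hand-maintained list of sites, turned into a gLite JDL `Requirements` expression. The sketch below is an illustration under assumptions, not EasiJob's implementation: the `JobClass` type, the script name and the computing-element host names are hypothetical. The JDL attribute names and the `RegExp(...)` function are standard gLite JDL.

```python
from dataclasses import dataclass, field

@dataclass
class JobClass:
    name: str
    min_ram_mb: int                            # compared with GlueHostMainMemoryRAMSize
    sites: list = field(default_factory=list)  # CE host patterns; empty means any site

def requirements(job_class: JobClass) -> str:
    """Build a JDL Requirements expression for one job class."""
    expr = f"other.GlueHostMainMemoryRAMSize > {job_class.min_ram_mb}"
    if job_class.sites:
        allowed = " || ".join(
            f'RegExp("{site}", other.GlueCEUniqueID)' for site in job_class.sites
        )
        expr += f" && ({allowed})"
    return expr

def make_jdl(executable: str, job_class: JobClass) -> str:
    """Return the text of a minimal JDL file for the given script and class."""
    lines = [
        f'Executable = "{executable}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f'InputSandbox = {{"{executable}"}};',
        'OutputSandbox = {"std.out", "std.err"};',
        'VirtualOrganisation = "vo.cta.in2p3.fr";',
        f"Requirements = {requirements(job_class)};",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical class for large jobs, restricted to two made-up CE hosts.
large = JobClass("large", 2000, ["ce.site-a.example", "ce.site-b.example"])
jdl = make_jdl("run_corsika.sh", large)
print(jdl)
```

A file written this way would be submitted with `glite-wms-job-submit -a job.jdl`. Because sites interpret the published memory value differently, the explicit site list carries the semi-manual matching and the memory clause is only a first filter.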
EasiJob – Job Submission
- define number of jobs for a task
- automated job submission to the site with the minimum of waiting jobs
- submission is paused when too many jobs are pending
- status monitoring and re-submission of failed jobs
- keeps track of produced files
  - logical file name (LFN) on the Grid
  - echo statement in execution script
- monitoring on web page

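The submission policy above (send each job to the site with the fewest waiting jobs, pause when too many are pending) can be sketched as follows. The threshold value, the function names and the choice to count pending jobs over all sites together are assumptions for illustration; the slides do not say how EasiJob defines "too many".

```python
from typing import Optional

MAX_PENDING = 50  # assumed threshold, not a value from the slides

def choose_site(waiting: dict, max_pending: int = MAX_PENDING) -> Optional[str]:
    """Return the site with the fewest waiting jobs, or None to pause submission."""
    if not waiting or sum(waiting.values()) >= max_pending:
        return None
    return min(waiting, key=waiting.get)

def submission_round(waiting: dict, jobs: list, max_pending: int = MAX_PENDING) -> list:
    """Place jobs one by one until all are placed or submission is paused.

    Returns the list of (job, site) pairs that were submitted.
    """
    waiting = dict(waiting)  # work on a copy of the monitoring snapshot
    placed = []
    for job in jobs:
        site = choose_site(waiting, max_pending)
        if site is None:
            break  # too many pending jobs: leave the rest for a later round
        waiting[site] += 1
        placed.append((job, site))
    return placed

# Hypothetical snapshot of waiting jobs per site.
snapshot = {"site_a": 3, "site_b": 1, "site_c": 2}
placed = submission_round(snapshot, ["job0", "job1", "job2", "job3"], max_pending=9)
print(placed)
```

With this snapshot the fourth job is held back, because nine jobs are already pending after the first three submissions.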
EasiJob – Status
- deployed in Annecy, will be used for next simulations
- configuration and job submission not open to public
  - want to avoid massive nonsense productions
  - user certificates need to be installed manually
- idea: provide “software as a service”

Bookkeeping
current simulation:
- not all output files kept
- production parameters set in EasiJob data base
- automatically generated web interface [C. Barbier]
  - shows only parameters defined as browsable
  - proposes only values which were produced
  - returns list of LFNs (logical file names)
- starting point for more powerful meta data base

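The behaviour of the search interface (offer only browsable parameters, propose only values that were actually produced, return LFNs) maps naturally onto two SQL queries. The sketch below uses an in-memory SQLite data base; the table layout, column names and LFN paths are hypothetical and say nothing about the real EasiJob schema.

```python
import sqlite3

BROWSABLE = {"particle", "zenith_deg"}  # assumed set of browsable parameters

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (lfn TEXT PRIMARY KEY, particle TEXT, zenith_deg REAL)")
db.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("/example/prod1/gamma_run001.dat", "gamma", 20.0),
        ("/example/prod1/proton_run001.dat", "proton", 20.0),
        ("/example/prod1/proton_run002.dat", "proton", 40.0),
    ],
)

def produced_values(column: str) -> list:
    """Values to propose in the web form: only those that exist in the data base."""
    if column not in BROWSABLE:
        raise ValueError(f"{column} is not browsable")
    rows = db.execute(f"SELECT DISTINCT {column} FROM files ORDER BY {column}")
    return [row[0] for row in rows]

def find_lfns(**criteria) -> list:
    """Return the LFNs of all files matching the given production parameters."""
    unknown = set(criteria) - BROWSABLE
    if unknown:
        raise ValueError(f"not browsable: {sorted(unknown)}")
    # Column names come from the BROWSABLE whitelist; values are bound as parameters.
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1"
    rows = db.execute(
        f"SELECT lfn FROM files WHERE {where} ORDER BY lfn", tuple(criteria.values())
    )
    return [row[0] for row in rows]

print(produced_values("particle"))
print(find_lfns(particle="proton", zenith_deg=20.0))
```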
Bookkeeping
future:
- complicated data structure
- we want to keep track of files produced and their relations
- search for files using the production parameters
- find information on files, even if the files have been removed
[Diagram: production file 1..., production file 2..., DST file...; real data structure: raw data file, calibrated file, DST]

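The two requirements above, tracking relations between files and keeping information on files that no longer exist, can be illustrated with a small in-memory catalogue. This is only a sketch of the idea: the `FileRecord` type, the `removed` flag and the file names are hypothetical, and a real meta data base would hold this in tables rather than a Python dict.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    lfn: str
    level: str                                   # e.g. "raw", "calibrated", "DST"
    parents: list = field(default_factory=list)  # LFNs this file was derived from
    removed: bool = False                        # file deleted, record kept

catalogue = {}

def register(record: FileRecord) -> None:
    catalogue[record.lfn] = record

def mark_removed(lfn: str) -> None:
    """Flag the physical file as deleted without dropping its meta data."""
    catalogue[lfn].removed = True

def ancestors(lfn: str) -> list:
    """All files the given file was derived from, nearest first."""
    result = []
    todo = list(catalogue[lfn].parents)
    while todo:
        parent = todo.pop(0)
        if parent not in result:
            result.append(parent)
            todo.extend(catalogue[parent].parents)
    return result

# Hypothetical chain following the real data structure on the slide.
register(FileRecord("raw_001", "raw"))
register(FileRecord("calib_001", "calibrated", parents=["raw_001"]))
register(FileRecord("dst_001", "DST", parents=["calib_001"]))
mark_removed("raw_001")

print(ancestors("dst_001"))
print(catalogue["raw_001"])
```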
Meta Data Management [C. Lavalley, LUPM Montpellier]
- data: simulations, real data, ...
- meta data: information describing the data
  - logical and physical file name, production parameters, etc.
- meta information can be in several data bases

Meta Data Management
AMI: ATLAS Metadata Interface
- developed at LPSC Grenoble
- can interrogate other data bases
- information can be pushed with AMI clients
  - web, Python, C++, Java clients
- manages access rights: username/password, certificate, ...
- we will deploy AMI for CTA (with LUPM and LPSC)
  - for simulations bookkeeping and file search
  - to be tested for future use in CTA

Bring the User to the Data
You have a certificate?
- You can submit jobs to the EGI grid: glite-wms-job-submit
- or use Ganga (easy-to-use Python front end)
Grid User Interface needed
- certificate infrastructure
- software to download files and submit jobs
- we are evaluating a way to make a Grid UI available

DIRAC
DIRAC: Distributed Infrastructure with Remote Agent Control
- initially developed for LHCb, now generic version
- very easy to install
- Grid (and beyond) front-end
- workload management with pilot jobs: pull mode, no jobs lost due to Grid problems, shorter waiting time before execution
- integrated Data Management System
- integrated software management
- Python and web interfaces for job submission
- LAPP, LUPM and PIC-IFAE Barcelona for setup/testing, will be open to collaboration soon
- we don't plan to use it for simulations

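The pull mode mentioned above can be pictured with a toy model: tasks wait in a central queue, and only pilots that started successfully on a worker node come and take one. This is not DIRAC's API or implementation, just a sketch of why a broken site cannot lose a job; the function name and site labels are made up.

```python
from collections import deque

def run_pilots(tasks: list, pilots: list) -> tuple:
    """Toy pull-mode scheduler.

    `pilots` is a list of (site, healthy) pairs. Only healthy pilots pull
    tasks from the central queue, so a task is never handed to a site that
    cannot run it. Returns (assignments, tasks still queued).
    """
    queue = deque(tasks)
    assignments = []
    healthy_sites = [site for site, healthy in pilots if healthy]
    while queue and healthy_sites:
        for site in healthy_sites:
            if not queue:
                break
            assignments.append((queue.popleft(), site))
    return assignments, list(queue)

done, left = run_pilots(["t1", "t2", "t3"], [("A", True), ("B", False), ("C", True)])
print(done, left)
```

In push mode a job sent to site B would fail or sit in a queue. Here it simply stays in the central queue until a working pilot asks for it.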
Analysis Chain
- telescopes
- raw data (level 0)
- calibrated camera images (level 1)
- photon list (level 2)
- sky maps, lightcurves, spectra (levels 3 and 4)
- e.g. Fermi Science Tools: software available for Linux, Mac, Windows
[Diagram labels: internal data, published data]

Data Rates
Raw Data (level 0): some GB/s, PB/year
- production during night time, max h per day
- 29 day cycle with peak at new moon
Reconstructed Data (level 2, available to public)
- about 10% of raw data
- computing requirements: 1 h of raw data needs ~200 CPU-days (today)
- based on HESS Model++: 28 min of 4 telescopes needs 3 x 1 Ms [M. de Naurois]
Results (levels 3 and 4, available to public)
- requirements: to be evaluated

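To see how these figures combine, a back-of-the-envelope calculation helps. The raw rate (1 GB/s), the 10% reduction factor and the ~200 CPU-days per hour of raw data come from the slide. The number of observation hours per year is not given here, so the 1000 h used below is purely an assumption for illustration, not a CTA figure.

```python
SECONDS_PER_HOUR = 3600
GB_PER_PB = 1e6

def yearly_raw_volume_pb(rate_gb_per_s: float, hours_per_year: float) -> float:
    """Raw data volume per year in PB for a given rate while observing."""
    return rate_gb_per_s * SECONDS_PER_HOUR * hours_per_year / GB_PER_PB

def average_cpus(hours_per_year: float, cpu_days_per_hour: float = 200) -> float:
    """CPUs needed around the clock to keep up with one year of raw data."""
    return hours_per_year * cpu_days_per_hour / 365

OBSERVING_HOURS = 1000  # ASSUMPTION: illustrative value, not taken from the slides

raw_pb = yearly_raw_volume_pb(1, OBSERVING_HOURS)  # raw data, level 0
level2_pb = 0.10 * raw_pb                          # "about 10% of raw data"
cpus = average_cpus(OBSERVING_HOURS)

print(f"raw: {raw_pb:.1f} PB/year, level 2: {level2_pb:.2f} PB/year, CPUs: {cpus:.0f}")
```

Under this assumption the raw volume falls inside the 1 to 10 PB per year range quoted for CTA, and reconstruction would occupy a few hundred CPUs continuously at today's processing speed.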
Data Flow (a possible “Tier” view …)
[Diagram, by Tier]
- telescopes
- on-site or nearby computing centre
- computer cluster at participating institutes
- local machines, scientist's desktop
- two or three powerful computing centres
- remote site: local computing centre or fast internet link

Data Source and Reconstruction
Tier 0: local data source (at or near observatory)
  - make data available on local storage element
  - possible site for archiving
Tier 1: calibration and reconstruction sites
  - at least 2 sites for redundancy
  - guaranteed CPU time for calibration and reconstruction
  - can handle peaks when other site down or re-calibration
  - requirements: disk space and computing power
strong network between Tier 0 and 1:
  - most computation on Tier 1
weak network connection:
  - data reduction at Tier 0 (= Tier 1 is at site)

Data Analysis
Tier 2: Science Data Centre(s)
  - (small) computing clusters at participating institutes
  - data quality check
  - first analysis
  - provide preprocessed data and results
Tier 3: scientist's computer
  - provide individual computing and software
  - also for non-CTA scientists
  - data access (data download from nearest Tier 2)
  - simple installation on all systems → possibly virtual machine

Summary
- CTA Computing Grid, several 1000 CPUs at 14 sites
- currently used for massive simulations
- simulation tools and services for easy submission and monitoring (LAPP)
- will set up a meta data base for easy search and use (with LUPM)
- soon: tests of DIRAC for user analysis (with PIC)
- future Data Management and Processing Centre on distributed sites (Tier 0, 1, (2, 3))
Disclaimer: the CTA data management system is still under study (nothing decided yet!). CTA Computing Grid is one approach under study.