ADASS XXIV, Calgary, 5-9 Oct 2014
Fabio Pasian – Euclid Data processing

Organization of the Euclid Data Processing: dealing with complexity

Fabio Pasian (INAF – O.A.Trieste) and Christophe Dabin, Marc Sauvage, Oriana Mansutti, Claudio Vuerli, Anna Gregorio, on behalf of the Euclid SGS development team

The presented document is Proprietary information of the Euclid Consortium. This document shall be used and disclosed by the receiving Party and its related entities (e.g. contractors and subcontractors) only for the purposes of fulfilling the receiving Party's responsibilities under the Euclid Project and that identified and marked technical data shall not be disclosed or retransferred to any other entity without prior written permission of the document preparer.

The Euclid Mission
–M2 mission in the framework of the ESA Cosmic Vision Programme
–The Euclid mission objective is to map the geometry and understand the nature of the dark Universe (dark energy and dark matter)
–Actors in the mission: ESA and the Euclid Consortium (institutes from 13 European countries and the USA, funded by their own national space agencies)
–For more information see :

The Euclid Consortium
The Euclid Consortium is in charge of:
–building and operating the instruments (VIS and NISP)
–developing and running the data processing within a unified Science Ground Segment (SGS)
–performing the science analysis on the Euclid data products
The Euclid Consortium is composed of members
–350+ Consortium members participating in SGS (active: ~150)

Euclid at a Glance

The Ground Segment at a glance
[Diagram: institutional view of the Ground Segment. Labelled elements: Euclid, Ground Station, MOC (ESOC), SOC (ESAC), DDS, EA, VObs, scientific community, external data (KiDS, DES,...), SDCs, ECSGS Project Office, System Team.]
–EA is built jointly by EC and SOC, and is managed by SOC.
–«Internal» and «public» EA functions: the latter allows access to a subset of EA data
–ESA/SOC and the EC SGS have developed, and are committed to maintain, a tight collaboration in order to design and develop a single, truly integrated SGS.
–This is an institutional view of the GS

The Ground Segment as seen in the high-level Euclid documents
[Diagram. Labelled elements: Ground Station, MOC, SOC, LE1, Euclid Consortium (ECSGS), MOGS, SGS.]

The Ground Segment as seen from the data processing point of view
[Diagram: functional view of the SGS. Labelled elements: Ground Station, MOC, SOC, OPS, MOGS, SGS. Data levels: Level 1, Level 2, Level 3, Level E, Level S. Processing Functions: LE1, VIS, NIR, SIR, EXT, MER, SIM, SPE, SHE, PHZ, LE3. Cross-checks: VIS/NIR/SIR/EXT, SIR, MER.]
–The coloured boxes correspond to the Processing Functions, which are a product of the Euclid SGS
–This is a functional view of the SGS

SWGs, OUs and SDCs
Science Working Groups
–external to the SGS
–turning science objectives into requirements placed on the pipeline products and performances
–verifying that the requirements are met (define V&V procedures)
Organisation Units
–providing the algorithmic definition of the processing to be implemented by the SDCs and validating the implementation
Science Data Centres
–implementing the data processing pipelines as specified by OUs
–procuring local h/w and s/w resources
–different activities:
  SDC-DEV (development, i.e. transforming algorithms into robust code)
  SDC-PROD (integration on local infrastructure, production runs of pipeline)
Individual Euclid scientists may belong to more than one of the above groups

Development – Verification & Validation
[Diagram: SWGs, OU, SDC-DEV, SDC-PROD, linked by the following labels: requirements; algorithms, test data; pipeline code, test data; pipelines; code validation; verification; validation (on results).]
1. in most cases, no interfaces but joint development
2. only for validation against high-level requirements for every Processing Function
3. common integration platform

Development – Verification & Validation
1. Set of documents being prepared jointly between OUs and SDCs (by product, i.e. Processing Function, and not by organisation):
 a. PF Requirements Specification Document
 b. PF Validation Plan
 c. Development Plans (organised by SDC)
2. Validation by SWGs of the high-level data processing requirements
 a. high-level data processing requirements attributed to PFs
 b. the SGS will be considered as validated if every high-level data processing requirement is validated
 c. the SGS is including in the top-level IV&V plans the inputs provided by the SWG coordinators regarding the principles of validation as well as the recommendations and typologies of validation tests; this top-level document will be co-signed by SGS and SWG coordinators
Responding to recommendations from the SGS-PRR:
–Simplification/reduction of interfaces
–«Best Practices» document issued to help OUs/SDCs
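
The validation rule in item 2b (the SGS is validated only if every high-level requirement is validated) can be stated compactly. The following is a minimal sketch under stated assumptions: the `Requirement` structure, the identifiers and the function names are hypothetical and are not taken from any Euclid document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A high-level data processing requirement (hypothetical structure)."""
    req_id: str         # illustrative identifier, not a real Euclid ID
    attributed_to: str  # Processing Function the requirement is attributed to
    validated: bool     # outcome of the validation for this requirement


def sgs_is_validated(requirements):
    """Return True only if every requirement in the collection is validated."""
    requirements = list(requirements)
    # An empty collection is treated as "not validated" because nothing has
    # been checked yet. This is a choice made for the sketch, not a stated rule.
    return bool(requirements) and all(r.validated for r in requirements)


def pending_by_pf(requirements):
    """Group the identifiers of unvalidated requirements by Processing Function."""
    pending = {}
    for r in requirements:
        if not r.validated:
            pending.setdefault(r.attributed_to, []).append(r.req_id)
    return pending
```

Grouping the pending items by Processing Function mirrors the attribution of requirements to PFs in item 2a, so a failed overall validation can be traced back to the PF concerned.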

Pillars of SGS development
SGS-Level Services = shared tools and systems for SGS software development
–Standards and guidelines
–Development platform
–Integration platform
–Data model
–Software infrastructure
The System Team provides these to make the integration and operation of the Processing Functions as simple as possible
[ Current status is wrt ADASS XXIII ]

Standards and guidelines
Standards and guidelines help developers take the right decisions
–Show how/where to improve code to meet the demanding requirements of the Euclid data processing
–Encourage the use of best practices
–Provide tools to help developers improve their code
Current status:
–Standards being developed based on previous project experience and adapted to the Euclid context

Development and integration platform
–The SGS uses a single development platform specifying operating system, programming language and support libraries
–CODEEN is the Euclid collaborative development and continuous integration platform
  The cost of fixing bugs increases as the system integration approaches completion
  Usage mandatory for main processing software
Current status:
–Python adopted as the second language allowed for pipeline development in addition to C++ (Linux + C++ & Python)
–Drivers: more flexibility about who can contribute to development, long-term direction of astronomical programming
–The System Team will ensure that we get all of the benefits and avoid the (known!) pitfalls

Data model
–Explicit data model built by OUs to describe the output of their Processing Functions (and therefore, in most cases, the input to other Processing Functions)
–Many projects have an implicit data model, using conventions and shared code data structures
–Change management of implicit data models is difficult, particularly for long-living projects where knowledge can be lost
Current status:
–Data Model Workshops held with great participation from OUs and System Team
–First iterations of the DM very promising: real data products starting to be defined
–Challenge now is to increase the coverage to all products and maintain a flexible process to allow the DM to evolve in a controlled way along with the Processing Functions
–CCB started, to accept new items and to evaluate change requests
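
The difference between an explicit and an implicit data model can be illustrated with a small sketch. It assumes a hypothetical product type, `CalibratedFrame`, whose field names are invented for illustration; it does not reproduce the actual Euclid data model or its definition format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CalibratedFrame:
    """Hypothetical data product; the field names are illustrative only."""
    product_id: str
    instrument: str          # e.g. "VIS" or "NISP"
    exposure_time_s: float
    schema_version: int = 1  # bumped when the definition changes in a controlled way


def from_record(cls, record):
    """Build a product from a plain dict, rejecting unknown or missing fields."""
    known = {f.name for f in fields(cls)}
    unknown = set(record) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    try:
        return cls(**record)
    except TypeError as exc:  # raised by the constructor when a field is missing
        raise ValueError(str(exc)) from exc
```

With an implicit model, a producer that renames a key would only be noticed when a downstream consumer fails. Here the mismatch is rejected at the boundary between Processing Functions, and the declared fields document the product in one place.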

Software Infrastructure
Three main systems
–Data Management System (EAS): shared set of tools for managing the Euclid dataset (data discovery and exchange, data processing support, quality and lineage tracking)
–Abstraction Layer (IAL): processing management at the SDC computing facilities
–Processing Orchestration (COORS): coordinating processing activities across all the SDCs
Current status:
–Prototypes exist for the core EAS system and IAL
–Integration of these systems through EC SGS Challenges demonstrates real progress towards a working data processing system

ST Challenge #3
Final goal of challenges: deploying transparently pipelines on all SDCs
Technical objectives:
–Demonstrate the capability to deploy IAL VM images into SDCs
–Demonstrate the capability to deploy, in the context of each SDC, the TIPS, NIP and VIS simulators as Euclid pipeline objects
–Demonstrate the capability of IAL, in the context of each SDC, to:
  fetch, on the basis of the metadata provided by the EAS prototype (in SDC-NL), the pipeline input data into the local SDC storage area
  launch simulator jobs across clusters (when available in SDCs) or dedicated nodes, in accordance with PPOs defined remotely (through Jenkins) or locally (by each SDC leader), as an orchestration mock-up
  produce and store output data into the local SDC storage area
  send the appropriate metadata to the EAS prototype in SDC-NL
Schedule:
–Baseline availability for deployment into SDCs: end of December 2013
–By mid-February 2014, all SDCs had successfully fulfilled the challenge
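
The four IAL steps listed in the technical objectives (fetch, launch, store, report) can be sketched as a single function. This is only an illustration under stated assumptions: the archive, the local storage and the simulators are in-memory stand-ins, and the PPO layout, the key names and the function name are hypothetical. It does not reflect the actual IAL or EAS interfaces.

```python
def run_pipeline_locally(ppo, archive, local_storage, simulators):
    """Mimic the four steps for a single PPO, using in-memory stand-ins.

    ppo           -- dict with 'pipeline' and 'inputs' keys (hypothetical layout)
    archive       -- stand-in for the EAS prototype:
                     {'metadata': {name: {'location': key}}, 'files': {key: data}}
    local_storage -- dict standing in for the local SDC storage area
    simulators    -- mapping from pipeline name to a callable taking a list of inputs
    """
    # 1. Fetch the input data into local storage, guided by archive metadata.
    for name in ppo["inputs"]:
        location = archive["metadata"][name]["location"]
        local_storage[name] = archive["files"][location]

    # 2. Launch the job (a plain function call here, instead of a cluster job).
    inputs = [local_storage[name] for name in ppo["inputs"]]
    output = simulators[ppo["pipeline"]](inputs)

    # 3. Store the output data in the local storage area.
    output_name = f"{ppo['pipeline']}_output"
    local_storage[output_name] = output

    # 4. Send metadata describing the new product back to the archive.
    archive["metadata"][output_name] = {
        "location": "local",
        "derived_from": list(ppo["inputs"]),
    }
    return output_name
```

Only metadata travels back to the archive in step 4, while the data itself stays in local storage, which matches the separation between the last two objectives in the list.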

Thank you for your attention

Acknowledgments
Thanks to ESA and to the Euclid Consortium, and in particular:
–ESA: John Hoar (ESAC), Guillermo Buenadicha (ESAC), René Laureijs (ESTEC), Giuseppe Racca (ESTEC), Pedro Osuna (ESAC), Bruno Altieri (ESAC), Michael Schmidt (ESOC), Cyril Colombo (ESTEC), Ralf Kohley (ESAC),...
–EC: Yannick Mellier (IAP), Andrea Zacchei (INAF), Keith Noddle (UoE), Maurice Poncet (CNES), Rees Williams (RuG), Christian Neissner (PIC), Johannes Koppenhöfer (MPG), Pierre Dubath (Unige), Elina Keihänen (UHelsinki), Marco Frailis (INAF), Jean-Marc Delouis (IAP), Jean-Jacques Metge (CNES), Christian Surace (LAM), Nikos Apostolakos (Geneva), Laurent Vibert (IAS), Martin Melchior (FHNW), Stefan Müller (FHNW), Marco Soldati (FHNW), Andrey Belikov (RuG), Edwin Valentijn (RuG), Harry Teplitz (IPAC), OUs staff, SDCs staff, …
–The SGS PRR Panel and Board
–And many other people involved in the project
This is a REAL team effort
... and thank you for your attention

[Organisation chart. Top-level boxes: Organisation Group, ECSGS Management, Project Office, System Team, OUs, SDCs.]
ECSGS Management
–ECSGS Manager: F. Pasian
–ECSGS Scientist: M. Sauvage
–ECSGS Deputy: C. Dabin
Project Office
–Config. Lead: O. Mansutti
–PA/QA Lead: C. Vuerli
–IOT Coordination: A. Gregorio
–Proj.Ctr. Support: D. Fierro
System Team: L. C. Dabin
OUs
–OU-NIR: A. Grazian, R. Bouwens
–OU-VIS: H. Mc Cracken, N. Shane
–OU-SIR: M. Scodeggio, C. Surace
–OU-EXT: J. Mohr, G. Verdoes-Kleijn
–OU-MER: A. Fontana, M. Kuemmel, M. Douspis
–OU-SIM: S. Serrano, A. Ealet
–OU-SHE: A. Taylor, F. Courbin, T. Schrabback
–OU-SPE: O. Le Fèvre, M. Mignoli
–OU-LE3: J-L. Starck, F. Abdalla, E. Branchini
–OU-PHZ: S. Paltani, F. Castander
SDCs
–SDC-DE: J. Koppenhoefer, F. Raison
–SDC-CH: P. Dubath
–SDC-FI: E. Keihanen, H. Kurki-Suonio
–SDC-ES: C. Neissner, N. Tonello
–SDC-IT: A. Zacchei, M. Frailis
–SDC-FR: M. Poncet, J-J. Metge
–SDC-UK: K. Noddle, M. Holliman
–SDC-NL: O. R. Williams, A. Belikov
–SDC-US: J. Rector, H. Teplitz
Further roles listed in the chart
–Abstraction Layer (IAL): M. Melchior
–Architecture Performance: K. Noddle
–Data Modeling: C. Dabin
–Monitoring & Control: L. Vibert
–Archive Data: A. Belikov
–Archive Metadata: P. Osuna
–Common Tools: M. Poncet
–Orchestration: K. Noddle
–Data Quality: M. Brescia
–LE1 common infrastructure: M. Frailis

Processing Functions
–are a product of the Euclid SGS (to be eventually delivered to ESA at the end of the mission)
–correspond to the processing steps which are performed within a «Euclid pipeline»
–are algorithmically devised by the relevant OU and engineered by software development teams (SDC-DEV)
–can in principle be run yielding the same results on any SDC site of the SGS (SDC-PROD, different HW environments)
In most cases, Processing Functions are developed jointly by OU members and their local SDC-DEV teams
–formal OU-SDC interfaces not needed in most cases
–easier to directly develop pipeline-quality code
–SGS System Team provides tools/standards/support (SDC Leads are members of the ST)
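
The statement that a Processing Function yields the same results on any SDC site suggests a simple cross-site check. The sketch below rests on an assumption not made in the slides: that "same results" means numerical agreement within a relative tolerance, since different hardware environments may not give bit-identical floating-point output. The function name, the site names and the tolerance are hypothetical.

```python
import math


def results_agree(site_outputs, rel_tol=1e-9):
    """Return True if every site produced the same values within rel_tol.

    site_outputs -- mapping from site name to a list of floats, one list per
                    site, all produced by the same Processing Function
    """
    if not site_outputs:
        raise ValueError("no site outputs to compare")
    # The first site serves as the reference; the others are compared to it.
    reference, *others = site_outputs.values()
    return all(
        len(out) == len(reference)
        and all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(out, reference))
        for out in others
    )
```

A check of this kind would naturally run on the common integration platform, where the same test data can be processed in each environment.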

OUs are transnational
An organization based on the decomposition in Organization Units (OU), each corresponding to a subset of the overall Euclid Data Processing.
[Diagram: the Ground Segment at a glance, with the OUs overlaid. OUs: OU-VIS, OU-NIR, OU-SIR, OU-EXT, OU-MER, OU-PHZ, OU-SPE, OU-LE3, OU-SHE, OU-SIM. Scope labels: VIS Imag, Nir Imag, Nir Spectro, Ext Data, Euclidisation, Spectro Meas, Level 3, Morpho & Shear, Phot Red Sh., Simulation. Legend: OU coordinator, OU Deputy Coordinator. Other labelled elements: Euclid, Ground Station, MOC (ESOC), SOC (ESAC), DDS, EA, VObs, scientific community, external data (KiDS, DES,...), SDCs, EC-SGS Project Office.]
–EA is built jointly by EC and SOC, and is managed by SOC.
–«Internal» and «public» EA functions: the latter allows access to a subset of EA data
–ESA/SOC and the EC SGS have developed, and are committed to maintain, a tight collaboration in order to design and develop a single, truly integrated SGS.