Euclid UK on IRIS - Mark Holliman, SDC-UK Deputy Lead

Presentation transcript:

Euclid UK on IRIS
Mark Holliman, SDC-UK Deputy Lead
Wide Field Astronomy Unit, Institute for Astronomy, University of Edinburgh
IRIS F2F Meeting, Apr 2019 (03/04/2019)

Euclid Summary

ESA Medium-Class Mission
- In the Cosmic Visions Programme M2 slot (M1 Solar Orbiter, M3 PLATO)
- Due for launch December 2021
- Largest astronomical consortium in history: 15 countries, ~2000 scientists, ~200 institutes
- Mission data processing and hosting will be spread across 9 Science Data Centres in Europe and the US (each with a different level of commitment)

Scientific Objectives
- To understand the origin of the Universe's accelerated expansion
- Using at least 2 independent, complementary probes (5 probes in total)
- Geometry of the universe: Weak Lensing (WL), Galaxy Clustering (GC)
- Cosmic history of structure formation: WL, Redshift Space Distortion (RSD), Clusters of Galaxies (CL)

Euclid on IRIS in 2018
- First use of a Slurm-as-a-Service (SaaS) deployment on IRIS resources at Cambridge, on the not-quite-decommissioned Darwin cluster, with support from the Cambridge HPC and StackHPC teams
- Used for Weak Lensing pipeline sensitivity test simulations
- ~15k cores for 8 weeks, August to October
- Generated ~200TB of simulation data on the Lustre shared filesystem during that time (along with ~2TB of data on a smaller Ceph system)
- User logins through SSH; jobs run through the Slurm DRM head node; workload managed by the Euclid IAL software, with pipeline executables distributed through CVMFS (as per mission requirements); a job-submission sketch under stated assumptions follows below
- A successful example of SaaS, and we would now like to see SaaS tested/operated on a federated basis
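The 2018 deployment pattern (SSH logins, a Slurm head node, executables served read-only from CVMFS) can be illustrated with a minimal sketch. This is a hypothetical Python wrapper, not the Euclid IAL: the executable path, working directory, and resource figures are assumptions for illustration only.

```python
#!/usr/bin/env python
"""Hypothetical illustration: submit one pipeline task to a Slurm-as-a-Service
cluster whose software is served from the Euclid CVMFS repositories.
The executable name, workdir, and resource figures are placeholders."""
import subprocess
import tempfile

CVMFS_REPO = "/cvmfs/euclid.in2p3.fr"          # mission software distributed via CVMFS
EXECUTABLE = CVMFS_REPO + "/bin/example_task"  # hypothetical pipeline executable

def submit_task(job_name, workdir, cores=16, mem_per_core_gb=2):
    """Write a minimal sbatch script and hand it to the Slurm head node."""
    script = """#!/bin/bash
#SBATCH --job-name={name}
#SBATCH --ntasks={cores}
#SBATCH --mem-per-cpu={mem}G
#SBATCH --output={workdir}/{name}.log
srun {exe} --workdir {workdir}
""".format(name=job_name, cores=cores, mem=mem_per_core_gb, workdir=workdir, exe=EXECUTABLE)
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as fh:
        fh.write(script)
        path = fh.name
    # sbatch prints "Submitted batch job <id>" on success
    return subprocess.run(["sbatch", path], check=True,
                          capture_output=True, text=True).stdout

if __name__ == "__main__":
    print(submit_task("wl_sensitivity_sim", "/lustre/euclid/wl_test"))
```

In the real deployment the IAL drives job ordering and workflow management; the sketch only shows the Slurm/CVMFS plumbing underneath it.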

Euclid UK on IRIS in 2019

Three main areas of work:
- Shear Lensing Simulations: large-scale galaxy image generation and analysis simulations (similar to the simulations run on IRIS at Cambridge in 2018, though larger in scope).
- Mission Data Processing scale-out: proof of concept using IRIS Slurm-as-a-Service (SaaS) infrastructure to scale out workloads from the dedicated cluster in Edinburgh to resources at other IRIS providers.
- Science Working Group Simulations (led by Tom Kitching): Science Performance Verification (SPV) simulations for setting the science requirements of the mission, used to assess the impact of uncertainty in the system caused by any particular pipeline function or instrument calibration algorithm.

IRIS 2019 allocations (all on OpenStack, preferably using SaaS):
- SCD: 750 cores
- DiRAC: 1550 cores
- GridPP: 800 cores

Shear Lensing Simulations

Description of work:
- The success of Euclid in measuring Dark Energy to the required accuracy relies on the ability to extract a tiny weak gravitational lensing signal (a 1% distortion of galaxy shapes) to high precision (an error on the measured distortion of 0.001%) and high accuracy (residual biases in the measured distortion below 0.001%), per galaxy (on average), from a large data set (1.5 billion galaxy images). Both errors and biases are picked up at every step of the analysis.
- We are tasked with testing the galaxy model-fitting algorithm the UK has developed to extract this signal, and with demonstrating that it will work on the Euclid data, ahead of the ESA review and code-freezing for launch.

Working environment (a minimal environment check is sketched below):
- Nodes/containers running CentOS 7
- CVMFS access to the euclid.in2p3.fr and euclid-dev.in2p3.fr repositories
- 2GB RAM per core; cores can be virtual
- Shared filesystem visible to the head node and workers, preferably with high sequential IO rates (Ceph, Lustre, etc.)
- Slurm DRM
- Preferably the Euclid job submission service (the IAL) running on/near the head node for overall workflow management

Use of IRIS resources:
- SaaS cluster running at Cambridge on OpenStack (1550 cores), with a Ceph shared FS
- SaaS cluster running at RAL on OpenStack (750 cores); shared filesystem TBD. Options include using ceph-ansible to build a converged filesystem across the workers, or creating a single NFS host with a reasonable amount of block storage to mount on the workers. Tests are in progress.
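As a minimal sketch of the working-environment requirements above (assumed mount point for the shared filesystem, RAM check via standard Linux interfaces, nothing taken from any Euclid deliverable), a worker could be sanity-checked before joining the SaaS cluster like this:

```python
#!/usr/bin/env python
"""Sketch only: check that a worker meets the listed requirements: CVMFS repos
mounted, ~2GB RAM per core, a shared filesystem, and the Slurm worker daemon."""
import os
import shutil

CVMFS_REPOS = ["/cvmfs/euclid.in2p3.fr", "/cvmfs/euclid-dev.in2p3.fr"]
SHARED_FS = "/shared/euclid"   # assumed mount point for the Ceph/Lustre/NFS share
MIN_GB_PER_CORE = 2

def check_worker():
    problems = []
    # CVMFS repositories must be reachable (autofs normally mounts on first access)
    for repo in CVMFS_REPOS:
        if not os.path.isdir(repo):
            problems.append("missing CVMFS repo: " + repo)
    # Rough RAM-per-core check using the standard Linux sysconf values
    cores = os.cpu_count() or 1
    mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    if mem_gb / cores < MIN_GB_PER_CORE:
        problems.append("only %.1fGB RAM per core" % (mem_gb / cores))
    # Shared filesystem must be mounted where the pipeline expects it
    if not os.path.ismount(SHARED_FS):
        problems.append("shared filesystem not mounted at " + SHARED_FS)
    # Slurm worker daemon should be installed
    if shutil.which("slurmd") is None:
        problems.append("slurmd not found")
    return problems

if __name__ == "__main__":
    issues = check_worker()
    print("worker OK" if not issues else "\n".join(issues))
```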

Mission Data Processing

Description of work: Euclid mission data processing will be performed at 9 Science Data Centres (SDCs) across Europe and the US. Each SDC will process a subsection of the sky using the full Euclid pipeline, and the resulting data will be hosted at archive nodes at each SDC throughout the mission. The expected processing workflow follows these steps (a schematic of the stages follows below):
1. Telescope data is transferred from the satellite to ESAC. Level 1 processing occurs locally (removal of instrumentation effects, insertion of metadata).
2. Level 1 data is copied to the target SDCs (targets are determined by national Mission Level Agreements, and distribution is handled automatically by the archive).
3. Associated external data (DES, LSST, etc.) is distributed to the target SDCs (this could occur prior to arrival of the Level 1 data).
4. Level 2 data is generated at each SDC by running the full Euclid pipeline end-to-end on the local data (both mission and external). Results are stored in a local archive node and registered with the central archive.
5. The necessary data products are moved to Level 3 SDCs for processing of Level 3 data (LE3 data is expected to require specialized infrastructure in many cases, such as GPUs or large shared-memory machines).
6. Data release products are declared by the Science Ground Segment operators and then copied back to the Science Operations Centre (SOC) at ESAC for official release to the community.
7. Older release data, and/or intermediate data products, are moved from disk to tape on a regular schedule. The data movement is handled by the archive nodes, using local storage services.
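The hand-offs above can be summarized as a tiny schematic in code. The names (the Level enum, route_tile, the site labels) are illustrative and are not taken from the Euclid Science Ground Segment software.

```python
"""Schematic only: the data levels and hand-offs of the mission processing workflow."""
from enum import Enum, auto

class Level(Enum):
    RAW = auto()       # telemetry arriving at ESAC
    LEVEL1 = auto()    # instrument effects removed, metadata added (at ESAC)
    LEVEL2 = auto()    # full pipeline run end-to-end at a target SDC
    LEVEL3 = auto()    # products needing specialized infrastructure at LE3 SDCs
    RELEASE = auto()   # declared by SGS operators, copied back to the SOC at ESAC

# Which class of site is responsible for producing each level
PRODUCED_AT = {
    Level.LEVEL1: "ESAC",
    Level.LEVEL2: "target SDC (per national MLA)",
    Level.LEVEL3: "LE3 SDC",
    Level.RELEASE: "SOC at ESAC",
}

def route_tile(tile_id):
    """Yield (tile, level, producing site) as one sky tile moves through the workflow."""
    for level in (Level.LEVEL1, Level.LEVEL2, Level.LEVEL3, Level.RELEASE):
        yield tile_id, level.name, PRODUCED_AT[level]

if __name__ == "__main__":
    for step in route_tile("tile-0001"):
        print(step)
```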

SDC-UK Architecture (diagram slide; the architecture figure is not reproduced in this transcript)

Mission Data Processing (cont.)

Working environment:
- Nodes/containers running CentOS 7
- CVMFS access to the euclid.in2p3.fr and euclid-dev.in2p3.fr repositories
- 8GB RAM per core; cores can be virtual
- Shared filesystem visible to the head node and workers, preferably with high sequential IO rates (Ceph, Lustre, etc.)
- Torque/SGE/Slurm DRM
- The Euclid job submission service (the IAL) needs to be running on/near the head node for overall workflow management

Use of IRIS resources:
- The goal is to have a working SaaS solution tied into our existing infrastructure by 2020, for the final phase of Euclid end-to-end pipeline tests.
- Preliminary tests will occur in 2019, based around a Slurm head node running on the Edinburgh OpenStack instance, with workers running on the Manchester OpenStack instance (and likely others). This setup will require VPN-type networking to connect the workers to the head node (a connectivity check is sketched below).
- These tests will include use of a multi-site converged CephFS for shared storage.
- Expected resource utilization is only 100 cores for 2019.
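A sketch under stated assumptions: before a remote worker (e.g. at Manchester) joins the Edinburgh-hosted Slurm head node over the VPN, confirm that the control host is reachable and the expected mounts exist. The hostname and mount points below are placeholders, not the real SDC-UK values; the port is Slurm's default slurmctld port.

```python
"""Sketch: federated-worker connectivity and mount check (placeholder values)."""
import os
import socket

HEAD_NODE = "slurm-head.example.ac.uk"   # placeholder for the Edinburgh head node
SLURMCTLD_PORT = 6817                    # Slurm's default slurmctld port
MOUNTS = ["/cvmfs/euclid.in2p3.fr", "/cephfs/euclid"]  # assumed mount points

def reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("head node reachable over VPN:", reachable(HEAD_NODE, SLURMCTLD_PORT))
    for mount in MOUNTS:
        print(mount, "present:", os.path.isdir(mount))
```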

Science Working Group

Description of work:
- The Science Working Groups (SWGs) are the entities in Euclid that, before launch, set the science requirements for the algorithm developers and the instrument teams, and assess the expected scientific performance of the mission. The UK leads the weak lensing SWG (WLSWG; Kitching), which in 2019 is tasked with Science Performance Verification (SPV) for testing the Euclid weak lensing pipeline.
- The current version of the SPV simulations takes a catalogue of galaxies from an n-body simulation and creates a transfer-function-like chain of effects whereby the properties of each galaxy are modified by astrophysical, telescope, detector, and measurement-process effects. These modified galaxy catalogues are then used to assess the performance of the mission, as well as the impact of uncertainty in the system caused by pipeline algorithms or instrument calibration algorithms (a sketch of such a chain follows below).

Working environment:
- Nodes/containers running CentOS 7
- Python 2 (with additional scientific libraries)
- 6GB RAM per core; cores can be virtual
- Shared filesystem (for data workflow management)

Use of IRIS resources:
- 700 cores on the Manchester OpenStack
- The shared FS still needs to be sorted out, with the leading possibility being a converged Ceph system (the same system tested for the Mission Data Processing work). A local NFS node with sufficient block storage could be used as a fallback.
- The SaaS head node will likely run on the Edinburgh OpenStack, connected to the workers through a VPN
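As an illustration of the transfer-function-like chain described above (toy effect functions and an assumed two-column catalogue; the real SPV effect models are not reproduced here), each galaxy is passed through a sequence of stages that each modify its properties:

```python
"""Illustrative sketch of a chain of effects applied to a galaxy catalogue.
Runs under Python 2 or 3; all effect magnitudes are toy values."""

def astrophysical(gal):
    gal["e1"] += 0.01      # toy intrinsic-ellipticity contribution
    return gal

def telescope(gal):
    gal["e1"] *= 1.001     # toy distortion from the optics
    return gal

def detector(gal):
    gal["flux"] *= 0.99    # toy flux loss from detector effects
    return gal

def measurement(gal):
    gal["e1"] *= 1.0005    # toy multiplicative measurement bias
    return gal

EFFECT_CHAIN = [astrophysical, telescope, detector, measurement]

def apply_chain(catalogue):
    """Push every galaxy through the chain of effects, returning a modified copy."""
    out = []
    for gal in catalogue:
        for effect in EFFECT_CHAIN:
            gal = effect(dict(gal))   # copy so the input catalogue is untouched
        out.append(gal)
    return out

if __name__ == "__main__":
    toy_catalogue = [{"e1": 0.2, "flux": 100.0}, {"e1": -0.1, "flux": 50.0}]
    for galaxy in apply_chain(toy_catalogue):
        print(galaxy)
```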

Euclid UK on IRIS 2020…

Shear pipeline sensitivity testing:
- The UK-developed pipeline algorithms need to be tested at scale pre-launch (December 2021) to ensure accuracy and performance; yearly simulations will then need to be run post-launch to generate and refine calibration data.
- Pre-launch simulations are estimated to require ~3k core-years per year and 200TB of intermediate (shared) file storage.
- Post-launch simulations are estimated to require ~1.5k core-years per year and 100TB of intermediate (shared) file storage (see the core-count sketch below).

Mission Data Processing:
- Euclid UK will have a dedicated cluster for day-to-day data processing over the course of the mission, but this is expected to be insufficient at peak periods (i.e. official data releases and data reprocessing). During these periods we need to scale out onto other resource providers, ideally through a SaaS system (or something equivalent). It is also likely that we would want to use a shared tape storage system for archiving older Euclid data.
- Estimates currently anticipate ~1.5k core-years and 100TB of intermediate (shared) file storage from 2022 to 2024.
- Tape estimates are currently very fuzzy while we await finalization of the mission MLAs, but they should be on the order of 100TB of growth per year from 2020 onwards.

Science Working Group:
- The weak lensing SWG anticipates a similar need for simulations in 2020 as in 2019 (1.5k core-years). The SWG then expects to ramp up significantly in the launch year and beyond, running a suite of simulations for verifying mission science requirements and data analysis:
  - 2021: 4k core-years, 100TB of intermediate (shared) file storage
  - 2022: 5k core-years, 200TB of intermediate (shared) file storage
  - 2023: 5k core-years, 200TB of intermediate (shared) file storage
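A back-of-the-envelope conversion of the core-year estimates into a sustained core count, assuming (this is an assumption, not stated in the slides) roughly year-round running:

```python
"""Back-of-the-envelope: convert core-years per year into a sustained core count."""

def sustained_cores(core_years_per_year, utilisation=1.0):
    # 1 core-year per year equals 1 core running continuously at full utilisation,
    # so the required sustained core count is core-years divided by utilisation.
    return core_years_per_year / utilisation

if __name__ == "__main__":
    # Pre-launch shear simulations: ~3k core-years per year
    print("pre-launch shear sims:", sustained_cores(3000), "cores at 100% utilisation")
    print("                      ", round(sustained_cores(3000, 0.8)), "cores at 80% utilisation")
```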