Simulations and Data Reduction of the ESA Planck mission


1 Simulations and Data Reduction of the ESA Planck mission
(Questionnaire 4)
C. Vuerli (1,2), G. Taffoni (1,2), A. Zacchei (1), F. Pasian (1,2)
(1) INAF – Astronomical Observatory of Trieste
(2) INAF – Information Systems Unit
First Workshop TRIGRID VL – Catania – Monday March 13th, 2006 – Questionnaire

2 Outline
Description of the Application and Scientific goals
Results obtained so far
Application requirements: CPU, data, software, security, other

3 Description/The Planck Mission
Goal: measure the cosmic microwave background; Planck succeeds the COBE, Boomerang and WMAP missions and aims at even higher resolution.
Timeline: launch August 2007; start of observations 2008; duration >1 year (14 months).
Characteristics: continuous data stream (TOD); large datasets; changing calibration (parameter configuration); high-performance computing needed for data analysis.

4 Description/COBE & Planck
(figure-only slide)

5 Description/Brief introduction
Goal: make possible N simulations of the whole Planck/LFI mission (~14 months), each time with different cosmological and instrumental parameters.
Production of full-sky maps for frequencies between 30 and 857 GHz by means of two complete sky surveys.
Sensitivity of a few μK per pixel of 0.3° in size.
22 channels for LFI, 48 for HFI.
Data volume produced at the end of the mission: ~2 TB for LFI and ~15 TB for HFI.
Computing requirements: ~100 Tflops for raw data reduction, foreground extraction and CMB map creation.

6 Description/The Level-S
Purpose of the Level-S: ground checks (pre-launch phases); tuning of the DPC pipelines; control checks and corrections (operational phase).
The pipeline is chained but not parallel (43 executables and a few libraries).
Languages used are C/C++/Fortran/F90, with Shell/Perl for scripts.
Integration with the Process Coordinator (ProC).
Integration (if DB support is requested) with the G-DSE.
A typical application that benefits from distributed computing techniques.
Porting of the Monte Carlo simulation code by Sam Leach.
A full Planck simulation is a set of 70 independent instances of the pipeline (22 for LFI and 48 for HFI), as sketched below.
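A minimal sketch of how the 70 independent pipeline instances could be fanned out, one per channel (the channel list file, the run_scp.sh per-channel driver and the directory layout are illustrative placeholders, not the actual Level-S scripts):

```sh
#!/bin/sh
# Launch one single-channel pipeline (SCP) instance per LFI channel.
# lfi_channels.txt is assumed to list the 22 LFI channel names, one per line;
# the same scheme extends to the 48 HFI channels.

mkdir -p out logs

for chan in $(cat lfi_channels.txt); do
    # each instance gets its own parameter file, output directory and log
    ./run_scp.sh params/${chan}.par out/${chan} > logs/${chan}.log 2>&1 &
done
wait    # the instances are independent of one another
```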

7 Description/The Level-S
Level-S data flow (schematic):
CMB power spectrum [cmbfast] → CMB map [synfast] → mission simulation (scanning strategy, foregrounds and beam patterns, instrumental noise) → TOD → data analysis → CMB maps
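The same flow, written as a minimal command-line sketch for one channel (the wrapper and parameter-file names are assumptions; the real Level-S chains ~43 executables and is normally driven through ProC):

```sh
#!/bin/sh
# Per-channel module chain, run strictly in sequence (chained, not parallel).
set -e                                    # stop the chain if any module fails

./cmbfast_wrap    cl_params.par           # cosmological parameters -> CMB power spectrum
./synfast_wrap    synfast_params.par      # power spectrum -> CMB sky map
./simmission_wrap scanning.par            # scanning strategy, beams, foregrounds, noise -> TOD
./mapmaking_wrap  mapmaking.par           # TOD -> channel maps for data analysis
```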

8 Description/Distribution
(figure-only slide)

9 Results/First Tests on a WS…
First tests performed on a workstation, aimed at identifying in detail the computational and storage needs of the simulation software.

Run time and output volume per radiometer (short / long simulation):

               LFI 30 GHz   LFI 44 GHz   LFI 70 GHz
time  (short)  12 m         13 m         17 m
time  (long)   389 m        623 m        834 m
data  (short)  0.9 GB       1.2 GB       1.7 GB
data  (long)   34.2 GB      45.3 GB      75 GB

Total for the whole LFI mission: short 5.5 h and 31 GB; long 255.7 h and 1.3 TB (totals weight each column by the number of radiometers, e.g. 4x12 m + 6x13 m + 12x17 m = 330 m ≈ 5.5 h for the short run times).

Computational time measured on a dual-CPU 2.4 GHz workstation with 2 GB of RAM for the whole simulation of the LFI mission (4 radiometers at 30 GHz, 6 at 44 GHz and 12 at 70 GHz).

10 Results/… and on the Grid
(Diagram: the user node distributes parameter files to the Grid CEs; the worker nodes of Node 1 … Node k run the pipelines and return maps and TODs.)

         Workstation   Grid    Gain
short    330 m         25 m    13
long     15342 m       955 m   16

Dual-CPU 2.4 GHz workstation with 2 GB of RAM vs. the Grid.

11 Results/Scalability on the Grid
(figure-only slide)

12 Trigrid Usage
The LFI DPC (at INAF Trieste) is setting up its own computing farm; an effort is nevertheless in progress to port the whole pipeline software to the Grid, so that Grid infrastructures (EGEE, Trigrid, etc.) can be used if needed.
The plan, therefore, is to use Trigrid both during the simulations phase (pre-launch) and during operations.
During simulations (HFI + LFI), pipelines may be submitted by any user of the Planck VO.
During operations (LFI only), official pipelines will be submitted by LFI DPC staff only.
During operations we foresee ~250 jobs/month, with approximately 8 jobs running at the same time.

13 CPU Requirements
Tests carried out so far show that the best results are achieved when running the whole MSJ (mission simulation job) with 22 available nodes, or in any case with 22 free WNs (~960 min, i.e. ~16 hours).
In this case each SCP (single-channel pipeline) of LFI is assigned to a different Grid node or WN.
We used WNs equipped with two 2.5 GHz CPUs (but only one CPU is used; no multi-threading).
We will therefore build our JDLs so as to ask for a different Grid node for each SCP, OR ask for a dedicated free WN for each SCP (see the sketch below).
Computing requirements: ~100 Tflops for raw data reduction, foreground extraction and CMB map creation.
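A minimal sketch of how such per-SCP job descriptions could be generated and submitted (the JDL attributes follow the usual EDG/gLite syntax, but the free-CPU requirement expression, the file names and the edg-job-submit command are assumptions about the Trigrid setup, not the agreed procedure):

```sh
#!/bin/sh
# Generate one JDL per LFI single-channel pipeline and submit it, so that
# every SCP asks for its own worker node with a free CPU.

for chan in $(cat lfi_channels.txt); do        # 22 channel names, one per line
    cat > ${chan}.jdl <<EOF
Executable          = "run_scp.sh";
Arguments           = "${chan}";
StdOutput           = "${chan}.out";
StdError            = "${chan}.err";
InputSandbox        = {"run_scp.sh", "params/${chan}.par"};
OutputSandbox       = {"${chan}.out", "${chan}.err"};
VirtualOrganisation = "planck";
Requirements        = other.GlueCEStateFreeCPUs > 0;
EOF
    edg-job-submit -o job_ids.txt ${chan}.jdl
done
```

Submitting 22 such jobs (one per LFI channel) reproduces the configuration that gave the ~16-hour wall-clock time quoted above.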

14 Data Storage Requirements
RAM: requirements vary from job to job (min. 512 MB, max. 4 GB for map making of the 70 GHz frequency channel, typically 1 GB).
Disk space: during the simulations phase each job could require ~10 MB, with peaks of 1 PB/job during operations.
It would be desirable to have enough space on Trigrid to keep the data produced by pipeline runs for a TBD period of time.
Data left on Trigrid must be protected and accessible by LFI DPC staff only.
When, according to the agreed policy, the disk space on Trigrid is freed, the data (final products as well as intermediate ones, e.g. single-channel rings) will be moved from Trigrid to the LFI DPC (see the sketch below).
Estimated volume of data produced at the end of the Planck mission: ~2 TB for LFI (~15 TB for HFI).
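A minimal sketch of how a job could park its products on a Trigrid storage element and how the DPC could later retrieve them and free the space (the SE host name, LFN paths and the use of the lcg-utils commands are assumptions, not the agreed policy):

```sh
#!/bin/sh
# From the worker node: copy a product to a storage element and register it
# in the file catalogue under the planck VO (names are placeholders).
lcg-cr --vo planck -d se01.trigrid.it \
       -l lfn:/grid/planck/lfi/run042/LFI_70GHz_ch01_map.fits \
       file://$PWD/LFI_70GHz_ch01_map.fits

# Later, from the LFI DPC: fetch the product, then remove the Grid replica
# once the agreed retention period has expired.
lcg-cp  --vo planck lfn:/grid/planck/lfi/run042/LFI_70GHz_ch01_map.fits \
        file://$PWD/archive/LFI_70GHz_ch01_map.fits
lcg-del --vo planck -a lfn:/grid/planck/lfi/run042/LFI_70GHz_ch01_map.fits
```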

15 Software Requirements
Do we make use of commercial software? No.
Do we make use of third-party software? CFITSIO. It will be modified so that jobs can write intermediate results directly on SEs, to overcome the shortage of disk space on WNs.
How do we intend to deploy the pipeline software? Dynamically (on the fly): we move software rather than data. Pipeline runs will be prepared by moving the pipeline software modules to the target WNs; the size of a software module is typically ~3 MB.
The criterion "CE nearest to the SE" is of crucial importance for us.
Size of the most significant Planck data structures:
TODs are typically split into 1-hour chunks, each ~1.6 GB in size.
Max size of a channel ring (70 GHz, sky and load signals kept separated): 9000 samples x 4 bytes = 36,000 bytes. Total size for the whole mission and for all 22 LFI channels (36,000 bytes x the number of minutes in the mission x 22 channels) = ~479 GB.
Rings carry scientific information only; TODs carry both scientific and housekeeping (H/K) data.
Typical size of a map: a few MB.
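An order-of-magnitude check of the ring volume quoted above (the 14-month mission length and the one-ring-per-minute rate are assumptions used only to reproduce the figure):

```sh
#!/bin/sh
# Sanity check: 36,000-byte rings, one per minute, 22 LFI channels, ~14 months.
awk 'BEGIN {
    ring_bytes = 9000 * 4            # 36,000 bytes per ring
    minutes    = 14 * 30 * 24 * 60   # ~14-month mission, in minutes
    channels   = 22                  # LFI channels
    printf "total ring volume: ~%.0f GB\n", ring_bytes * minutes * channels / 1e9
}'
# prints: total ring volume: ~479 GB
```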

16 Security Aspects
We need to protect both software and data.
The IDIS system has been set up as a joint HFI-LFI collaboration to federate software, data and documentation and to grant access to them on a per-user-account basis.
Data are the property of the P.I. of the project for a period of at least two years.
We must protect the intellectual property of the software.
A mapping between the IDIS and the Grid authentication/authorization mechanisms must be set up.
To keep things simple, however, access to software and data can be restricted to the LFI DPC staff only, especially during the operational phase of the Planck mission.
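One possible way to express such a restriction on the Grid side, as a sketch only (the VO group and role names are assumptions, and the actual IDIS-to-Grid mapping is still to be defined):

```sh
#!/bin/sh
# Obtain a VOMS proxy carrying an LFI-DPC group/role; Grid services and SE
# ACLs can then authorize on the role rather than on individual certificates.
voms-proxy-init --voms planck:/planck/lfi-dpc/Role=production

# Inspect the attributes actually granted with the proxy
voms-proxy-info --all
```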

17 More Features of the Application
The application is sequential (batch oriented).
~20 developers are contributing to it.
Within the Planck VO we have enough expertise to handle software, data and security aspects on the Grid; no technical support from Trigrid is requested.
Data sharing (within the Planck VO) and data exchange (with the HFI Consortium) will be strictly controlled by the LFI DPC staff.
The application runs on any flavour of Linux.
F90 is the compiler typically used, but C and C++ code is also possible.
During operations, database support (integration with the G-DSE) would be desirable for our application, although it is not strictly necessary; produced data can be moved to the Planck DB during the post-processing phase.

18 End of Presentation
Thank you for your attention.

19 Added values/CPU and Data
CPU power: e-computing lab; production bursts; efficient CPU usage/sharing.
Data storage/sharing: distributed data for distributed users; replication and security; common interface to software and data.
Planck simulations are highly computing-demanding and produce a huge amount of data; such resources, both in terms of computing power and data storage space, usually cannot be afforded by a single research institute.

20 Added values/Quality
Native collaboration tool;
Common interface to the users;
Flexible environment;
New approach to data and S/W sharing;
Collaborative work for simulations and reduction: less time, less space, less frustration.
VObs view:
Sharing data over a shared environment;
Native authentication/authorization mechanism;
A federation of users within a VO fosters scientific collaboration.

21 ProC and G-DSE
ProC is a scientific workflow engine developed in the framework of the IDIS (Integrated Data and Information System) collaboration.
It executes "pipelines" of modules: workflows are directed acyclic graphs.
It allows the assembly of pipelines from building-block modules.
Modules may be heterogeneous (FORTRAN, C, C++, Java, Python, GDL/IDL, ...); modules may also be sub-pipelines.
It is a data-driven, forward-chaining system.
It has components for graphical editing of workflow layouts, checking for consistency and completeness, and execution.
The G-DSE makes databases a new embedded resource of the Grid: it enables a new Grid QE to access databases and use them for data analysis.
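A toy illustration of the data-driven, forward-chaining idea (this is not ProC's actual pipeline format: the wrapper names and the file-existence test are placeholders; ProC tracks readiness on its own data products):

```sh
#!/bin/sh
# Each module fires as soon as all of its declared inputs exist and its
# output has not been produced yet.

run_if_ready() {                  # run_if_ready <module> <output> <inputs...>
    mod=$1; out=$2; shift 2
    [ -e "$out" ] && return 0     # output already produced: nothing to do
    for dep in "$@"; do
        [ -e "$dep" ] || return 1 # an input is still missing: not ready yet
    done
    "$mod" "$@" > "$out"          # all inputs present: fire the module
}

for pass in 1 2; do               # sweep the graph until nothing new fires
    run_if_ready ./cmbfast_wrap    cl.fits      params.par
    run_if_ready ./synfast_wrap    cmb_map.fits cl.fits
    run_if_ready ./simmission_wrap tod.fits     cmb_map.fits scanning.par
done
```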

