1 (D)CI related activities at IFCA
Marcos López-Caniego
Instituto de Física de Cantabria (CSIC-UC)
Astro VRC Workshop, Paris, Nov 7th 2011

2 The Observational Cosmology and Instrumentation Group at the Instituto de Física de Cantabria (CSIC-UC) is involved in several aspects of the data analysis of Planck, ESA's mission to study the Cosmic Microwave Background radiation. In addition to Planck, we are involved in the analysis of data from other experiments such as WMAP and Herschel, and in the simulation and data analysis of a new CMB experiment called QUIJOTE. During the EGEE-III project we dedicated a fair amount of time and effort to porting several CMB-related analysis applications to the GRID (a sketch of what one such grid job looked like is given below):
– Detection of Point Sources in single-frequency maps
– Detection of Point Sources using multifrequency information
– Detection of Clusters of Galaxies using the SZ effect
– Detection of Non-Gaussian features in CMB maps
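To make concrete what porting one of these applications to the GRID involved, below is a minimal, hypothetical sketch of a worker-node wrapper for the single-frequency point-source detection case: it stages the input map in from the Storage Element, runs the detection code and stages the catalogue back out. The executable name, LFNs, virtual organisation and the exact options of the gLite lcg-cp/lcg-cr commands are illustrative assumptions, not the actual IFCA setup.

```python
import subprocess
import sys
from pathlib import Path

VO = "planck.example"                                    # hypothetical virtual organisation
LFN_IN = "lfn:/grid/planck/maps/map_100GHz.fits.gz"      # hypothetical input LFN
LFN_OUT = "lfn:/grid/planck/catalogues/ps_100GHz.fits"   # hypothetical output LFN


def stage(cmd):
    """Run a command on the worker node and abort the job if it fails."""
    subprocess.run(cmd, check=True)


def main():
    workdir = Path.cwd()
    local_map = workdir / "map_100GHz.fits.gz"
    local_cat = workdir / "ps_100GHz.fits"

    # Stage in: copy the input map from the Storage Element to the node
    # (gLite-style data management command; exact options may differ per site).
    stage(["lcg-cp", "--vo", VO, LFN_IN, f"file://{local_map}"])

    # Run the (hypothetical) single-frequency point-source detection binary
    # shipped in the job's input sandbox.
    stage(["./detect_sources", str(local_map), str(local_cat)])

    # Stage out: copy and register the output catalogue on the Storage Element.
    stage(["lcg-cr", "--vo", VO, "-l", LFN_OUT, f"file://{local_cat}"])


if __name__ == "__main__":
    sys.exit(main())
```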

3 Our experience using the GRID when running these applications was very variable:
– We used input maps of up to 200-400 MB each, and we had many of them!
– To do some multifrequency analyses we had to load more than one map at a time, and nodes with several GB of RAM were not always available.
– In particular, the original SZ cluster application required up to 9 all-sky maps, and the nodes did not have enough RAM to deal with the maps and with the intermediate files they produced.
– To solve this we decided to divide the maps into hundreds of patches beforehand, then group and gzip them and put them on the SE before starting the job (a minimal sketch of this step is given below).
– This strategy worked very well and we used the GRID to do the SZ analysis of Planck simulations for about two years, but it also implied some additional pre-processing work.
– Analogously, we were able to adapt the NG codes to avoid continuous data traffic between the SE and the nodes, and that also worked very well.
– Moving this amount of data from the Storage Elements to the nodes was always problematic and very often saturated the network.
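A minimal sketch of that pre-processing step, assuming the all-sky map is held as a flat NumPy array (the patch count, file names and the random stand-in map are illustrative, not the actual IFCA pipeline): the map is cut into chunks, each chunk is gzipped, and everything is bundled into one archive that would then be copied and registered on the Storage Element before the jobs are submitted.

```python
import gzip
import tarfile
from pathlib import Path

import numpy as np

N_PATCHES = 100                      # hypothetical number of patches per map
PATCH_DIR = Path("patches")
PATCH_DIR.mkdir(exist_ok=True)

# Stand-in for a real all-sky map (in practice read from a FITS file,
# e.g. with healpy.read_map).
sky_map = np.random.standard_normal(12 * 512 ** 2).astype(np.float32)

# Cut the map into roughly equal chunks and gzip each one, so a worker node
# only has to handle small files instead of a full 200-400 MB map.
for i, patch in enumerate(np.array_split(sky_map, N_PATCHES)):
    with gzip.open(PATCH_DIR / f"patch_{i:03d}.npy.gz", "wb") as fh:
        np.save(fh, patch)

# Group everything into one archive to keep the number of files on the
# Storage Element low; the tarball is uploaded before the jobs start.
with tarfile.open("patches.tar.gz", "w:gz") as tar:
    tar.add(str(PATCH_DIR))
```

Bundling the gzipped patches into a single archive matches the "group, gzip and put them in the SE" step and keeps the number of Storage Element entries, and hence transfers, small.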

4 – It was also common to have failed jobs because the codes started to run before the maps had fully arrived at the nodes, even though this should never happen given the sequential order of commands in the scripts.
– Maybe the most important problem we had was a high rate of failed/killed jobs. Sometimes 100% of the jobs ran smoothly, and sometimes we had to resubmit "by hand" most, if not all, of the jobs because they had failed for unknown reasons.
– One way to improve the situation was to force the jobs to run on nodes physically close to the Storage Element, and the rate of successful jobs improved.
– No tool to control failed jobs was available, and resubmission by hand was not an option. I heard of things like "metaschedulers", GridWay, etc., but they were not available in our infrastructure (a hedged sketch of such safeguards is given below).
– In the near future our input maps can be as large as 2 GB each -> more problems.
Not everything was bad: there was a huge amount of resources available at a time when obtaining tens of thousands of CPU hours on HPC clusters was not easy, and we did use the GRID a lot. But with the launch of Planck in mid-2009 the amount of work increased to a level at which we could not afford to spend valuable time dealing with GRID problems, so we moved away from the GRID and started to work with conventional clusters.
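Two of these workarounds can be sketched in a few lines. The snippet below, again only a hedged illustration, (1) refuses to start the analysis until every expected input map is present and matches a pre-computed checksum, guarding against jobs that start before the data have fully arrived, and (2) wraps job submission in a simple retry loop, which is roughly what a metascheduler would have provided. The `submit` and `succeeded` callables and the checksum manifest are hypothetical placeholders, not an existing middleware API.

```python
import hashlib
import time
from pathlib import Path


def md5sum(path):
    """MD5 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inputs_complete(manifest, workdir):
    """True only if every expected map exists and matches its checksum.

    `manifest` maps file names to checksums computed before upload
    (an assumed convention, not part of the grid middleware).
    """
    workdir = Path(workdir)
    return all(
        (workdir / name).exists() and md5sum(workdir / name) == checksum
        for name, checksum in manifest.items()
    )


def run_with_retries(submit, succeeded, max_retries=3, wait_s=600):
    """Submit a job and resubmit it a few times if it fails or is killed.

    `submit()` returns a job identifier and `succeeded(job_id)` returns True
    or False; both stand in for whatever the middleware provides.
    """
    for attempt in range(1, max_retries + 1):
        job_id = submit()
        if succeeded(job_id):
            return True
        print(f"job {job_id} failed (attempt {attempt}/{max_retries}), resubmitting")
        time.sleep(wait_s)
    return False
```

In the job script, the analysis executable would only be launched once `inputs_complete()` returns True, and the submission side would drive everything through `run_with_retries()` instead of resubmitting by hand.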

5 Current activities at IFCA (not distributed in the sense of GRID or Cloud):
– The number of people in our group running jobs on HPC has doubled in the last couple of years. Now most of the people in the group (10-12 out of 15) use clusters for their daily work.
– We use big infrastructures across Spain, Europe and sometimes the US:
  Spain: GRID-CSIC, Altamira (part of the BSC, at IFCA)
  Finland: CSC (member of PRACE)
  UK: Darwin, Cosmos and Universe in Cambridge
  Italy: Planck LFI cluster in Trieste
  Germany: a cluster at the Forschungszentrum Jülich
  US: NERSC, IPAC
Estimate of the CPU time used per project in the last year: of the order of a few million CPU hours and tens of TB of storage (details in the following slides).

6 Component Separation (Planck)
                        Last 12 months    Next 6 months
  Total CPU hours       2.000             20.000
  RAM/job [GB]          3                 3
  Total Storage [GB]    60                500
  PARALLEL Jobs         NO
  Infrastructure        Altamira

7 Component Separation in Polarization (WMAP)
                        Last 12 months
  Total CPU hours       100.000
  RAM/job [GB]          2
  Total Storage [GB]    40
  PARALLEL Jobs         NO
  Infrastructure        Altamira

8 Anomalies in LSS (WMAP)
                        Next 6 months
  Total CPU hours       40.000
  RAM/job [GB]          3
  Total Storage [GB]    3
  PARALLEL Jobs         NO
  Infrastructure        Altamira

9 Non-Gaussianity Analysis: Planck f_NL wavelets
                        Last 12 months    Next 6 months
  Total CPU hours       600.000           300.000
  RAM/job [GB]          5                 5
  Total Storage [GB]    1000              500
  PARALLEL Jobs         NO
  Infrastructure        GRID-CSIC

10 Non-Gaussianity Analysis: Planck f_NL wavelets
                        Last 12 months    Next 6 months
  Total CPU hours       350.000           175.000
  RAM/job [GB]          3                 3
  Total Storage [GB]    1000              500
  PARALLEL Jobs         YES
  Infrastructure        CSC

11 Non-Gaussianity Analysis: WMAP neural networks
                        Last 12 months    Next 6 months
  Total CPU hours       650.000           325.000
  RAM/job [GB]          3                 3
  Total Storage [GB]    500               250
  PARALLEL Jobs         YES
  Infrastructure        Altamira

12 Non-Gaussianity Analysis: WMAP neural networks
                        Last 12 months    Next 6 months
  Total CPU hours       600.000           300.000
  RAM/job [GB]          3                 3
  Total Storage [GB]    500               250
  PARALLEL Jobs         YES
  Infrastructure        GRID-CSIC

13 Non-Gaussianity Analysis: WMAP neural networks
                        Last 12 months
  Total CPU hours       600.000
  RAM/job [GB]          3
  Total Storage [GB]    500
  PARALLEL Jobs         YES
  Infrastructure        Darwin

14 Non-Gaussianity Analysis: WMAP Hamiltonian Sampling
                        Last 12 months    Next 6 months
  Total CPU hours       10.000            5.000
  RAM/job [GB]          3                 3
  Total Storage [GB]    1000              500
  PARALLEL Jobs         YES
  Infrastructure        Cosmos/Universe

15 SZ Cluster Detection (Planck)
                        Last 12 months    Next 6 months
  Total CPU hours       80.000            40.000
  RAM/job [GB]          2                 2
  Total Storage [GB]    50                25
  PARALLEL Jobs         NO
  Infrastructure        Altamira

16 SZ Cluster Detection (Planck)
                        Last 12 months    Next 6 months
  Total CPU hours       10.000            5.000
  RAM/job [GB]          2                 2
  Total Storage [GB]    20                10
  PARALLEL Jobs         NO
  Infrastructure        Trieste

17 PS Detection (Planck)
                        Last 12 months    Next 6 months
  Total CPU hours       5.000             2.500
  RAM/job [GB]          2                 2
  Total Storage [GB]    2                 2
  PARALLEL Jobs         NO
  Infrastructure        Trieste

18 Bayesian PS Detection in Polarization (Planck/QUIJOTE)
                        Last 12 months    Next 6 months
  Total CPU hours       4.000             2.000
  RAM/job [GB]          2                 2
  Total Storage [GB]    2                 2
  PARALLEL Jobs         NO
  Infrastructure        Altamira

19 LSS cluster simulations
                        Last 6 months     Next 6 months
  Total CPU hours       500.000
  RAM/job [GB]          4                 4
  Total Storage [GB]    45.000
  PARALLEL Jobs         YES
  Infrastructure        Jülich (DE)

20 Summary per project (RAM/job and storage in GB unless noted otherwise)
                                       Last 12 months                   Next 6 months
                                       CPU hours   RAM     Storage      CPU hours   RAM     Storage
  Component Separation                 100.000     3       100          20.000      5       500
  SZ Cluster detection                 90.000      2       70           45.000      2       35
  PS detection T and P                 5.000       2       2            2.500       2       35
  PS Detection Bayesian                5.000       2       2            2.500       2       2
  Non-Gaussianity                      2.800.000   3       4.500        1.100.000   3       2.000
  Large Scale Structure simulations    500.000     4       45.000       540.000     3-4     45.000
  Total                                3.500.000   2-4 GB  50 TB        1.710.000   2-5 GB  48 TB

21 CONCLUSIONS
The activities that we carry out at IFCA demand a huge amount of computational resources. We got involved in EGEE-III because we really thought that the GRID was the solution to our computing problems, but it evolves too slowly outside the High Energy Physics community. After a short learning period we were in a position to submit jobs to the GRID, and we did, but the success rate was too variable. We backed the proposal for a big Grid infrastructure in Spain and we use it, but in HPC mode. At the same time we started to get access to HPC clusters in Spain and across Europe where jobs run smoothly, and our hopes for a usable GRID died. Maybe because of the kind of jobs that we run, it never transitioned from experimental to production, and our continuous Planck data analysis could not wait. Maybe the situation has evolved in the last 12-18 months and the GRID in A&A is now more mature. In our group at IFCA we are always in need of additional computing resources (we recently bought our own small cluster with the particularity that single jobs can access up to 256 GB of RAM). In addition to Planck and WMAP, we are involved in other experiments that will require large amounts of computing resources (QUIJOTE and the PAU-Javalambre Astrophysical Survey), and we would be happy to count on the GRID or Cloud Computing if it is really at a production stage.

