(D)CI related activities at IFCA Marcos López-Caniego Instituto de Física de Cantabria (CSIC-UC) Astro VRC Workshop Paris Nov 7th 2011.


The Observational Cosmology and Instrumentation Group at the Instituto de Física de Cantabria (CSIC-UC) is involved in several aspects of the data analysis of Planck, ESA's mission to study the Cosmic Microwave Background radiation. In addition to Planck, we are involved in the analysis of data from other experiments such as WMAP and Herschel, and in the simulation and data analysis of a new CMB experiment called QUIJOTE. During the EGEE-III project we dedicated a fair amount of time and effort to porting several CMB analysis applications to the GRID:
– Detection of point sources in single-frequency maps
– Detection of point sources using multifrequency information
– Detection of clusters of galaxies using the SZ effect
– Detection of non-Gaussian features in CMB maps

Our experience using the GRID to run these applications was very variable:
– We used input maps of up to MBs each, and we had many of them!
– For some multifrequency analyses we had to load more than one map at a time, and nodes with several GB of RAM were not always available.
– In particular, the original SZ cluster application required up to 9 all-sky maps, and the nodes did not have enough RAM to deal with these maps and with the intermediate files that they produced.
– To solve this we decided to divide the maps into 100's of patches beforehand, then group, gzip and put them in the SE before starting the job.
– This strategy worked very well, and we used the GRID to do the SZ analysis of Planck simulations for about two years. But it also implied some additional pre-processing work.
– Analogously, we were able to adapt the NG codes to avoid continuous data traffic between the SE and the nodes, and this also worked very well.
– Moving this amount of information from the Storage Elements to the nodes was always problematic and very often saturated the network.
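The patch-based pre-staging described above can be sketched roughly as follows. This is a minimal illustration only: it uses a plain 2-D NumPy array and made-up sizes, and the function names (`split_into_patches`, `compress_patch`) are hypothetical; the real codes worked on HEALPix all-sky maps and copied the gzipped bundles to the Storage Element with the grid data-management tools.

```python
import gzip
import io

import numpy as np


def split_into_patches(sky_map, patch_size):
    """Cut a 2-D map into non-overlapping square patches."""
    rows, cols = sky_map.shape
    patches = []
    for r in range(0, rows, patch_size):
        for c in range(0, cols, patch_size):
            patches.append(sky_map[r:r + patch_size, c:c + patch_size])
    return patches


def compress_patch(patch):
    """Serialize one patch to .npy bytes and gzip it before staging."""
    buf = io.BytesIO()
    np.save(buf, patch)
    return gzip.compress(buf.getvalue())


# Example: a 400x400 "map" cut into 100 patches of 40x40, each gzipped
sky_map = np.arange(400 * 400, dtype=np.float64).reshape(400, 400)
patches = split_into_patches(sky_map, 40)
compressed = [compress_patch(p) for p in patches]
```

Each compressed patch can then be uploaded to the SE ahead of job submission, so every job only fetches the small pieces it actually needs.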

– It was also common to have failed jobs because the codes started to run before the maps had fully arrived at the nodes, even though this should never happen given the sequential order of commands in the scripts.
– Maybe the most important problem we had was a high rate of failed/killed jobs. Sometimes 100% of the jobs ran smoothly, and sometimes we had to resubmit "by hand" most, if not all, of the jobs because they had failed for unknown reasons.
– One way to improve the situation was to force the jobs to run on nodes physically close to the Storage Element; the rate of successful jobs improved.
– No tool to control failed jobs was available, and resubmission by hand was not an option. We heard of things like "metaschedulers", GridWay, etc., but they were not available in our infrastructure.
– In the near future our input maps can be as large as 2 GB each -> more problems.
Not everything was bad: there was a huge amount of resources available at a time when obtaining large numbers of CPU hours in HPC clusters was not easy, and we did use the GRID a lot. But with the launch of Planck in mid-2009 the amount of work increased to a level where we could not afford to spend valuable time dealing with problems with the GRID. So we moved away from the GRID and started to work with conventional clusters.
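Two of the problems above, jobs starting before their inputs had fully arrived and resubmission of failed jobs by hand, can be mitigated with small wrappers in the job scripts. The sketch below is a generic illustration, not code from our actual scripts: `wait_for_complete` treats a staged file as "arrived" once its size stops changing, and `run_with_retries` resubmits a failed job a bounded number of times (catching `RuntimeError` here is an arbitrary choice standing in for whatever failure signal the real job wrapper would raise).

```python
import os
import time


def wait_for_complete(path, stable_checks=3, interval=1.0, timeout=60.0):
    """Return True once the file exists and its size has been stable
    for `stable_checks` consecutive polls; False on timeout."""
    deadline = time.time() + timeout
    last_size, stable = -1, 0
    while time.time() < deadline:
        size = os.path.getsize(path) if os.path.exists(path) else -1
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= stable_checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return False


def run_with_retries(job, max_attempts=3):
    """Run `job()`; on failure, retry up to max_attempts times
    instead of resubmitting by hand."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError:
            if attempt == max_attempts:
                raise
```

A job script would first call `wait_for_complete` on each staged map, then launch the analysis through `run_with_retries`.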

Current activities at IFCA (not distributed in the sense of GRID or Cloud):
– The number of people in our group running jobs on HPC has doubled in the last couple of years. Now most of the people in the group (10-12 out of 15) use clusters for their daily work.
– We use big infrastructures across Spain, Europe and sometimes the US:
Spain: GRID-CSIC, Altamira (part of the BSC, at IFCA)
Finland: CSC (member of PRACE)
UK: Darwin, Cosmos and Universe in Cambridge
Italy: Planck LFI cluster in Trieste
Germany: a cluster at the Forschungszentrum Jülich
US: NERSC, IPAC
Estimate of the CPU time used in the last year per project: of the order of a few million CPU hours and 10's of TB of storage.

Component Separation – Planck
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        3                3
  Total Storage [GB]  60               500
  PARALLEL Jobs       NO               NO
  Infrastructure      Altamira         Altamira

Component Separation Pol – WMAP
                      Last 12 months
  Total CPU hours     –
  RAM/job [GB]        2
  Total Storage [GB]  40
  PARALLEL Jobs       NO
  Infrastructure      Altamira

Anomalies in LSS – WMAP
                      Next 6 months
  Total CPU hours     –
  RAM/job [GB]        3
  Total Storage [GB]  3
  PARALLEL Jobs       NO
  Infrastructure      Altamira

Non-Gaussianity Analysis – Planck f_NL wavelets
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        5                5
  Total Storage [GB]  –                –
  PARALLEL Jobs       NO               NO
  Infrastructure      GRID-CSIC        GRID-CSIC

Non-Gaussianity Analysis – Planck f_NL wavelets
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        3                3
  Total Storage [GB]  –                –
  PARALLEL Jobs       YES              YES
  Infrastructure      CSC              CSC

Non-Gaussianity Analysis – WMAP neural networks
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        3                3
  Total Storage [GB]  –                –
  PARALLEL Jobs       YES              YES
  Infrastructure      Altamira         Altamira

Non-Gaussianity Analysis – WMAP neural networks
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        3                3
  Total Storage [GB]  –                –
  PARALLEL Jobs       YES              YES
  Infrastructure      GRID-CSIC        GRID-CSIC

Non-Gaussianity Analysis – WMAP neural networks
                      Last 12 months
  Total CPU hours     –
  RAM/job [GB]        3
  Total Storage [GB]  500
  PARALLEL Jobs       YES
  Infrastructure      Darwin

Non-Gaussianity Analysis – WMAP Hamiltonian Sampling
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        3                3
  Total Storage [GB]  –                –
  PARALLEL Jobs       YES              YES
  Infrastructure      Cosmos/Universe  Cosmos/Universe

SZ Cluster Detection – Planck
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        2                2
  Total Storage [GB]  50               25
  PARALLEL Jobs       NO               NO
  Infrastructure      Altamira         Altamira

SZ Cluster Detection – Planck
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        2                2
  Total Storage [GB]  20               10
  PARALLEL Jobs       NO               NO
  Infrastructure      Trieste          Trieste

PS Detection – Planck
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        2                2
  Total Storage [GB]  2                2
  PARALLEL Jobs       NO               NO
  Infrastructure      Trieste          Trieste

Bayesian PS Detection in Pol – Planck/QUIJOTE
                      Last 12 months   Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        2                2
  Total Storage [GB]  2                2
  PARALLEL Jobs       NO               NO
  Infrastructure      Altamira         Altamira

LSS cluster simulations
                      Last 6 months    Next 6 months
  Total CPU hours     –                –
  RAM/job [GB]        4                4
  Total Storage [GB]  –                –
  PARALLEL Jobs       YES              YES
  Infrastructure      Julich-DE        Julich-DE

Summary
                                          Last 12 months               Next 6 months
                                     CPU hours  RAM   Storage     CPU hours  RAM   Storage
  Component Separation                   –       –       –            –       –       –
  SZ Cluster detection                   –       –       –            –       –       –
  PS detection T and P                   –       –       –            –       –       –
  PS Detection Bayesian                  –       –       –            –       –       –
  Non-Gaussianity                        –       –       –            –       –       –
  Large Scale Structure simulations      –       –       –            –       –       –
  Total                                  –      – GB   50 TB          –      – GB   48 TB

CONCLUSIONS

The activities that we carry out at IFCA demand huge amounts of computational resources. We got involved in EGEE-III because we really thought that the GRID was the solution to our computing problems, but it evolves too slowly outside the High Energy Physics community. After a short learning period we were able to submit jobs to the GRID, and we did, but the success rate was too variable. We backed the proposal for a big Grid infrastructure in Spain and we use it, but in HPC mode. At the same time we started to have access to HPC clusters in Spain and across Europe where jobs run smoothly, and our hopes for a usable GRID died. Maybe because of the kind of jobs that we run, it never transitioned from experimental to production, and our continuous Planck data analysis could not wait. Maybe the situation has evolved in the last months and the GRID in A&A is now more mature. In our group at IFCA we are always in need of additional computing resources (we recently bought our own small cluster with the particularity that single jobs can access up to 256 GB of RAM). In addition to Planck and WMAP, we are involved in other experiments that will require large amounts of computing resources (QUIJOTE and the PAU-Javalambre Astrophysical Survey), and we would be happy to count on the GRID or Cloud Computing if it is really at a production stage.