April 10, 2008, Garching Claudio Gheller CINECA The DEISA HPC Grid for Astrophysical Applications.


Disclaimer
My background: computer science in astrophysics
My involvement in DEISA: support for scientific extreme computing projects (DECI)
I’m not:
o a systems expert
o a networking expert

Conclusions
DEISA is not Grid computing.
It is (super) supercomputing.

The DEISA project: overview
What it is: DEISA (Distributed European Infrastructure for Supercomputing Applications) is a consortium of leading national EU supercomputing centres.
Goals: deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope.
When: the project is funded by the European Commission: May April. It has been re-funded (DEISA2): May 2008 – April 2010.

The DEISA project: drivers
o Support High Performance Computing.
o Integrate Europe’s most powerful supercomputing systems.
o Enable scientific discovery across a broad spectrum of science and technology.
o Make the best use of the resources at both site level and European level.
o Promote openness and the use of standards.

The DEISA project: what it is NOT
o DEISA is not a middleware development project.
o DEISA is not actually a Grid: it does not support Grid computing. Rather, it supports Cooperative Computing.

The DEISA project: core partners
o BSC, Barcelona Supercomputing Centre, Spain
o CINECA, Consorzio Interuniversitario, Italy
o CSC, Finnish Information Technology Centre for Science, Finland
o EPCC/HPCx, University of Edinburgh and CCLRC, UK
o ECMWF, European Centre for Medium-Range Weather Forecasts, UK
o FZJ, Research Centre Juelich, Germany
o HLRS, High Performance Computing Centre Stuttgart, Germany
o LRZ, Leibniz Rechenzentrum Munich, Germany
o RZG, Rechenzentrum Garching of the Max Planck Society, Germany
o IDRIS, Institut du Développement et des Ressources en Informatique Scientifique – CNRS, France
o SARA, Dutch National High Performance Computing, Netherlands

The DEISA project: Project Organization
Three activity areas:
o Networking: management, coordination and dissemination
o Service Activities: running the infrastructure
o Joint Research Activities: porting and running scientific applications on the DEISA infrastructure

DEISA Activities, some (maybe too many…) details (1)
Service Activities:
o Network Operation and Support (FZJ leader). Deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform.
o Data Management with Global File Systems (RZG leader). Deployment and operation of global distributed file systems, as basic building blocks of the "inner" super-cluster, and as a way of implementing global data management in a heterogeneous Grid.
o Resource Management (CINECA leader). Deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.
o Applications and User Support (IDRIS leader). Enabling the adoption by the scientific community of the distributed supercomputing infrastructure, as an efficient instrument for the production of leading computational science.
o Security (SARA leader). Providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.

The DEISA Extreme Computing Initiative (DECI) See DEISA Activities, some (maybe too many…) details (2)

JRA2: Cosmological Applications
Goals:
o to give the Virgo Consortium access to the most advanced features of Grid computing by porting their production applications, GADGET and FLASH
o to make effective use of the DEISA infrastructure
o to lay the foundations of a Theoretical Virtual Observatory
Led by EPCC, which works in close partnership with the Virgo Consortium:
– JRA2 is managed jointly by Gavin Pringle (EPCC/DEISA) and Carlos Frenk (co-PI of both Virgo and VirtU)
– work progressed after gathering clear user requirements from the Virgo Consortium
– requirements and results are published as public DEISA deliverables

Current DEISA status
o a variety of systems connected via GEANT/GEANT2 (Premium IP)
o centres contribute 5% to 10% of their CPU cycles to DEISA
– running projects selected from the DEISA Extreme Computing Initiative (DECI) calls
Premium IP is a GÉANT service whose traffic takes priority over all other services on the network.

DEISA HPC systems
Site     System
IDRIS    IBM P4
ECMWF    IBM P4
FZJ      IBM P4
RZG      IBM P4
HLRS     NEC SX8
HPCX     IBM P5
SARA     SGI ALTIX
LRZ      SGI ALTIX
BSC      IBM PPC
CSC      IBM P4
CINECA   IBM P5

DEISA technical hints: software stack
UNICORE is the grid “glue”
– not built on Globus
– EPCC is developing a UNICORE command-line interface
Other components
– IBM’s General Parallel File System (GPFS): Multicluster GPFS can span different systems over a WAN; recent developments for Linux as well as AIX
– IBM’s Load Leveler for job scheduling: Multicluster Load Leveler can re-route batch jobs to different machines; also available on Linux

DEISA model
o large parallel jobs running on a single supercomputer
– network latency between machines is not a significant issue
o jobs submitted, ideally, via UNICORE; in practice via Load Leveler
– re-routed where appropriate to remote resources
o Single-Sign-On access via GSI-SSH
o GPFS is absolutely crucial to this model
– jobs have access to data no matter where they run
– no source code changes required: standard fread/fwrite (or READ/WRITE) calls to Unix files
o a Common Production Environment is also provided
– defines a common set of environment variables
– defined locally to map to appropriate resources, e.g. $DEISA_WORK will point to the local workspace
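
The point about unchanged file I/O and the $DEISA_WORK variable can be illustrated with a minimal Python sketch. Only the variable name comes from the slide; the function names, the file name and the fallback to the system temporary directory are assumptions made so that the example also runs outside DEISA.

```python
import os
import tempfile
from pathlib import Path


def work_dir() -> Path:
    """Resolve the job workspace from $DEISA_WORK.

    The fallback to the system temp directory is an assumption, added so
    the sketch also runs where the variable is not set.
    """
    return Path(os.environ.get("DEISA_WORK", tempfile.gettempdir()))


def save_result(name: str, payload: bytes) -> Path:
    """Write payload with ordinary file I/O inside the workspace."""
    target = work_dir() / name
    with open(target, "wb") as handle:  # plain open/write, no grid-specific API
        handle.write(payload)
    return target


def load_result(name: str) -> bytes:
    """Read the file back with ordinary file I/O."""
    with open(work_dir() / name, "rb") as handle:
        return handle.read()
```

Because the code only names the workspace through the environment variable, the same script could in principle run at any site that defines it locally.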

Running ideally on DEISA
Fill all the gaps:
o restart/continue jobs on any machine from file checkpoints
– no need to recompile the application program
– no need to manually stage data
o multi-step jobs running on multiple machines
o easy access to data for post-processing after a run
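
Restarting from a file checkpoint can be sketched as follows. This is a toy illustration, not a DEISA tool: the JSON checkpoint format, the `run` function and its `stop_after` parameter are all hypothetical, and the "computation" is just a running sum.

```python
import json
from pathlib import Path
from typing import Optional


def run(total_steps: int, checkpoint: Path, stop_after: Optional[int] = None) -> dict:
    """Advance a toy computation, saving its state after every step.

    If the checkpoint file exists, the run continues from the saved state;
    stop_after (hypothetical) simulates a job that ends before completion.
    """
    state = {"step": 0, "value": 0}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())

    done_now = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_now >= stop_after:
            break
        state["value"] += state["step"]  # stand-in for real work
        state["step"] += 1
        checkpoint.write_text(json.dumps(state))
        done_now += 1
    return state
```

If the checkpoint lives on a file system that every machine can see, the second call can happen on a different machine from the first without staging the file by hand.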

Running on DEISA: Load Leveler
[Diagram: a job and the DEISA systems (IDRIS, ECMWF, FZJ, RZG, HLRS, HPCX, LRZ, CSC, CINECA, SARA, BSC), each labelled with its operating system and batch scheduler: AIX with LL or LL-MC, Super-UX with NQS II, Linux with LSF, PBS Pro or LL.]

Running ideally on DEISA: UNICORE
[Diagram: each DEISA site (IDRIS, FZJ, RZG, HLRS, CINECA, SARA, HPCX, LRZ, CSC, ECMWF, BSC) has a UNICORE Gateway and an NJS with its IDB and UUDB, alongside the local system and its operating system and batch scheduler.]

GPFS Multicluster
o HPC systems mount /deisa/sitename
o users read/write directly from/to these file systems
Mount points shown: /deisa/idr, /deisa/cne, /deisa/rzg, /deisa/fzj, /deisa/csc
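
The /deisa/sitename convention means a file can be named by site code alone. A small sketch, assuming the five codes listed on the slide; the helper name and the directory layout below each mount point are hypothetical.

```python
from pathlib import PurePosixPath

# Site codes exactly as they appear in the mount points on the slide.
SITE_CODES = ("idr", "cne", "rzg", "fzj", "csc")


def deisa_path(site: str, *parts: str) -> PurePosixPath:
    """Build a path under /deisa/<site>; rejects codes not on the slide."""
    if site not in SITE_CODES:
        raise ValueError(f"unknown site code: {site!r}")
    return PurePosixPath("/deisa", site, *parts)


# Hypothetical project layout, for illustration only.
print(deisa_path("rzg", "myproject", "snapshot_001.dat"))
```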

DEISA Common Production Environment (DCPE)
DCPE… what is it?
o both a set of software (the software stack) and a generic interface to access the software (based on the Modules tool)
o required both to offer a common interface to the users and to hide the differences between local installations
o an essential feature for job migration inside homogeneous super-clusters
The DCPE includes:
o shells (Bash and Tcsh),
o compilers (C, C++, Fortran and Java),
o libraries (for numerical analysis, data formatting, etc.),
o tools (debuggers, profilers, editors, development tools),
o applications.

Modules Framework
o The Modules tool was chosen because it was well known by many sites and many users
o Public domain software
o Tcl implementation used
Modules:
o offer a common interface to different software components on different computers,
o hide different names and configurations,
o manage each software package individually and load only those required into the user environment,
o let each user change the version of each software package independently of the others,
o let each user switch independently between the current default version of a software package and another one (older or newer).

The HPC users’ vision
Initial vision: “Full” distributed computing
[Diagram: the DEISA systems (IDRIS, ECMWF, FZJ, RZG, HLRS, HPCX, SARA, LRZ, BSC, CSC, CINECA) with Task1, Task2 and Task3.]
Impossible!!!!

The HPC users’ vision
Jump computing
[Diagram: the DEISA systems (IDRIS, ECMWF, FZJ, RZG, HLRS, HPCX, SARA, LRZ, BSC, CSC, CINECA) with a single Task.]
Difficult… HPC applications are… HPC applications!!! Fine-tuned on the architectures.

So… what…
Jump computing is useful to reduce queue waiting times.
Find the gap… and fill it… This can work, and works better on homogeneous systems.
[Diagram: a job and the DEISA systems (IDRIS, ECMWF, FZJ, RZG, HLRS, HPCX, LRZ, CSC, CINECA, SARA, BSC), each labelled with its operating system and batch scheduler.]
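
"Find the gap and fill it" on homogeneous systems can be sketched as choosing, among sites with the same system type as the home site, the one with the shortest expected queue wait. The site and system names come from the slides; the waiting times are invented for illustration, and the selection rule is an assumption about how such a choice might be made, not a description of how Load Leveler decides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Site:
    name: str
    system: str        # system type as listed on the slides
    est_wait_h: float  # hypothetical queue wait in hours


def pick_site(sites: Iterable[Site], home_system: str) -> Optional[Site]:
    """Return the site with the shortest wait among same-system sites."""
    candidates = [s for s in sites if s.system == home_system]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.est_wait_h)


sites = [
    Site("IDRIS", "IBM P4", 12.0),
    Site("RZG", "IBM P4", 3.5),
    Site("HLRS", "NEC SX8", 0.5),   # shorter wait, but a different architecture
    Site("CINECA", "IBM P5", 2.0),
]
best = pick_site(sites, "IBM P4")
print(best.name if best else "no compatible site")
```

Restricting the candidates to one system type reflects the slide's remark that applications are tuned to an architecture, so the shortest queue overall is not always usable.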

So… what…
A single-image filesystem is a great solution!!!!! (even if moving data…)
[Diagram: the DEISA systems (IDRIS, ECMWF, FZJ, RZG, HLRS, HPCX, LRZ, CSC, CINECA, SARA, BSC), each labelled with its operating system and batch scheduler, with the DEISA GPFS shared filesystem.]

So… what…
o The usual Grid solution requires learning new stuff… Often scientists are not willing to…
o DEISA relies on Load Leveler (or other common scheduling systems)… the same scripts, the same commands you are used to!!! However, only IBM systems support LL…
o The Common Production Environment offers a shared (and friendly) set of tools to the users. However, compromises must be accepted…

Summing up…
[Diagram with axes from high latency to low latency and from low integration to high integration; labels: Internet; GRID; distributed computing and data grids (EGEE); capacity cluster; capacity supercomputer; distributed supercomputing (DEISA); capability supercomputer; enabling computing; HPC centres.]
Growing up, DEISA is moving away from a Grid. In order to fulfill the needs of HPC users, it is trying to become a huge supercomputer.
On the other hand, DEISA2 must lead to a service infrastructure, and users’ expectations MUST be matched (no more time for experiments…).

DECI: enabling science on DEISA
o Identification, deployment and operation of a number of “flagship” applications requiring the infrastructure services, in selected areas of science and technology.
o European call for proposals in May-June every year. Applications are selected on the basis of scientific excellence, innovation potential and relevance criteria, with the collaboration of the HPC national evaluation committees.
o DECI users are supported by the Applications Task Force (ATASKF), whose objective is to enable and deploy the Extreme Computing applications.

LFI-SIM DECI Project (2006)
Planck (useless) overview: Planck is the 3rd generation space mission for the mapping and the analysis of the microwave sky: its unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity is designed to achieve the most efficient detection of the Cosmic Microwave Background (CMB) in both temperature and polarisation. In order to achieve the ambitious goals of the mission, unanimously acknowledged by the scientific community to be of the highest importance, data processing of extreme accuracy is needed.
Principal Investigator(s): Fabio Pasian (INAF-O.A.T.), Hannu Kurki-Suonio (Univ. of Helsinki)
Leading Institution: INAF-O.A. Trieste and Univ. of Helsinki
Partner Institution(s):
o INAF-IASF Bologna
o Consejo Superior de Investigaciones Científicas (Instituto de Física de Cantabria)
o Max-Planck Institut für Astrophysik Garching
o SISSA Trieste
o University of Milano
o University “Tor Vergata” Rome
DEISA Home Site: CINECA

Need for simulations in Planck
NOT the typical DECI-HPC project!!!
Simulations are used to:
o assess likely science outcomes;
o set requirements on instruments in order to achieve the expected scientific results;
o test the performance of data analysis algorithms and infrastructure;
o help understand the instrument and its noise properties;
o analyze known and unforeseen systematic effects;
o deal with known physics and new physics.
Predicting the data is fundamental to understanding them.

Simulation pipeline
[Diagram labels: cosmological parameters; Generate CMB sky; Add foregrounds; reference sky maps; instrument parameters; “Observe” sky with LFI; Time-Ordered Data; Data reduction; frequency sky maps; Freq. merge; Comp. sep.; component maps; C(l) evaluation; C(l); Parameter evaluation; cosmological parameters.]
Knowledge and details increase over time, therefore the whole computational chain must be iterated many times.
NEED OF HUGE COMPUTATIONAL RESOURCES
GRID can be a solution!!!

Planck & DEISA
DEISA was expected to be used to:
o simulate the whole mission of Planck’s LFI instrument many times, on the basis of different scientific and instrumental hypotheses;
o reduce, calibrate and analyse the simulated data down to the production of the final products of the mission, in order to evaluate the impact of possible LFI instrumental effects on the quality of the scientific results, and consequently to refine the data processing algorithms appropriately.
[Diagram: Model 1, Model 2, Model 3, … Model N.]

Outcomes
o Planck simulations are essential to get the best possible understanding of the mission and to have a “conscious expectation of the unexpected”.
o They also make it possible to plan Data Processing Centre resources properly.
o The EGEE grid turned out to be more suitable for such a project, since it provides fast access to small/medium computing resources. Most of the Planck pipeline is happy with such resources!!!
o However, DEISA was useful to produce massive sets of simulated data and to perform and test the data processing steps which require large computing resources (lots of coupled processors, large memories, large bandwidth…).
o Interoperation between the two grid infrastructures (possibly based on the gLite middleware) is expected in the next few years.