The Distributed ASCI Supercomputer (DAS) project Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.

Slides:



Advertisements
Similar presentations
DAS 3 and StarPlane have Landed Architecture, Status... Freek Dijkstra.
Advertisements

European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies Experiences.
The First 16 Years of the Distributed ASCI Supercomputer Henri Bal Vrije Universiteit Amsterdam COMMIT/
Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences DAS-1 DAS-2 DAS-3.
Opening Workshop DAS-2 (Distributed ASCI Supercomputer 2) Project vrije Universiteit.
CCGrid2013 Panel on Clouds Henri Bal Vrije Universiteit Amsterdam.
The DutchGrid Platform Collaboration of projects from –Computer Science, HEP and service providers Participating and supported projects –Virtual Laboratory.
Parallel Programming on Computational Grids. Outline Grids Application-level tools for grids Parallel programming on grids Case study: Ibis.
The road to reliable, autonomous distributed systems
Distributed supercomputing on DAS, GridLab, and Grid’5000 Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Virtual Laboratory for e-Science (VL-e) Henri Bal Department of Computer Science Vrije Universiteit Amsterdam vrije Universiteit.
Virtual Laboratory for e-Science (VL-e) Henri Bal Department of Computer Science Vrije Universiteit Amsterdam vrije Universiteit.
Henri Bal Vrije Universiteit Amsterdam vrije Universiteit.
The Distributed ASCI Supercomputer (DAS) project Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.
June 3, 2015 Synthetic Grid Workloads with Ibis, K OALA, and GrenchMark CoreGRID Integration Workshop, Pisa A. Iosup, D.H.J. Epema Jason Maassen, Rob van.
Parallel Programming Henri Bal Rob van Nieuwpoort Vrije Universiteit Amsterdam Faculty of Sciences.
Inter-Operating Grids through Delegated MatchMaking Alexandru Iosup, Dick Epema, Hashim Mohamed,Mathieu Jan, Ozan Sonmez 3 rd Grid Initiative Summer School,
Parallel Programming Henri Bal Vrije Universiteit Faculty of Sciences Amsterdam.
The Ibis Project: Simplifying Grid Programming & Deployment Henri Bal, Jason Maassen, Rob van Nieuwpoort, Thilo Kielmann, Niels Drost, Ceriel Jacobs, Frank.
Virtual Laboratory for e-Science (VL-e) Henri Bal Department of Computer Science Vrije Universiteit Amsterdam vrije Universiteit.
Grid Adventures on DAS, GridLab and Grid'5000 Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.
Ibis: a Java-centric Programming Environment for Computational Grids Henri Bal Vrije Universiteit Amsterdam vrije Universiteit.
Parallel Programming on Computational Grids. Outline Grids Application-level tools for grids Parallel programming on grids Case study: Ibis.
May TERENA workshopStarPlane StarPlane: Application Specific Management of Photonic Networks Paola Grosso SNE group - UvA.
The Distributed ASCI Supercomputer (DAS) project Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.
Parallel Programming Henri Bal Vrije Universiteit Faculty of Sciences Amsterdam.
4 december, The Distributed ASCI Supercomputer The third generation Dick Epema (TUD) (with many slides from Henri Bal) Parallel and Distributed.
Real Parallel Computers. Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel.
Parallel Programming Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Going Dutch: How to Share a Dedicated Distributed Infrastructure for Computer Science Research Henri Bal Vrije Universiteit Amsterdam.
Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters Christine MORIN PARIS project-team, IRISA/INRIA.
Edge Based Cloud Computing as a Feasible Network Paradigm(1/27) Edge-Based Cloud Computing as a Feasible Network Paradigm Joe Elizondo and Sam Palmer.
INFSO-SSA International Collaboration to Extend and Advance Grid Education ICEAGE Forum Meeting at EGEE Conference, Geneva Malcolm Atkinson & David.
DAS 1-4: 14 years of experience with the Distributed ASCI Supercomputer Henri Bal Vrije Universiteit Amsterdam.
This work was carried out in the context of the Virtual Laboratory for e-Science project. This project is supported by a BSIK grant from the Dutch Ministry.
Panel Abstractions for Large-Scale Distributed Systems Henri Bal Vrije Universiteit Amsterdam.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
E-science in the Netherlands Maria Heijne TU Delft Library Director / Chair Consortium of University Libraries and National Library.
1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan.
Dutch Tier Hardware Farm size –now: 150 dual nodes + scavenging 200 nodes –buildup to ~1500 up-to-date nodes in 2007 Network –now: 2 Gbit/s internatl.
Using the VL-E Proof of Concept Environment Connecting Users to the e-Science Infrastructure David Groep, NIKHEF.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
The DutchGrid Platform – An Overview – 1 DutchGrid today and tomorrow David Groep, NIKHEF The DutchGrid Platform Large-scale Distributed Computing.
A High Performance Middleware in Java with a Real Application Fabrice Huet*, Denis Caromel*, Henri Bal + * Inria-I3S-CNRS, Sophia-Antipolis, France + Vrije.
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
The Scaling and Validation Programme PoC David Groep & vle-pfour-team VL-e Workshop NIKHEF SARA LogicaCMG IBM.
Authors: Ronnie Julio Cole David
ICT infrastructure for Science: e-Science developments Henri Bal Vrije Universiteit Amsterdam.
Key prototype applications Grid Computing Grid computing is increasingly perceived as the main enabling technology for facilitating multi-institutional.
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
Wide-Area Parallel Computing in Java Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences vrije Universiteit.
Parallel Programming Henri Bal Vrije Universiteit Faculty of Sciences Amsterdam.
Parallel Programming Henri Bal Vrije Universiteit Faculty of Sciences Amsterdam.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Parallel Computing on Wide-Area Clusters: the Albatross Project Aske Plaat Thilo Kielmann Jason Maassen Rob van Nieuwpoort Ronald Veldema Vrije Universiteit.
SCARIe: using StarPlane and DAS-3 Paola Grosso Damien Marchel Cees de Laat SNE group - UvA.
DutchGrid KNMI KUN Delft Leiden VU ASTRON WCW Utrecht Telin Amsterdam Many organizations in the Netherlands are very active in Grid usage and development,
High level programming for the Grid Gosia Wrzesinska Dept. of Computer Science Vrije Universiteit Amsterdam vrije Universiteit.
AMOEBA study of distributed system
Fault tolerance, malleability and migration for divide-and-conquer applications on the Grid Gosia Wrzesińska, Rob V. van Nieuwpoort, Jason Maassen, Henri.
INTRODUCTION TO HIGH PERFORMANCE COMPUTING AND TERMINOLOGY.
Virtual Laboratory Amsterdam L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam.
Clouds , Grids and Clusters
StarPlane: Application Specific Management of Photonic Networks
(Parallel and) Distributed Systems Group
Henri Bal Vrije Universiteit Amsterdam
Vrije Universiteit Amsterdam
Cluster Computers.
Presentation transcript:

The Distributed ASCI Supercomputer (DAS) project Henri Bal Vrije Universiteit Amsterdam Faculty of Sciences

Why is DAS interesting? Long history and continuity -DAS-1 (1997), DAS-2 (2002), DAS-3 (2006) Simple Computer Science grid that works -Over 200 users, 25 Ph.D. theses -Stimulated new lines of CS research -Used in international experiments Colorful future: DAS-3 is going optical

Outline History -Organization (ASCI), funding -Design & implementation of DAS-1 and DAS-2 Impact of DAS on computer science research in The Netherlands -Trend: cluster computing  distributed computing  Grids  Virtual laboratories Future: DAS-3

Step 1: get organized Research schools (Dutch product from 1990s) -Stimulate top research & collaboration -Organize Ph.D. education ASCI: -Advanced School for Computing and Imaging (1995-) -About 100 staff and 100 Ph.D. students from TU Delft, Vrije Universiteit, Amsterdam, Leiden, Utrecht, TU Eindhoven, TU Twente, … DAS proposals written by ASCI committees -Chaired by Tanenbaum (DAS-1), Bal (DAS-2, DAS-3)

Step 2: get (long-term) funding Motivation: CS needs its own infrastructure for -Systems research and experimentation -Distributed experiments -Doing many small, interactive experiments Need distributed experimental system, rather than centralized production supercomputer

DAS funding 2005~400NWO&NCFDAS NWODAS NWODAS-1 Approval#CPUsFunding NWO =Dutch national science foundation NCF=National Computer Facilities (part of NWO)

Step 3: (fight about) design Goals of DAS systems: -Ease collaboration within ASCI -Ease software exchange -Ease systems management -Ease experimentation  Want a clean, laboratory-like system Keep DAS simple and homogeneous -Same OS, local network, CPU type everywhere -Single (replicated) user account file

Behind the screens …. Source: Tanenbaum (ASCI’97 conference)

DAS-1 ( ) VU (128)Amsterdam (24) Leiden (24)Delft (24) 6 Mb/s ATM Configuration 200 MHz Pentium Pro Myrinet interconnect BSDI => Redhat Linux

DAS-2 (2002-now) VU (72)Amsterdam (32) Leiden (32) Delft (32) SURFnet 1 Gb/s Utrecht (32) Configuration two 1 GHz Pentium-3s >= 1 GB memory GB disk Myrinet interconnect Redhat Enterprise Linux Globus 3.2 PBS => Sun Grid Engine

Discussion Goal of the workshop: -Explain “what made possible the miracle that such a complex technical, institutional, human and financial organization works in the long-term” DAS approach -Avoid the complexity (don’t count on miracles) -Have something simple and useful -Designed for experimental computer science, not a production system

System management System administration -Coordinated from a central site (VU) -Avoid having remote humans in the loop Simple security model -Not an enclosed system Optimized for fast job-startups, not for maximizing utilization

Outline History -Organization (ASCI), funding -Design & implementation of DAS-1 and DAS-2 Impact of DAS on computer science research in The Netherlands -Trend: cluster computing  distributed computing  Grids  Virtual laboratories Future: DAS-3

DAS accelerated research trend Cluster computing Distributed computing Grids and P2P Virtual laboratories

Examples cluster computing Communication protocols for Myrinet Parallel languages (Orca, Spar) Parallel applications -PILE: Parallel image processing -HIRLAM: Weather forecasting -Solving Awari (3500-year old game) GRAPE: N-body simulation hardware

Distributed supercomputing on DAS Parallel processing on multiple clusters Study non-trivially parallel applications Exploit hierarchical structure for locality optimizations -latency hiding, message combining, etc. Successful for many applications

Example projects Albatross -Optimize algorithms for wide area execution MagPIe: -MPI collective communication for WANs Manta: distributed supercomputing in Java Dynamite: MPI checkpointing & migration ProActive (INRIA) Co-allocation/scheduling in multi-clusters Ensflow -Stochastic ocean flow model

Experiments on wide-area DAS-2

Grid & P2P computing Use DAS as part of a larger heterogeneous grid Ibis: Java-centric grid computing Satin: divide-and-conquer on grids KOALA: co-allocation of grid resources Globule: P2P system with adaptive replication I-SHARE: resource sharing for multimedia data CrossGrid: interactive simulation and visualization of a biomedical system Performance study Internet transport protocols

The Ibis system Programming support for distributed supercomputing on heterogeneous grids -Fast RMI, group communication, object replication, d&c Use Java-centric approach + JVM technology -Inherently more portable than native compilation -Requires entire system to be written in pure Java -Use byte code rewriting (e.g. fast serialization) -Optimized special-case solutions with native code (e.g. native Myrinet library)

International experiments Running parallel Java applications with Ibis on very heterogeneous grids Evaluate portability claims, scalability

Testbed sites

Experiences Grid testbeds are difficult to obtain Poor support for co-allocation Firewall problems everywhere Java indeed runs anywhere Divide-and-conquer parallelism can obtain high efficiencies (66-81%) on a grid -See Kees van Reeuwijk’s talk - Wednesday (5.45pm)

Management of comm. & computing Management of comm. & computing Management of comm. & computing Potential Generic part Potential Generic part Potential Generic part Application Specific Part Application Specific Part Application Specific Part Virtual Laboratory Application oriented services Grid Harness multi-domain distributed resources Virtual Laboratories

The VL-e project ( ) VL-e: Virtual Laboratory for e-Science 20 partners -Academia: Amsterdam, VU, TU Delft, CWI, NIKHEF,.. -Industry: Philips, IBM, Unilever, CMG, M€ (20 M€ from Dutch goverment) 2 experimental environments: -Proof of Concept: applications research -Rapid Prototyping (using DAS): computer science

Optical Networking High-performance distributed computing Security & Generic AAA Virtual lab. & System integration Interactive PSE Collaborative information Management Adaptive information disclosure User Interfaces & Virtual reality based visualization Bio-diversity Bio-Informatics Telescience Data Intensive Science Food Informatics Medical diagnosis & imaging Virtual Laboratory for e-Science

Visualization on the Grid

DAS - 3 (2006) Partners: -ASCI, Gigaport-NG/SURFnet, VL-e, MultimediaN More heterogeneity Experiment with (nightly) production use DWDM backplane -Dedicated optical group of lambdas -Can allocate multiple 10 Gbit/s lambdas between sites

DAS-3DAS-3 CPU’s R R R R R NOC

StarPlane project Key idea: -Applications can dynamically allocate light paths -Applications can change the topology of the wide-area network, possibly even at sub-second timescale Challenge: how to integrate such a network infrastructure with (e-Science) applications? (Collaboration with Cees de Laat, Univ. of Amsterdam)

Conclusions DAS is a shared infrastructure for experimental computer science research It allows controlled (laboratory-like) grid experiments It accelerated the research trend - cluster computing  distributed computing  Grids  Virtual laboratories We want to use DAS as part of larger international grid experiments (e.g. with Grid5000)

Acknowledgements Andy Tanenbaum Bob Hertzberger Henk Sips Lex Wolters Dick Epema Cees de Laat Aad van der Steen Peter Sloot Kees Verstoep Many others More info: