Henri Bal Vrije Universiteit Amsterdam


DAS: A Medium-Scale Distributed System for Computer Science Research - Infrastructure for the Long Term
Henri Bal, Vrije Universiteit Amsterdam
(Earlier keynotes covered the programming environments (Ibis); this talk was invited two days after the DAS-5 acceptance (19-21 Feb).)

Paper: IEEE Computer (May 2016)

Agenda
- Overview of DAS (1997-2016): unique aspects, the five DAS generations, organization
- Earlier results and impact
- Example DAS-5 research
(Projects you may or may not have seen, depending on your age.)

What is DAS?
Distributed common infrastructure for Dutch Computer Science:
- Distributed: multiple (4-6) clusters at different locations
- Common: single formal owner (ASCI), single design team; users have access to the entire system
- Dedicated to CS experiments (like Grid'5000): interactive (distributed) experiments, low resource utilization; able to modify/break the hardware and systems software
- Dutch: small scale (no phone calls, no arm twisting, one password file)
(Grid'5000 allows access to the raw hardware; we don't.)

About SIZE
- Only ~200 nodes in total per DAS generation
- Less than 1.5 M€ total funding per generation
- Johan Cruyff: "Ieder nadeel heb zijn voordeel" ("Every disadvantage has its advantage"; the original context was soccer)

Small is beautiful
- We have superior wide-area latencies: "The Netherlands is a 2×3 msec country" (Cees de Laat, Univ. of Amsterdam); we can also simulate longer latencies
- NL is the place to be for efficient wide-area algorithms
- Able to build each DAS generation from scratch: a coherent distributed system with a clear vision, upgraded synchronously (unlike Grid'5000's asynchronous evolution)
- Despite the small scale we achieved: 3 CCGrid SCALE awards, numerous TRECVID awards, >100 completed PhD theses
(Grid'5000 positions itself as a breakable, raw-hardware study instrument; we enable CS research in parallel and distributed computing.)

DAS generations: visions
- DAS-1: Wide-area computing, then called "metacomputing" (1997) - homogeneous hardware and software
- DAS-2: Grid computing (2002) - Globus middleware
- DAS-3: Optical grids (2006) - dedicated 10 Gb/s optical links between all sites; the problem changed from "what to do with a slow WAN" into "how to use the abundant bandwidth"
- DAS-4: Clouds, diversity, green IT (2010) - hardware virtualization, accelerators, energy measurements; a homogeneous core plus different types of accelerators
- DAS-5: Harnessing diversity and the data explosion (June 2015) - wide variety of accelerators, larger memories and disks

ASCI (1995)
- ASCI: Advanced School for Computing and Imaging, one of the Dutch research schools created in the 1990s
- Aims: stimulate top research & collaboration; provide Ph.D. education (courses)
- About 100 staff & 100 Ph.D. students, 16 PhD-level courses, annual conference
(ASCI is a real community: a marriage of Imaging and Computing, with each taking over the other's habits - distributed computing, GPUs. People work together instead of writing SLAs.)

Organization
- ASCI steering committee for the overall design, chaired by Andy Tanenbaum (DAS-1) and Henri Bal (DAS-2 - DAS-5)
- Representatives from all sites: Dick Epema, Cees de Laat, Cees Snoek, Frank Seinstra, John Romein, Harry Wijshoff
- Small system administration group coordinated by the VU (Kees Verstoep)
- The simple homogeneous setup reduces admin overhead

Historical example (DAS-1)
Changed the OS globally from BSDI Unix to Linux, under the directorship of Andy Tanenbaum.

Financing
- NWO "middle-sized equipment" program: max 1 M€, tough competition, but we scored 5-out-of-5
- 25% matching by participating sites ("going Dutch" for a quarter)
- Extra funding by the VU and (for DAS-5) COMMIT/ and NLeSC
- SURFnet (GigaPort) provides the wide-area networks

Steering Committee algorithm

FOR i IN 1 .. 5 DO
    Develop vision for DAS[i]
    NWO/M proposal by 1 September          [4 months]
    Receive outcome (accept)               [6 months]
    Detailed spec / EU tender              [4-6 months]
    Selection; order system; delivery      [6 months]
    Research_system  := DAS[i]
    Education_system := DAS[i-1]   (if i > 1)
    Throw away DAS[i-2]            (if i > 2)
    Wait (2 or 3 years)
DONE

(A CS talk needs algorithms. Note this is a synchronous, BSP-like algorithm, unlike Grid'5000's.)

Output of the algorithm

Earlier results
- VU: programming distributed systems (clusters, wide area, grid, optical, cloud, accelerators)
- Delft: resource management
- MultimediaN: multimedia knowledge discovery
- Amsterdam: wide-area networking, clouds, energy
- Leiden: data mining, astrophysics
- Astron: accelerators for signal processing, algorithms for astrophysics

DAS-1 (1997-2002): a homogeneous wide-area system
- 200 MHz Pentium Pro nodes, Myrinet interconnect
- BSDI → Redhat Linux
- Built by Parsytec
- Sites: VU (128 nodes), Amsterdam (24), Leiden (24), Delft (24), connected by 6 Mb/s ATM

Example: Successive Overrelaxation
- Problem: nodes at cluster boundaries wait on slow wide-area links
- Optimizations:
  - Overlap wide-area communication with computation
  - Send the boundary row once every X iterations (slower convergence, less communication)
(Figure: 6 CPUs in two clusters; intra-cluster links ~50 µs, the wide-area link ~5600 µs.)
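The boundary-exchange trade-off above can be sketched in a few lines. The following is a toy Laplace solver, not the Albatross code; grid size, ω, and iteration counts are illustrative. Two "clusters" each own half the rows and refresh each other's ghost row only every X iterations:

```python
import numpy as np

def wide_area_sor(n=24, omega=1.5, iters=100, exchange_every=1):
    """Toy SOR on an n x n Laplace grid split over two 'clusters'
    (top and bottom halves). Boundary rows cross the 'WAN' only every
    `exchange_every` iterations; in between, each half updates against
    a stale ghost copy of its neighbour's boundary row."""
    g = np.zeros((n, n))
    g[0, :] = g[-1, :] = g[:, 0] = g[:, -1] = 1.0  # fixed boundary values
    h = n // 2                    # cluster 1 owns rows < h, cluster 2 rows >= h
    ghost_top = g[h].copy()       # cluster 1's copy of row h
    ghost_bot = g[h - 1].copy()   # cluster 2's copy of row h - 1
    messages = 0
    for it in range(iters):
        if it % exchange_every == 0:              # the expensive wide-area exchange
            ghost_top, ghost_bot = g[h].copy(), g[h - 1].copy()
            messages += 2
        for i in range(1, n - 1):                 # Gauss-Seidel sweep with relaxation
            below = ghost_bot if i == h else g[i - 1]
            above = ghost_top if i == h - 1 else g[i + 1]
            for j in range(1, n - 1):
                stencil = (below[j] + above[j] + g[i, j - 1] + g[i, j + 1]) / 4
                g[i, j] = (1 - omega) * g[i, j] + omega * stencil
    return g, messages

# The exact solution is 1 everywhere, so the residual shows the slower
# convergence; exchanging every 8 iterations cuts wide-area messages 8x.
for x in (1, 8):
    g, msgs = wide_area_sor(exchange_every=x)
    print(x, msgs, float(np.abs(g - 1).max()))
```

The residual for `exchange_every=8` is larger after the same number of iterations, which is exactly the trade the slide describes: slower convergence bought with far fewer wide-area messages.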

Albatross project
- Optimize algorithms for wide-area systems: exploit the hierarchical structure → locality optimizations
- Compare: 1 small cluster (15 nodes), 1 big cluster (60 nodes), a wide-area system (4×15 nodes)

Sensitivity to wide-area latency and bandwidth
- Used local ATM links plus delay loops to simulate various latencies and bandwidths [HPCA'99]
(At the time applications didn't care in the least; it was too early - the Grid had not even been introduced, and people barely used clusters. Joke: we could also simulate less than 6 Mb/s; not everyone can have so much bandwidth.)
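The delay-loop idea is simple enough to sketch. Below is a hypothetical software link that charges a configurable latency and serialization cost before delivering each message; the numbers are illustrative, not the HPCA'99 settings:

```python
import time

def make_link(latency_s, bandwidth_bps):
    """Return a send() that emulates a slower wide-area link by sleeping
    for the link latency plus the serialization time of the payload."""
    def send(payload: bytes):
        time.sleep(latency_s + 8 * len(payload) / bandwidth_bps)
        return payload  # 'delivered'
    return send

# A local link dressed up as DAS-1's 6 Mb/s wide-area ATM:
wan_send = make_link(latency_s=0.005, bandwidth_bps=6_000_000)
t0 = time.perf_counter()
wan_send(b"x" * 6000)            # 6 KB: ~5 ms latency + ~8 ms serialization
print(round(time.perf_counter() - t0, 3))
```

Varying the two parameters over a grid of values reproduces the kind of sensitivity sweep the slide refers to, without ever leaving a single cluster.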

Wide-area programming systems
- Manta: high-performance Java [TOPLAS 2001]
- MagPIe (Thilo Kielmann): MPI's collective operations optimized for hierarchical wide-area systems [PPoPP'99]
- KOALA (TU Delft): multi-cluster scheduler with support for co-allocation
(These systems got real names like MagPIe and KOALA; compare DAS-1, DAS-2, etc., which are boring.)
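MagPIe's key idea for wide-area collectives can be sketched as a toy broadcast (this is not MagPIe's actual MPI implementation, and the names are made up): send one message over the WAN to a coordinator node in each remote cluster, then fan out locally:

```python
def hierarchical_bcast(clusters, root, log):
    """Broadcast from `root`: one wide-area message per remote cluster,
    then cheap local fan-out inside every cluster.
    `clusters` maps cluster name -> list of node ids; `log` collects
    (kind, src, dst) message records."""
    root_cluster = next(c for c, nodes in clusters.items() if root in nodes)
    for c, nodes in clusters.items():
        head = root if c == root_cluster else nodes[0]  # per-cluster coordinator
        if c != root_cluster:
            log.append(("wan", root, head))             # expensive: crosses the WAN
        for n in nodes:
            if n != head:
                log.append(("lan", head, n))            # cheap: stays in the cluster
    return log

# Four clusters of four nodes: 3 wide-area messages, where a flat broadcast
# from the root would push 12 of its 15 messages across the WAN.
clusters = {f"site{s}": [f"site{s}-n{i}" for i in range(4)] for s in range(4)}
log = hierarchical_bcast(clusters, "site0-n0", [])
print(sum(1 for kind, *_ in log if kind == "wan"), len(log))
```

The same cluster-aware restructuring applies to the other collectives (reduce, gather, all-to-all): do the wide-area part once per cluster, and everything else locally.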

DAS-2 (2002-2006): a Computer Science Grid
- Two 1 GHz Pentium-3s per node, Myrinet interconnect
- Redhat Enterprise Linux, Globus 3.2
- PBS → Sun Grid Engine
- Built by IBM
- Sites: VU (72 nodes), Amsterdam (32), Leiden (32), Delft (32), Utrecht (32), connected by 1 Gb/s SURFnet
(In the meantime someone had invented the word "grid"; DAS-1 predates grids.)

Grid programming systems
- Satin (Rob van Nieuwpoort): transparent divide-and-conquer parallelism for grids; the hierarchical computational model fits grids [TOPLAS 2010]
- Ibis: Java-centric grid computing [IEEE Computer 2010]; efficient and transparent "jungle computing"
- JavaGAT: middleware-independent API for grid applications [SC'07]
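Satin's spawn/sync model (in Java, with compiler support) can be caricatured in a few lines. Below is a hypothetical Python stand-in: only coarse-grained tasks near the top of the divide-and-conquer tree are spawned, since only those are worth shipping across a wide-area link; the depth cutoff is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=16)

def fib(n, depth=0):
    """Divide-and-conquer with Satin-style spawn/sync. Shallow subproblems
    are 'spawned' (candidates for stealing by another cluster); deep,
    fine-grained ones run locally to amortize wide-area overhead."""
    if n < 2:
        return n
    if depth < 3:                                 # coarse-grained: spawn one branch
        a = pool.submit(fib, n - 1, depth + 1)    # spawn
        b = fib(n - 2, depth + 1)                 # compute the other branch ourselves
        return a.result() + b                     # sync
    return fib(n - 1, depth) + fib(n - 2, depth)  # fine-grained: stay sequential

print(fib(10))
```

In Satin itself the spawn/sync points are annotations on ordinary Java methods, and the runtime's hierarchy-aware work stealing decides where spawned jobs actually run, which is what makes the model fit grids.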

Grid experiments
- Do clean performance measurements on DAS
- Combine DAS with EU grids to test heterogeneity
- Show the software "also works" on real grids

DAS-3 (2006-2010): an optical grid
- Dual AMD Opteron nodes, 2.2-2.6 GHz, single/dual core
- Myrinet-10G
- Scientific Linux 4, Globus, SGE
- Built by ClusterVision
- Sites: VU (85 nodes), UvA/MultimediaN (40/46), TU Delft (68), Leiden (32), connected by 10 Gb/s SURFnet6

Multiple dedicated 10G light paths between the sites. Idea: dynamically change the wide-area topology.

DAS-4 (2011): testbed for clouds, diversity, green IT
- Dual quad-core Xeon E5620 nodes, InfiniBand
- Various accelerators
- Scientific Linux, Bright Cluster Manager
- Built by ClusterVision
- Sites: VU (74 nodes), UvA/MultimediaN (16/36), TU Delft (32), Leiden (16), ASTRON (23), connected by 10 Gb/s SURFnet6

Global Climate Modeling
- Goal: understand future local sea-level changes
- Needs high-resolution simulations and distributed computing (multiple resources)
(Applications typically want help with Excel spreadsheets! Real computer scientists never ever use Excel.)

Distributed Computing
- Use Ibis to couple different simulation models: land, ice, ocean, atmosphere (FORTRAN)
- Wide-area optimizations similar to the Albatross project (16 years earlier), such as hierarchical load balancing

Enlighten Your Research Global award
(Figure: dedicated 10G light paths connecting CARTESIUS (NLD), STAMPEDE (USA), KRAKEN (USA), EMERALD (UK), and SUPERMUC (GER); two of the machines ranked TOP500 #7 and #10.)
(Joke: like using Skype instead of the telephone.)

DAS-5 (June 2015): harnessing diversity and the data explosion
- Same 5 sites as DAS-4
- Two new (co-funding) participants: the Netherlands eScience Center (NLeSC) and COMMIT/, a public-private research community

DAS-5
- Dual 8-core Intel E5-2630v3 CPUs, FDR InfiniBand, OpenFlow switches
- Various accelerators
- CentOS Linux, Bright Cluster Manager
- Built by ClusterVision
- Sites: VU (68 nodes), UvA/MultimediaN (18/31), TU Delft (48), Leiden (24), ASTRON (9), connected by 10 Gb/s SURFnet7

Highlights of DAS users
- Awards
- Grants
- Top publications
- New user communities

Awards
- 3 CCGrid SCALE awards: 2008: Ibis, "Grids as promised"; 2010: WebPIE, web-scale reasoning; 2014: BitTorrent analysis
- Video and image retrieval: 5 TRECVID awards, ImageCLEF, ImageNet, Pascal VOC classification, AAAI 2007 most visionary research award
- Euro-Par 2014 achievement award

Netherlands Award for ICT Research 2016 (Alexandru Iosup)

Grants
- Externally funded PhD/postdoc projects using DAS: 20 in the DAS-3 proposal, 30 in the DAS-4 proposal, 50 in the DAS-5 proposal
- Major incentive for VL-e, the Virtual Laboratory for e-Science (2003-2009, 20 M€ funding)
- Strong participation in COMMIT/
- 100 completed PhD theses

Top papers
- IEEE Computer: 5
- Comm. ACM: 2
- IEEE TPDS: 7
- ACM TOPLAS: 3
- ACM TOCS: 4
- Nature

SIGOPS 2000 paper
- 50 authors, 130 citations
(A coherent paper with 50 authors?)

New user communities
- DAS is totally unattractive for applications research: it comes nowhere near the TOP500, and it cannot be used for production runs during daytime
- Still, DAS keeps attracting new communities: DAS-3: multimedia; DAS-4: astronomy; DAS-5: eScience
- Reason: DAS is useful as a stepping stone - easy and fast experiments with parallel algorithms, distributed experiments, experiments with new hardware (accelerators, networks)

Agenda
- Overview of DAS (1997-2016): unique aspects, the five DAS generations, organization
- Earlier results and impact
- Example DAS-5 research
(Projects you may or may not have seen, depending on your age.)

Graphics Processing Units (GPUs)

Differences between CPUs and GPUs
- CPU: minimize the latency of one activity (thread); must be good at everything; big on-chip caches; sophisticated control logic
- GPU: maximize the throughput of all threads using large-scale parallelism
(Figure: CPU die area dominated by control logic and cache; GPU die area dominated by ALUs.)
You may ask why these design decisions differ from a CPU's. The GPU's goals differ significantly: the GPU evolved to solve highly parallel workloads, while the CPU evolved to be good at any problem, parallel or not. The trip out to memory is long and painful, so the chip architect must deal with latency. One way is to avoid it: the CPU's computational logic sits in a sea of cache, on the assumption that few memory transactions are unique and the data is usually found after a short trip to the cache. Another way is amortization: GPUs forgo cache in favor of parallelization, on the assumption that most memory transactions are unique and can be processed efficiently in parallel. The cost of a trip to memory is amortized across many independent threads, which results in high throughput.

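The amortization argument above can be demonstrated with a toy experiment (pure Python, with time.sleep standing in for a long memory access; no real GPU involved): one latency-bound activity at a time versus many independent ones in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load(i):
    """Stand-in for one memory transaction: fixed latency, trivial compute."""
    time.sleep(0.02)
    return i * i

n = 32
t0 = time.perf_counter()
seq = [load(i) for i in range(n)]        # CPU-style: hide nothing, pay n * latency
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=n) as ex:
    par = list(ex.map(load, range(n)))   # GPU-style: n transactions in flight at once
t_par = time.perf_counter() - t0

print(par == seq, t_par < t_seq / 4)     # same answers, far higher throughput
```

The parallel version does not make any single load faster; it simply keeps enough independent work in flight that the latency is paid once, not n times, which is the GPU's bet.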

Why is GPU programming hard?
- Tension between control over the hardware (to achieve performance) and a higher abstraction level (to ease programming)
- Programmers need understandable performance

Programming methodology: stepwise refinement for performance
- Programmers can work on multiple levels of abstraction
- Integrate hardware descriptions into the programming model
- Performance feedback from the compiler, based on the hardware description and the kernel
- Cooperation between compiler and programmer

MCL: Many-Core Levels
- An MCL program is an algorithm mapped to hardware
- Start at a suitable abstraction level: idealized accelerator, NVIDIA Kepler GPU, Xeon Phi
- The MCL compiler guides the programmer: which optimizations to apply at a given abstraction level, or when to move to a deeper level

MCL ecosystem

Lessons learned from MCL
- The stepwise-refinement-for-performance methodology supports a structured approach: gradually expose programmers to more hardware details
- It enables the development of optimized many-core kernels
- Key ideas: stepwise refinement + multiple abstraction levels + compiler feedback
- To be continued in a new eScience project on natural language processing and bioinformatics

Discussion on infrastructure
- Computer Science needs its own infrastructure for interactive experiments; DAS was instrumental for numerous projects
- Being organized helps: ASCI is a (distributed) community
- Having a vision for each generation helps: updated every 4-5 years, in line with research agendas

Interacting with applications
- Used DAS as a stepping stone for applications: small experiments, no production runs (not even "idle cycles")
- The applications really helped the CS research: DAS-3: multimedia → Ibis applications, awards; DAS-4: astronomy → many new GPU projects; DAS-5: eScience Center → EYR-G award, GPU work
- Many plans for new collaborations: data science, business analytics

Conclusions
- DAS has had a major impact on Dutch Computer Science for almost two decades
- It aims at coherence plus a modest system size
- Highly effective for a small investment; for example, it avoids fragmented systems management
- Central organization and coherence are key

Acknowledgements
- DAS grandfathers: Andy Tanenbaum, Bob Hertzberger, Henk Sips
- DAS Steering Group: Dick Epema, Cees de Laat, Cees Snoek, Frank Seinstra, John Romein, Harry Wijshoff
- System management: Kees Verstoep et al.
- Support: TUD/GIS, Stratix, ASCI office
- Funding: COMMIT/
- Hundreds of users
More information: http://www.cs.vu.nl/das5/