Virtualization in MetaSystems
Vaidy Sunderam, Emory University, Atlanta, USA

Credits and Acknowledgements
Distributed Computing Laboratory, Emory University: Dawid Kurzyniec, Piotr Wendykier, David DeWolfs, Dirk Gorissen, Maciej Malawski, Vaidy Sunderam
Collaborators: Oak Ridge Labs (A. Geist, C. Engelmann, J. Kohl); Univ. Tennessee (J. Dongarra, G. Fagg, E. Gabriel)
Sponsors: U.S. Department of Energy, National Science Foundation, Emory University

Virtualization
A fundamental and universal concept in CS, now receiving renewed, explicit recognition.
Machine level:
- Single OS image: Virtuozzo, VServers, Zones
- Full virtualization: VMware, VirtualPC, QEMU
- Para-virtualization: UML, Xen (Ian Pratt et al., cl.cam.ac.uk)
- Goal: "consolidate under-utilized resources, avoid downtime, load-balance, enforce security policy"
Parallel distributed computing:
- Software systems: PVM, MPICH, grid toolkits and systems
- Goal: consolidate under-utilized resources, avoid downtime, load-balance, enforce security policy, plus aggregate resources

Virtualization in PVM
Historical perspective: PVM 1.0, 1989

Key PVM Abstractions
Programming model:
- Timeshared, multiprogrammed virtual machine
- Two-level process space: functional name + ordinal number
- Flat, open, reliable messaging substrate with heterogeneous messages and data representation
Multiprocessor emulation:
- Processor/process decoupling
- Dynamic addition/deletion of processors
- Raw nodes projected transparently, or with exposure of heterogeneous attributes

Parallel Distributed Computing
Multiprocessor systems: parallel distributed-memory computing
- Stable and mainstream: SPMD, MPI
- Issues relatively clear: performance
- Platforms and applications correspondingly tightly coupled

Parallel Distributed Computing
Metacomputing and grids:
- Parallelism possibly within components, but mostly loose concurrency or pipelining between components (PVM: two-level model)
- Grids: resource virtualization across multiple administrative domains
- Focus has moved explicitly to service orientation: "wrap applications as services, compose applications into workflows", deployed on a service-oriented infrastructure
- Motivation: service/resource coupling; the provider supplies both resource and service, with virtualized access

Virtualization in PDC
What can/should be virtualized?
- Raw resources:
  - CPU: process/task instantiation => staging, security, etc.
  - Storage: e.g. a network file system over GMail
  - Data: value-added or processed
- Services: define an interface and input-output behavior; the service provider must operate the service
- Communication: an interaction paradigm with strong/adequate semantics
Key capability: configurable/reconfigurable resources, services, and communication

The Harness II Project
Theme: virtualized abstractions for critical aspects of parallel distributed computing, implemented as pluggable modules (including programming systems)
Major project components:
- Fault-tolerant MPI: specification, libraries
- Container/component infrastructure: C-kernel, H2O
- Communication framework: RMIX
- Programming systems: FT-MPI + H2O, MOCCA (CCA + H2O), PVM

Aggregation for Concurrent High Performance Computing
[Figure: DVM-enabling components; applications (App 1, App 2) and programming models (FT-MPI, PVM, components, active objects) running on a Harness II virtual layer that aggregates resources from providers A, B, and C for cooperating users]
Hosting layer: a collection of H2O kernels; flexible, lightweight middleware
Virtual layer: equivalent to a Distributed Virtual Machine, but realized only on the client side
DVM pluglets responsible for:
- (Co-)allocation/brokering
- Naming/discovery
- Failures/migration/persistence
Programming environments: FT-MPI, CCA, paradigm frameworks, distributed numerical libraries

H2O Middleware Abstraction
- Providers own resources and independently make them available over the network
- Clients discover, locate, and utilize resources
- Resource sharing occurs between a single provider and a single client
- Relationships may be tailored as appropriate, including identity formats, resource allocation, and compensation agreements
- Clients can themselves be providers; cascading pairwise relationships may be formed
[Figure: providers and clients connected over the network]

H2O Framework
- Resources are provided as services
- Service = an active software component exposing the functionality of the resource; may represent "added value"
- Services run within a provider's container (execution context)
- Services may be deployed by any authorized party: provider, client, or third-party reseller
- The provider specifies policies: authentication/authorization; actors: kernel/pluglet
- Decoupling of providers, deployers, and clients
[Figure: traditional model (the provider deploys the service into its own container; the client looks it up and uses it) vs. H2O model (provider, client, or reseller deploys into the provider's container; the client looks it up and uses it)]
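The deployment relationship above can be pictured as a small interface sketch. The KernelGateway and PlugletHandle types and their methods below are hypothetical illustrations of the model, not the actual H2O API.

    import java.net.URL;

    // Hypothetical handle to a deployed pluglet (illustration only).
    interface PlugletHandle {
        <T> T getInterface(Class<T> functionalInterface); // typed view of the service
    }

    // Hypothetical gateway to a provider's kernel (container).
    interface KernelGateway {
        // Deploy a component from a code-base URL; the kernel enforces the
        // provider's authentication/authorization policies on the deployer.
        PlugletHandle deploy(URL codeBase, String plugletClassName);
        // Locate a component that someone else (provider, client, or reseller)
        // has already deployed.
        PlugletHandle lookup(String name);
    }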

Example usage scenarios
Resource = computational service:
- The reseller deploys a software component into the provider's container
- The reseller notifies the client about the offered computational service
- The client utilizes the service
Resource = raw CPU power:
- The client gathers application components
- The client deploys the components into providers' containers
- The client executes a distributed application utilizing the providers' CPU power
Resource = legacy application:
- The provider deploys the service
- The provider stores information about the service in a registry
- The client discovers the service
- The client accesses the legacy application through the service

Model and Implementation
H2O nomenclature: container = kernel, component = pluglet
Object-oriented model; Java- and C-based implementations
Pluglet = remotely accessible object
- Must implement the Pluglet interface; may implement the Suspendible interface
- These interfaces are used by the kernel to signal/trigger pluglet state changes
Model: implement (or wrap) a service as a pluglet to be deployed on kernel(s); clients access it through its functional interfaces (e.g. StockQuote)

    interface Pluglet {
        void init(ExecutionContext cxt);
        void start();
        void stop();
        void destroy();
    }

    interface Suspendible {
        void suspend();
        void resume();
    }

    interface StockQuote {
        double getStockQuote();
    }
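As a concrete illustration of this model, the sketch below implements the interfaces shown on this slide. The class name, the started-state check, and the hard-coded quote are illustrative assumptions, not part of H2O.

    // A minimal pluglet sketch: lifecycle interfaces plus one functional interface.
    public class StockQuotePluglet implements Pluglet, Suspendible, StockQuote {
        private volatile boolean running;

        public void init(ExecutionContext cxt) { /* read deployment parameters from cxt */ }
        public void start()   { running = true; }
        public void stop()    { running = false; }
        public void destroy() { /* release resources held by the service */ }

        public void suspend() { running = false; }
        public void resume()  { running = true; }

        // Functional interface invoked by remote clients (e.g. over RMIX).
        public double getStockQuote() {
            if (!running) throw new IllegalStateException("pluglet not started");
            return 42.0; // placeholder value for illustration
        }
    }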

Accessing Virtualized Services
Request-response is ideally suited, but:
- Stateful service access must be supported
- Efficiency issues, concurrent access
- Asynchronous access is needed for compute-intensive services
- Semantics of cancellation and error handling matter
Many approaches focus on performance alone and ignore semantic issues.
Solution: an enhanced procedure call/method invocation; a well-understood paradigm, extended to be more appropriate for accessing metacomputing services

The RMIX layer
- H2O is built on top of the RMIX communication substrate
- Provides a flexible peer-to-peer communication layer for H2O applications
- Enables various message-layer protocols within a single, provider-based framework library
- Adopts common RMI semantics, enabling high performance and interoperability
- Easy porting between protocols; dynamic protocol negotiation
- Offers a flexible communication model while retaining RMI simplicity
- Extended with asynchronous and one-way calls
- Issues: consistency, ordering, exceptions, cancellation
[Figure: RPC clients, Web Services/SOAP clients, and Java clients reaching pluglets in an H2O kernel through RMIX networking over RPC, IIOP, JRMP, SOAP, ...]

RMIX Overview
- Extensible RMI framework
- Client and provider APIs give uniform access to communication capabilities supplied by pluggable provider implementations
- Multiple protocols supported: JRMPX, ONC-RPC, SOAP
- Configurable and flexible: protocol switching, asynchronous invocation
[Figure: service access through RMIX provider implementations (JRMPX, XSOAP, RPCX, Myri) serving Java clients, ONC-RPC clients, Web Services/SOAP clients, and GM/Myrinet]
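To make the call model concrete, the sketch below contrasts an ordinary RMI-style interface with the kind of asynchronous and one-way variants described here. The interface and method names are hypothetical illustrations of the model, not the actual RMIX API.

    import java.util.concurrent.Future;

    // Ordinary synchronous, RMI-style remote interface.
    interface StockQuote {
        double getStockQuote();
    }

    // Hypothetical asynchronous view of the same service, RMIX-style.
    interface StockQuoteAsync {
        // Asynchronous call: returns immediately; the result (or remote
        // exception) is delivered later and may be cancelled.
        Future<Double> getStockQuoteAsync();

        // One-way call: fire-and-forget, no result and no completion signal.
        void refreshQuotesOneway();
    }

Which wire protocol (JRMPX, ONC-RPC, SOAP) carries such a call would then be selected or negotiated per binding, as noted above.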

RMIX Abstractions
- Uniform interface and API
- Protocol switching and protocol negotiation
- Various protocol stacks for different situations:
  - SOAP: interoperability
  - SSL: security
  - ARPC, custom (Myrinet, Quadrics): efficiency
- Asynchronous access to virtualized remote resources
[Figure: H2O pluglets acting as clients or servers, communicating with the Harness kernel across the Internet and a firewall; protocol stacks chosen for security or efficiency as needed]

Virtualizing communications
Performance/familiarity vs. semantic issues:
- Parameter marshalling and data consistency (also an issue in PVM, MPI, etc.)
- Exceptions/cancellation: critical for stateful servers; conservative vs. best-effort handling
- Other issues: execution order, security
[Figure: sequence diagram of an asynchronous RMIX call (client stub, parameter object, target) showing call initiation, parameter marshalling/unmarshalling, the method call, result marshalling/unmarshalling, and result delivery, with cancellation possible at each stage: disregard at client side, interrupt client I/O, disregard at server side, interrupt server thread, interrupt server I/O, ignore result, reset server state]

Programming Models: CCA and H2O
Common Component Architecture (CCA):
- Component standard for HPC
- Uses and provides ports described in SIDL
- Support for scientific data types
- Existing tightly coupled (CCAFFEINE) and loosely coupled, distributed (XCAT) frameworks
H2O is well matched to the CCA model

MOCCA implementation in H2O
- Each component runs in a separate pluglet
- Thanks to H2O kernel security mechanisms, multiple components may run without interfering
- Two-level builder hierarchy
- ComponentID: pluglet URI
- MOCCA_Light: pure Java implementation (no SIDL)
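For orientation, the sketch below shows the CCA uses/provides port pattern that a MOCCA component follows. The interfaces are reduced stand-ins for the CCA specification (gov.cca.*), and the component and port names are made up for illustration; this is not the actual MOCCA API.

    // Reduced stand-ins for the CCA component model (not the real gov.cca types).
    interface Port {}

    interface Services {
        void addProvidesPort(Port port, String portName, String type);
        void registerUsesPort(String portName, String type);
        Port getPort(String portName);
    }

    interface Component {
        void setServices(Services services); // the framework hands the component its Services object
    }

    // Example component that provides an "IntegratorPort" and uses a "FunctionPort".
    // In MOCCA, each such component instance would run in its own pluglet inside
    // an H2O kernel, isolated by the kernel's security mechanisms.
    class IntegratorComponent implements Component, Port {
        public void setServices(Services services) {
            services.addProvidesPort(this, "IntegratorPort", "example.Integrator");
            services.registerUsesPort("FunctionPort", "example.Function");
        }
    }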

Performance: Small Data Packets
Factors: SOAP header overhead in XCAT; connection pools in RMIX

Large Data Packets
- Encoding (binary vs. base64)
- CPU saturation on a Gigabit LAN (serialization)
- Variance caused by Java garbage collection

Use Case 2: H2O + FT-MPI
Overall scheme:
- H2O framework installed on computational nodes or cluster front-ends
- A pluglet handles startup, event notification, and node discovery
- FT-MPI native communication (also MPICH)
Major value added:
- FT-MPI need not be installed anywhere on the computing nodes; it can be staged just in time before program execution
- Likewise, application binaries and data need not be present on the computing nodes
- The system must be able to stage them in a secure manner

Staging FT-MPI runtime with H2O
- The FT-MPI runtime library and daemons are staged from a repository (e.g. a Web server) to the computational node upon the user's request
- Automatic platform type detection; the appropriate binary files are downloaded from the repository as needed
- Allows users to run fault-tolerant MPI programs on machines where FT-MPI is not pre-installed, without needing a login account: H2O credentials are used instead
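The staging step can be sketched as follows: detect the node's platform, then pull the matching runtime archive from the repository. The repository URL layout and file name are assumptions for illustration; the actual staging pluglet additionally enforces H2O credentials and the provider's security policies.

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class StageRuntime {
        public static void main(String[] args) throws Exception {
            // Automatic platform type detection on the computational node.
            String platform = System.getProperty("os.name").toLowerCase()
                            + "-" + System.getProperty("os.arch");
            // Download the appropriate binary archive from the repository (e.g. a Web server).
            URL repo = new URL("http://repository.example.org/ftmpi/" + platform + "/ftmpi-runtime.tar.gz");
            Path target = Paths.get("ftmpi-runtime.tar.gz");
            try (InputStream in = repo.openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            System.out.println("Staged FT-MPI runtime for " + platform + " to " + target);
        }
    }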

Launching FT-MPI applications with H2O
Staging applications from a network repository:
- A URL code base refers to the remotely stored application
- The platform-specific binary is transparently uploaded to a computational node upon client request
Separation of roles:
- The application developer bundles the application and puts it into a repository
- The end user launches the application, unaware of the heterogeneity

Interconnecting heterogeneous clusters
- Private, non-routable networks
- Communication proxies on the cluster front-ends route data streams
- Local (intra-cluster) channels are not affected
- Nodes use virtual addresses at the IP level, resolved by the proxy

Initial experimental results
Proxied connection versus direct connection: the standard FT-MPI throughput benchmark was run within a Gigabit Ethernet cluster; the proxies retain 65% of the direct throughput

Summary
Virtualization in PDC:
- Devising appropriate abstractions
- Balancing pragmatics and performance against model cleanness
The Harness II Project:
- H2O kernel: reconfigurability by clients and third-party resellers is very valuable
- RMIX communications framework: high-level abstractions for control communications (native data communications)
- Multiple programming model overlays: CCA, FT-MPI, PVM; concurrent computing environments on demand