Parallel and Distributed Simulation FDK Software.

Outline
– FDK Overview
– RTI-Kit Software
  – BRTI and DRTI
  – Group Communication: MCAST
  – Time Management: TM-Kit

Federated Simulations Development Kit (FDK)
– RTI-Kit: software for experimental research in RTIs
– Jane: interactive simulation monitoring and control
(Diagram: Jane GUI on a remote user's machine, connected to a compute server)

RTI-Kit
– Collection of libraries to enhance existing RTIs or develop new ones
– Each library can be used separately, or with other RTI-Kit libraries
– Implemented over multiple platforms: compute clusters (Myrinet), shared-memory multiprocessors (SGI), IP networks
– Current libraries:
  – MCAST: group communication software
    – distributed group management, name server functions
    – current implementation built over unicast
    – application-defined buffer allocation to minimize data copying
  – TM-Kit: algorithms for implementing time management
    – fast distributed snapshot algorithm
    – scalable (O(log N) time for global reduction operations)
  – buffer management, priority queue, and random number libraries
– RTIs using RTI-Kit:
  – UK-RTI (DERA)
  – B-RTI, D-RTI

RTI-Kit Software
(Architecture diagram, top to bottom:)
– Federates
– RTI Interface (use one): B-RTI (simple interface) or D-RTI (HLA interface)
– RTI-Kit libraries: MCAST (group communication), TM-Kit (time management algorithms), DDM-Kit* (data distribution software), other libraries (buffer management, priority queues, etc.)
– Comm Libraries (use one): FM-Lib over shared memory, IP protocols, or Myrinet
– Physical network
* not included in current release

RTIs
– BRTI
  – Simple example RTI
  – Does not attempt to conform to the HLA Interface Specification
– DRTI
  – Similar functionality to BRTI; services conform to the v1.3 HLA Interface Specification
  – Limited set of services (federation management, declaration management, object management, time management)

MCAST
– Group communication software
  – distributed group management, name server functions
  – current implementation built over unicast
  – caches group membership information
  – application-defined buffer allocation eliminates data copying within MCAST

MCAST API
Group management:
– MCAST_Create: create a group
– MCAST_GetHandle: obtain a handle for a group from the name server
– MCAST_Subscribe: join a group
– MCAST_Unsubscribe: leave a group
Sending/receiving messages:
– MCAST_Send: send a message
– WhereProc: callback invoked for each incoming message (typically to allocate memory for the incoming message)
– MessageHandler: callback invoked for each subscriber, for each incoming message
– EndProc: optional callback invoked after all message handlers have been called
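The callback flow above can be sketched as a simplified, self-contained model. The class and parameter names here (Group, where_proc, end_proc) mirror the slide's callbacks, but the signatures are illustrative assumptions, not the actual FDK API; the point is that the application-supplied WhereProc allocates one buffer that every subscriber's handler then sees, so no per-subscriber copy is made inside MCAST.

```python
# Simplified model of the MCAST delivery flow (hypothetical signatures,
# not the real FDK C API).

class Group:
    def __init__(self, where_proc, end_proc=None):
        self.where_proc = where_proc  # allocates the buffer for an incoming message
        self.end_proc = end_proc      # optional: runs after all handlers
        self.handlers = []            # one MessageHandler per subscriber

    def subscribe(self, handler):
        self.handlers.append(handler)

    def send(self, data):
        # WhereProc allocates once; every subscriber's handler is invoked
        # on the same buffer, so "MCAST" makes no per-subscriber copy.
        buf = self.where_proc(len(data))
        buf[:len(data)] = data
        for h in self.handlers:
            h(buf)
        if self.end_proc:
            self.end_proc(buf)

delivered = []
g = Group(where_proc=bytearray)  # application-defined buffer allocation
g.subscribe(lambda msg: delivered.append(bytes(msg)))
g.subscribe(lambda msg: delivered.append(bytes(msg)))
g.send(b"hello")
assert delivered == [b"hello", b"hello"]  # one handler call per subscriber
```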

One-Way Message Latency (round-trip latency / 2)

Update Attribute Value Latency Benchmark
– Federate-to-federate latency using RTI-Kit, sending an N-byte payload
– Myrinet and shared memory: similar performance
– Myrinet and shared memory about 4 to 9 times faster than TCP/IP

TM-Kit
– Computes each processor's Lower Bound on Time Stamp (LBTS) of future messages that could later arrive
– LBTS computation:
  – based on a distributed snapshot algorithm, similar to Mattern's GVT algorithm
  – butterfly network for the reduction computation and distribution of results
  – waits until all transient messages are received before reporting the final LBTS value
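The butterfly reduction mentioned above can be illustrated with a small sketch: in round k, processor i exchanges its partial minimum with partner i XOR 2^k, so after log2(N) rounds every processor holds the global minimum without any central collector. This is a sequential simulation of the message pattern under a power-of-two processor count, not TM-Kit's actual implementation.

```python
# Sketch of a butterfly min-reduction over N processors' local LBTS
# values; O(log N) rounds, every processor ends with the global minimum.

def butterfly_min(local_values):
    n = len(local_values)
    assert n & (n - 1) == 0, "butterfly assumes a power-of-two processor count"
    vals = list(local_values)
    k = 1
    while k < n:
        # one round: each processor i pairs with partner i XOR k and
        # both keep the smaller of the two partial results
        vals = [min(vals[i], vals[i ^ k]) for i in range(n)]
        k *= 2
    return vals

# 8 processors, each with a local bound; all learn the minimum in 3 rounds
result = butterfly_min([12.0, 7.5, 9.0, 30.0, 7.0, 11.0, 8.0, 25.0])
assert result == [7.0] * 8
```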

HLA Time Management
– LBTS_i: lower bound on the time stamp of messages that could later arrive for federate i
– TSO messages with TS ≤ LBTS_i are eligible for delivery
– RTI ensures the logical time of federate i never exceeds LBTS_i
  – delivers messages to the federate in time stamp order
  – ensures the federate does not receive an event in its past
(Diagram: federate i above RTI_i, which holds a TSO queue fed from the network; LBTS = 10)
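The delivery rule on this slide can be modeled with a few lines: TSO messages wait in a priority queue ordered by time stamp, and only those with TS ≤ LBTS_i are released, in time stamp order. A minimal sketch of that gating, not the FDK data structures:

```python
# Minimal model of an RTI-side TSO queue: hold messages, release only
# those with time stamp <= the current LBTS, in time stamp order.
import heapq

class TSOQueue:
    def __init__(self):
        self._heap = []  # min-heap keyed on time stamp

    def insert(self, timestamp, msg):
        heapq.heappush(self._heap, (timestamp, msg))

    def deliver_up_to(self, lbts):
        """Pop and return all messages with TS <= lbts, smallest TS first."""
        out = []
        while self._heap and self._heap[0][0] <= lbts:
            out.append(heapq.heappop(self._heap))
        return out

q = TSOQueue()
q.insert(12, "b"); q.insert(5, "a"); q.insert(30, "c")
assert q.deliver_up_to(10) == [(5, "a")]   # only TS <= LBTS=10 is eligible
assert q.deliver_up_to(15) == [(12, "b")]  # released once LBTS advances
```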

TM-Kit Implementation
– Any processor can start an LBTS computation at any time
– Callbacks to the RTI software are used to indicate that:
  – another processor has started an LBTS computation
  – an LBTS computation has completed
– Simultaneous initiations by more than one processor are automatically combined
– Multiple pending LBTS computations are allowed: a processor can initiate a new computation even if previously initiated one(s) are pending
– Allows a variety of methods to initiate LBTS computations:
  – each processor starts a new computation "as needed" (done here)
  – a central controller can be responsible for starting LBTS computations

TM-Kit API
– TM_StartLBTS: initiate an LBTS computation
– StartHandler: callback invoked when an LBTS computation initiated by another processor is detected
– DoneHandler: callback invoked when an LBTS computation completes
Transient messages:
– TM_PutTag: called prior to sending a message, to write a tag (color) into the message
– TM_Out: called after a message (event) is sent
– TM_In: called when a message (event) is received
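The transient-message hooks above follow the coloring idea of Mattern-style snapshots: every message carries a tag, and by comparing sends and receives of the pre-snapshot color the algorithm knows when all transient messages have arrived and the final LBTS value may be reported. The sketch below is a simplified single-counter model of that bookkeeping (the class and method names are hypothetical; TM-Kit's version is distributed):

```python
# Simplified model of TM_PutTag / TM_Out / TM_In bookkeeping: count
# in-flight messages per color; a snapshot completes only when the
# old color's in-flight count drains to zero.

class TransientCounter:
    def __init__(self):
        self.color = 0
        self.in_flight = {0: 0}  # per color: messages sent minus received

    def put_tag(self):
        """TM_PutTag: stamp an outgoing message with the current color."""
        return self.color

    def out(self, tag):
        """TM_Out: record that a tagged message was sent."""
        self.in_flight[tag] = self.in_flight.get(tag, 0) + 1

    def tm_in(self, tag):
        """TM_In: record that a tagged message arrived."""
        self.in_flight[tag] -= 1

    def start_lbts(self):
        """Flip color; messages still carrying the old color are transient."""
        old = self.color
        self.color += 1
        self.in_flight.setdefault(self.color, 0)
        return old

    def snapshot_complete(self, old_color):
        return self.in_flight[old_color] == 0

tc = TransientCounter()
t = tc.put_tag(); tc.out(t)          # a message sent before the snapshot
old = tc.start_lbts()
assert not tc.snapshot_complete(old) # one transient message still in flight
tc.tm_in(t)
assert tc.snapshot_complete(old)     # now safe to report the final LBTS
```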