Distributed simulation with MPI in ns-3. Joshua Pelkey and Dr. George Riley. WNS3, March 25, 2011.


2 Overview
Standard sequential simulation techniques struggle with substantial network traffic
–Lengthy execution times
–Large amounts of computer memory
Parallel and distributed discrete event simulation [1]
–Allows a single simulation program to run on multiple interconnected processors
–Reduced execution time! Larger topologies!

3 Overview (cont.)
Important note
–Distributed simulations must produce exactly the same results as the identical sequential simulation

4 Overview: terminology
Logical Process (LP)
–An individual sequential simulation
Rank or system id
–The unique number assigned to each LP
Figure 1. Simple point-to-point topology, distributed

5 Overview: related work
Parallel/Distributed ns (PDNS) [2]
Georgia Tech Network Simulator (GTNetS) [3]
–Both use a federated approach and a conservative (blocking) mechanism

6 Implementation Details in ns-3
LP communication
–Message Passing Interface (MPI) standard
–Send/receive time-stamped messages
–MpiInterface in ns-3
Synchronization
–Conservative algorithm using lookahead
–DistributedSimulator in ns-3

7 Implementation Details in ns-3 (cont.)
Assigning rank to nodes
–Handled manually in the simulation script
Remote point-to-point links
–Created automatically between nodes with different ranks by the point-to-point helper
–When a packet is sent across a remote point-to-point link, it is transmitted via MPI using our interface
Merged since ns-3.8

8 Implementation Details in ns-3: limitations
All nodes are created on all LPs, regardless of rank
–It is up to the user to install applications only on the correct rank
Nodes are assigned rank manually
–An MpiHelper class could be used to assign rank to nodes automatically, which would enable easy distribution of existing simulation scripts
Pure distributed wireless is currently not supported
–At least one point-to-point link must exist in order to divide the simulation
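Manual rank assignment and rank-guarded application installation look roughly like the following. This is an untested sketch modeled loosely on the ns-3 mpi examples rather than a drop-in script; the data rate, delay, addresses, and echo applications are illustrative choices:

```cpp
#include "ns3/applications-module.h"
#include "ns3/core-module.h"
#include "ns3/internet-module.h"
#include "ns3/mpi-interface.h"
#include "ns3/network-module.h"
#include "ns3/point-to-point-module.h"

using namespace ns3;

int main (int argc, char *argv[])
{
  // Select the distributed simulator implementation and start MPI.
  GlobalValue::Bind ("SimulatorImplementationType",
                     StringValue ("ns3::DistributedSimulatorImpl"));
  MpiInterface::Enable (&argc, &argv);
  uint32_t systemId = MpiInterface::GetSystemId ();

  // Rank is assigned manually when the node is created.
  Ptr<Node> n0 = CreateObject<Node> (0);  // lives on LP 0
  Ptr<Node> n1 = CreateObject<Node> (1);  // lives on LP 1

  // The ranks differ, so the helper installs a remote point-to-point
  // link; its channel delay becomes the lookahead between the two LPs.
  PointToPointHelper p2p;
  p2p.SetDeviceAttribute ("DataRate", StringValue ("5Mbps"));
  p2p.SetChannelAttribute ("Delay", StringValue ("2ms"));
  NetDeviceContainer devices = p2p.Install (n0, n1);

  InternetStackHelper stack;
  stack.Install (n0);
  stack.Install (n1);
  Ipv4AddressHelper addr;
  addr.SetBase ("10.1.1.0", "255.255.255.0");
  Ipv4InterfaceContainer ifaces = addr.Assign (devices);

  // Every node exists on every LP, so guard application installs by rank.
  if (systemId == 1)
    {
      UdpEchoServerHelper server (9);
      server.Install (n1).Start (Seconds (1.0));
    }
  if (systemId == 0)
    {
      UdpEchoClientHelper client (ifaces.GetAddress (1), 9);
      client.Install (n0).Start (Seconds (2.0));
    }

  Simulator::Stop (Seconds (10.0));
  Simulator::Run ();
  Simulator::Destroy ();
  MpiInterface::Disable ();
  return 0;
}
```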

9 Performance Study
DARPA NMS campus network simulation
–Using the nms-p2p-nix-distributed example available in ns-3
–Allows creation of very large topologies
–Any number of campus networks are created and connected together
–Different campus networks can be placed on different LPs
–Tested with 2, 4, 6, 8, and 10 CNs
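For reference, a distributed run of that example launches one MPI process per LP. The commands below follow the ns-3.8-era waf layout, so the paths should be treated as illustrative and adjusted to your build tree:

```shell
# Build ns-3 with MPI support enabled.
./waf configure --enable-mpi
./waf build

# Launch one MPI process per logical process (here, 2 LPs).
mpirun -np 2 ./build/debug/src/mpi/examples/nms-p2p-nix-distributed
```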

10 Performance Study: campus network topology
Figure 2. Campus network topology block [4] (inter-campus link delay varied between 200 ms and 10 µs)

11 Performance Study: Georgia Tech clusters used
Hogwarts Cluster
–6 nodes, each with 2 quad-core processors and 48 GB of RAM
Ferrari Cluster
–Mix of machines, including 3 quad-core nodes and 8 dual-core nodes

12 Performance Study: simulations on Hogwarts
Figure 3. Campus network simulations on Hogwarts with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs

13 Performance Study: simulations on Ferrari
Figure 4. Campus network simulations on Ferrari with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs

14 Performance Study: speedup
Figure 5. Speedup using distributed simulation for campus network topologies on the (A) Hogwarts cluster and (B) Ferrari cluster

15 Performance Study: speedup (cont.)
Linear speedup was observed for Hogwarts but not for Ferrari. Further investigation revealed that Ferrari consisted of a mix of machines, with the first two nodes considerably faster.
Table 1. Speedup for Hogwarts and Ferrari at 2, 4, 6, 8, and 10 CNs

16 Performance Study: changing the lookahead
By changing the delay between campus networks, the lookahead was varied from 200 ms down to 10 µs
For Hogwarts and Ferrari, the 10 µs simulations ran, on average, 25% and 47% slower, respectively
As expected, a smaller lookahead decreases the potential speedup, since the simulators must synchronize more frequently

17 Future Work
MpiHelper class to facilitate creating distributed topologies
–Nodes assigned rank automatically
–Existing simulation scripts could be distributed easily
Distributing the topology could occur at the node level, rather than the application level
–Ghost nodes would save memory
Pure distributed wireless support

18 Summary
Distributed simulation in ns-3 allows a user to run a single simulation in parallel on multiple processors
Very large-scale simulations can be run in ns-3 using the distributed simulator
Distributed simulation in ns-3 offers potentially optimal (linear) speedup compared to identical sequential simulations

19 References
[1] R. M. Fujimoto. Parallel and Distributed Simulation Systems. Wiley Interscience, 2000.
[2] PDNS - Parallel/Distributed ns.
[3] G. F. Riley. The Georgia Tech Network Simulator. In Proceedings of the ACM SIGCOMM Workshop on Models, Methods and Tools for Reproducible Network Research (MoMeTools '03), pages 5-12, New York, NY, USA, 2003. ACM.
[4] Standard baseline NMS challenge topology.