Simulation, Emulation Sathish Vadhiyar Sources / Credits: Microgrid, Simgrid.


Importance
- Needed for characterizing the behavior of Grid systems in the future
- During the development period, to test methodologies under repeatable conditions
- For simulating "what if" scenarios
- Needed when there is no real grid
- Needed in India

MicroGrid
- Enables systematic design and evaluation of middleware, applications, and network services for computational Grids
- Provides an environment for scientific and repeatable experiments
- MicroGrid can also predict performance on futuristic and fictional topologies
- Features
  - Enables use of Globus applications without change by virtualizing the execution environment, providing the illusion of a virtual Grid
  - Uses global virtual time to preserve simulation accuracy
  - Provides basic resource simulation models for computing, memory and networking

Virtualizing resources
- Uses a mapping table to map virtual IP addresses to physical IP addresses
- Intercepts relevant library calls
  - gethostbyname
  - bind, send, receive
  - Process creation: processes are created through Globus resource management functions
- The user logs in directly to a physical host and submits jobs to virtual hosts
- Globus gatekeeper, job managers and client hosts run on virtual hosts
- All socket interfaces and information services are also virtualized
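The mapping table mentioned above can be pictured as a simple lookup that an intercepted name-resolution call consults. This is a minimal sketch only: the addresses, the table name and the `resolve_virtual` helper are hypothetical and are not taken from the MicroGrid code.

```python
# Hypothetical mapping table: virtual IP -> physical IP (all addresses made up).
VIRTUAL_TO_PHYSICAL = {
    "10.0.0.1": "192.168.1.11",
    "10.0.0.2": "192.168.1.11",  # several virtual hosts may share one physical host
    "10.0.0.3": "192.168.1.12",
}


def resolve_virtual(virtual_ip: str) -> str:
    """Return the physical address hosting the given virtual address."""
    try:
        return VIRTUAL_TO_PHYSICAL[virtual_ip]
    except KeyError:
        # An unmapped virtual host is a configuration error in this sketch.
        raise LookupError(f"no physical host mapped for {virtual_ip}") from None
```

An intercepted call such as gethostbyname would consult a table like this before handing the request to the real library.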

Global Coordination
- Simulation rate (SR): the rate at which the simulator runs, i.e. how much of the real CPU the simulator is using
- The minimum feasible simulation rate depends on the desired virtual resources and the actual capacities of the physical resources
- The minimum value of SR that is feasible across all resources is the fastest rate at which the simulation can be run in a functionally correct manner

Simulation Rate Examples
- Given physical = 1 GHz and virtual = 2 GHz, the simulation rate cannot be less than 2. Otherwise the simulator would have to guarantee more than 100% CPU usage.
- Given physical = 2 GHz and virtual = 1 GHz, the simulation rate cannot be less than 0.5, by the same argument.
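The two examples suggest that each resource imposes a lower bound of virtual capacity divided by physical capacity, and that the overall rate must respect the largest of these bounds. The sketch below encodes that reading; the function name and the list-of-pairs input are assumptions made for illustration.

```python
def min_feasible_rate(resources):
    """Smallest simulation rate that satisfies every resource.

    resources: iterable of (virtual_capacity, physical_capacity) pairs,
    both in the same unit (e.g. GHz). Each pair requires
    rate >= virtual / physical, so the answer is the largest such ratio.
    """
    return max(virtual / physical for virtual, physical in resources)


# The two cases from the slide, then both resources taken together.
print(min_feasible_rate([(2.0, 1.0)]))
print(min_feasible_rate([(1.0, 2.0)]))
print(min_feasible_rate([(2.0, 1.0), (1.0, 2.0)]))
```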

More
- Another parameter (say x) determines how fast time progresses in the application
- The greater the value, the faster time progresses in the application
- Calls like gettimeofday and select use these parameters to return appropriately adjusted times
- Thus a virtual CPU twice the speed of the real CPU, simulation rate = 2, and x = 2 will give ½ the time for a code fragment

Resource Simulation

- The simulation rate is divided equally across all processes executing on the physical host
- The resulting fractions are then enforced by the local MicroGrid CPU scheduler
- It is a scheduler daemon that uses signals to allocate local physical CPU capacity to local MicroGrid tasks

How to ensure CPU usage
- Naïve strategy: calculate the usage for the processes on a virtual machine and give all processes the same usage. E.g. if (virtual / physical) is 25% and 2 processes are running on the virtual machine, assign each process 10 milliseconds every 80 milliseconds. Not good
  - An application process should always be ready to run if it has not used its available CPU slots
  - A computation-intensive process should be able to fully utilize the quota for the virtual machine
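The arithmetic behind the naïve example is short enough to check directly: 25% of an 80 ms period is 20 ms, split equally between 2 processes. The helper below is a hypothetical illustration of that calculation only, not MicroGrid code.

```python
def naive_slice_ms(virtual_fraction, num_procs, period_ms):
    """Equal per-process CPU slice under the naive strategy.

    virtual_fraction: virtual speed / physical speed (0.25 for 25%).
    num_procs: processes running on the virtual machine.
    period_ms: length of the scheduling period in milliseconds.
    """
    return virtual_fraction * period_ms / num_procs


# The slide's example: 25% share, 2 processes, 80 ms period.
print(naive_slice_ms(0.25, 2, 80))
```

With fixed slices like this, a process that blocks during its slice loses that CPU time and a lone compute-bound process cannot use the whole quota, which is why the slide lists the two requirements as unmet.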

MicroGrid CPU Controller
- A CPU controller runs on each physical host
- Uses SIGSTOP and SIGCONT to stop and continue processes
- Consists of 3 parts
  - Live process interception: whenever a virtual process is created or destroyed on MicroGrid using main() or exit(), the CPU controller traps it and updates its process table
  - CPU usage monitoring: every sliding window, the controller reads from /proc the CPU usage of the processes in its process table
  - Process scheduling: the controller calculates the CPU usage of each virtual host in a time window. If the amount of effective cycles exceeds the speed of the virtual host, the controller sends SIGSTOP to all processes of that virtual host; otherwise, it wakes up the processes and lets them proceed
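The monitoring and scheduling parts can be sketched for a Linux host as follows. This is an illustrative outline under stated assumptions, not the MicroGrid controller: the function names, the jiffy-based quota and the `baseline` bookkeeping are hypothetical, and process interception is left out.

```python
import os
import signal


def read_cpu_jiffies(pid):
    """Return utime + stime in jiffies from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split only the part after the closing parenthesis.
    fields = data[data.rindex(")") + 2:].split()
    # fields[0] is the state (field 3); utime is field 14, stime is field 15.
    return int(fields[11]) + int(fields[12])


def choose_signal(used_jiffies, quota_jiffies):
    """SIGSTOP once the virtual host exceeded its quota, SIGCONT otherwise."""
    return signal.SIGSTOP if used_jiffies > quota_jiffies else signal.SIGCONT


def enforce(pids, baseline, quota_jiffies):
    """Apply the decision to every process of one virtual host.

    pids: process ids belonging to the virtual host.
    baseline: dict pid -> jiffies recorded at the start of the window.
    quota_jiffies: jiffies the virtual host may consume in this window.
    """
    used = sum(read_cpu_jiffies(pid) - baseline[pid] for pid in pids)
    sig = choose_signal(used, quota_jiffies)
    for pid in pids:
        os.kill(pid, sig)
    return sig
```

A daemon would call `enforce` once per sliding window for each virtual host and refresh `baseline` when a new window starts.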

CPU Controller

Determining sliding window size
- E: design accuracy error
- p: scaled virtual machine speed (fraction of physical CPU)
- w: the sliding window size in jiffies
- n: the available jiffies in a sliding window
- n should satisfy: w = round(n/p) and | 1 - n/(p*w) | < E
- Find the smallest n that satisfies | 1 - (n/p)/round(n/p) | < E, then find w

Example
- Real machine: 1 GHz
- Virtual machine: 600 MHz
- Simulation rate: 2
- E: 0.05
- p = 600/1000 = 60%; with simulation rate 2, it is 30% of the real CPU
- Find the smallest n that satisfies | 1 - (10n/3) / round(10n/3) | < 0.05
- Try n = 1, 2, 3, …
- Here, n = 2 and w = 7
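The search for the smallest n can be written as a short loop. The sketch below follows the condition w = round(n/p), | 1 - n/(p*w) | < E; the function names, the round-half-up rule and the `max_n` cut-off are assumptions added for illustration.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def find_window(p, E, max_n=10_000):
    """Return (n, w) for the smallest n with |1 - n/(p*w)| < E, w = round(n/p).

    p: scaled virtual machine speed as a fraction of the physical CPU.
    E: design accuracy error.
    max_n: search bound, a safeguard added for this sketch.
    """
    for n in range(1, max_n + 1):
        w = round_half_up(n / p)
        if w > 0 and abs(1 - n / (p * w)) < E:
            return n, w
    raise ValueError("no window found within max_n jiffies")


# Example from the slide: p = 0.3, E = 0.05.
print(find_window(0.3, 0.05))  # (2, 7)
```

For n = 1 the error is |1 - 3.33/3| ≈ 0.11, which is too large; for n = 2 it is |1 - 6.67/7| ≈ 0.048, which satisfies the bound.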

Network Simulation
- Based on MaSSF, a scalable packet-level network simulator that supports direct execution of unmodified applications
- Uses a distributed simulation engine
- Can model many kinds of network protocols, including TCP/IP, UDP and user-defined protocols
- Intercepts live network streams at the socket level using a wrapper library called WrapSocket

Live traffic interception

Scalability
- Given a network topology and the available cluster nodes, MaSSF partitions the virtual network into multiple blocks and assigns each block to a cluster node
- Every cluster node runs a discrete event simulation engine
- Events are exchanged among simulation engines. Cluster nodes also need to synchronize periodically. This involves traffic

Scalability
- Hence network mapping has to be done carefully, to minimize communication of simulation events between simulation engine nodes and to achieve load balance across partitions
- The network mapping problem is modeled as a graph partitioning problem: the number of simulation events on each link can be estimated and used to calculate the edge weight

Improving scalability
- Graph partitioning for the network mapping problem
  - Input graph: traffic information (defines edge weights), network structure
  - Constraints: weighted sum of computation and memory requirement on each simulation engine node (vertex weight) to be balanced among multiple vertices
  - Objectives: communication across partitions (edge-cut) to be minimized
- The partitioned network defines the mapping of simulated network nodes to physical resources
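The two quantities a partitioner trades off, edge-cut and per-block load, are easy to evaluate for a candidate mapping. The sketch below scores a given partition and does not implement a partitioning algorithm; the tiny four-router topology and all weights are made up for illustration.

```python
def edge_cut(edges, part):
    """Total weight of links whose endpoints lie in different blocks.

    edges: dict mapping (u, v) -> estimated simulation events on that link.
    part: dict mapping each simulated node -> block (cluster node) id.
    """
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def block_loads(vertex_weight, part):
    """Sum of vertex weights assigned to each block."""
    loads = {}
    for node, w in vertex_weight.items():
        loads[part[node]] = loads.get(part[node], 0) + w
    return loads


# Hypothetical chain of four routers with heavy traffic on a-b and c-d.
edges = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10}
weights = {"a": 1, "b": 1, "c": 1, "d": 1}

good = {"a": 0, "b": 0, "c": 1, "d": 1}  # cuts only the light b-c link
bad = {"a": 0, "b": 1, "c": 0, "d": 1}   # cuts every link

print(edge_cut(edges, good), block_loads(weights, good))
print(edge_cut(edges, bad), block_loads(weights, bad))
```

Both mappings are equally balanced, but the first keeps the heavy links inside a block, so far fewer simulation events cross between engines.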

Real applications on MicroGrid - Lot more to do…

SimGrid You know it

References / Sources / Credits
- Xin Liu, Huaxia Xia, and Andrew Chien. Validating and Scaling the MicroGrid: A Scientific Instrument for Grid Dynamics. To appear in the Journal of Grid Computing.
- Song, Liu, Jakobsen, Bhagwan, Zhang, Taura and Chien. The MicroGrid: a Scientific Tool for Modeling Computational Grids. In Proceedings of SC2000.
- Simgrid: A Toolkit for the Simulation of Application Scheduling. CCGrid 01.

JUNK!

Calls
- Setting up the simulated application and computation environment
- Simulating the application execution once the tasks have been assigned to resources: SG_simulate
- Scheduling algorithms
  - Based on performance prediction: SG_getPrediction
  - Implementation of scheduling decision: SG_scheduleTaskOnResource
- Also supports runtime scheduling algorithms. Control must be returned from SG_simulate to the scheduling algorithm itself. For a work queue, control is returned after each task completes. For others, the user can specify how long a simulation should run before control is returned.
- SG_unscheduleTask can be used to modify scheduling decisions for tasks
- Many API calls help the user keep track of past scheduling decisions

- SG_getclock returns the virtual global time
- Post-mortem analysis is possible using the resource usage and the start and end times, to compute various metrics and to see how the simulation behaved

SimGrid-2 paper
- Simulations allow
  - Repeatable experiments
  - Exploring a wide range of application and resource scenarios
- Simgrid
  - For developing and evaluating scheduling algorithms
  - Objectives: good usability, fast simulations, configurable, tunable and extensible simulations, scalable
  - Aim towards simulation standardization

Simgrid components
- Agent: implements the scheduling algorithm; contains code, private data and location
- Location: where an agent runs; defined by location, mailboxes for communicating with other agents, and private data
- Task: defined by amount of computing, data size and private data
- Path: routing abstraction
- Channel: abstraction representing communication between agents

Simulation program steps
- Definition of code for each agent
  - Modeling application
  - Done with MSG_Task_Get, MSG_Task_Put, MSG_Task_Execute
- Creation of resources
  - Modeling the physical platform
  - Hosts, links, routing table paths
  - MSG_host_create, MSG_link_create, MSG_routing_table_set
- Creation and allocation of agents to locations
  - Application deployment
  - MSG_process_create
- Starting simulation
  - MSG_main

Resource sharing is supported by SimGrid through different models
- FIFO
- FRFO
- SHARED: fair sharing or priority-based sharing
Challenges
- Users need to construct large simulated platforms
- Simulating the complex network contention behaviors of applications executing on these platforms

Modeling grid topologies
- Simgrid allows users to import platform descriptions obtained with Effective Network View (ENV)
- Thus SimGrid uses ENV and NWS to instantiate platform models that represent realistic platforms, both in terms of topology and in terms of traffic

Bandwidth sharing models
- The algorithm first considers all bottleneck links and the flows on these links
- It assigns bandwidth to the flows on these links inversely proportional to their RTTs
- It then reduces the bandwidths on the links traversed by these flows
- The process is repeated until bandwidth has been assigned to all flows
- Simgrid makes it possible to define two types of links: those where bandwidth is shared and those where it is not
- Good for modeling grid computing topologies where local networks are connected by a shared backbone
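One way to read the steps above is as weighted max-min sharing with weights 1/RTT: repeatedly pick the link that offers the least bandwidth per unit of weight, fix the rates of its flows, and subtract them from every link those flows traverse. The sketch below implements that reading; it is not SimGrid's code, and the function name, data layout and example numbers are hypothetical.

```python
def share_bandwidth(capacity, routes, rtt):
    """Assign a rate to every flow by repeated bottleneck elimination.

    capacity: dict link -> bandwidth.
    routes: dict flow -> list of links it traverses (all present in capacity).
    rtt: dict flow -> round-trip time; a flow's weight is 1 / rtt.
    """
    remaining = dict(capacity)
    weight = {f: 1.0 / rtt[f] for f in routes}
    pending = set(routes)
    rate = {}
    while pending:
        # Find the bottleneck: the link offering the least bandwidth
        # per unit of weight to its still-unassigned flows.
        best_link, best_share = None, None
        for link, cap in remaining.items():
            total_w = sum(weight[f] for f in pending if link in routes[f])
            if total_w == 0:
                continue
            share = cap / total_w
            if best_share is None or share < best_share:
                best_link, best_share = link, share
        if best_link is None:
            raise ValueError("a pending flow traverses no known link")
        # Fix the rates of the flows on the bottleneck and reduce the
        # bandwidth left on every link they traverse.
        for f in [f for f in pending if best_link in routes[f]]:
            rate[f] = weight[f] * best_share
            for link in routes[f]:
                remaining[link] -= rate[f]
            pending.discard(f)
    return rate


# Hypothetical example: f2 is limited by link B, f1 then gets the rest of A.
rates = share_bandwidth(
    capacity={"A": 10.0, "B": 4.0},
    routes={"f1": ["A"], "f2": ["A", "B"]},
    rtt={"f1": 1.0, "f2": 1.0},
)
print(rates)
```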

GridSim Individual resource brokers and central schedulers

Simjava
- Simulations in Simjava contain a number of entities, each running in its own thread
- Entities call simulation functions (sim_schedule, sim_hold, sim_wait) and events are generated

Every event has a source entity and a destination entity
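The idea of events that carry a source and a destination entity can be shown with a small discrete-event loop. This is a single-threaded sketch for illustration, not SimJava (which is a Java library and runs each entity in its own thread); the `Event` and `Simulator` classes and the ping/pong exchange are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float                     # virtual time at which the event fires
    seq: int                        # tie-breaker that keeps insertion order
    src: str = field(compare=False)
    dst: str = field(compare=False)
    tag: str = field(compare=False)


class Simulator:
    def __init__(self):
        self.clock = 0.0
        self._queue = []
        self._seq = 0

    def schedule(self, src, dst, delay, tag):
        """Queue an event from src to dst, delay time units from now."""
        heapq.heappush(
            self._queue, Event(self.clock + delay, self._seq, src, dst, tag)
        )
        self._seq += 1

    def run(self, handlers):
        """Deliver events in time order; handlers maps entity name -> callable."""
        log = []
        while self._queue:
            ev = heapq.heappop(self._queue)
            self.clock = ev.time
            log.append((ev.time, ev.src, ev.dst, ev.tag))
            handler = handlers.get(ev.dst)
            if handler is not None:
                handler(self, ev)
        return log


def server(sim, ev):
    # Reply to the sender one time unit after a ping arrives.
    if ev.tag == "ping":
        sim.schedule("server", ev.src, 1.0, "pong")


sim = Simulator()
sim.schedule("client", "server", 2.0, "ping")
log = sim.run({"server": server})
print(log)
```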

NPB with MicroGrid

Scheduling quanta length and Modeling Accuracy

Internal Performance
- NPB was run on a real Alpha cluster of 4 machines and on MicroGrid with a CPU fraction of 4%
- The periodic execution times were obtained every 1 second for the Alpha cluster and every _? second(s) for MicroGrid
- Close match, with a root mean square percentage difference of 3.08%