Divisible Load Scheduling: A Tutorial. Thomas Robertazzi, University at Stony Brook.

What is a Divisible Load?
- A computational and networkable load that is arbitrarily partitionable (divisible) among processors and links.
- There are no precedence relations.

Simple Application Example
- Problem: sum 1,000 trillion numbers.
- Approach: partition the numbers among 100 processors.
- But how?

Simple Application Example
- To optimize solution time (maximize speedup), one needs to take into account heterogeneous link and processor speeds, computation and communication intensities, the interconnection topology, and the scheduling policy.
- Divisible Load Scheduling Theory can do this!

Applications (Generic)
- Grid Computing / Meta-computing
- Data-Intensive Computing
- Sensor Processing
- Image Processing
- Scientific/Engineering Computing
- Financial Computing

Applications (Specific)
- Pattern Searching
- Database Computation
- Matrix-Vector Computation
- E&M Field Calculation (CAD)
- Edge Detection

DLT Modeling Advantages
- Linear and Deterministic Modeling
- Tractable Recursive/Linear Equation Solution
- Schematic Language
- Equivalent Elements
- Many Applications

Interconnection Topologies
- Linear Daisy Chain
- Bus
- Single-Level and Multilevel Trees
- Mesh
- Hypercube

Directions: Scalability
- Sequential Distribution (saturation)
- Simultaneous Distribution (scalable)
(Hung & Robertazzi; Cheng & Robertazzi)

An Example
- Model specifications:
  - A star network (single-level tree network), and a multi-level tree.
  - Computation and transmission time is a linear function of the size of the load.
  - Level to level: store-and-forward switching.
  - Same level: concurrent load distribution.

Children without Front Ends:
- After receiving the assigned data, each child proceeds to process the data (a processor without a front end cannot compute while it is still communicating).

Timing Diagram (single-level tree): children without front ends.

m+1 Unknowns vs. m+1 Equations
- Recursive equations
- Normalization equation
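The equations themselves are not reproduced in the transcript. As a sketch, assuming the model stated above (concurrent load distribution from the root to its children over separate links, children without front ends, standard DLT notation: load fractions alpha_i, inverse processor speeds w_i, inverse link speeds z_i, intensities T_cp and T_cm), equating the finish times of the root and every child gives

\[
\alpha_0 w_0 T_{cp} = \alpha_1 \left( z_1 T_{cm} + w_1 T_{cp} \right),
\]
\[
\alpha_i \left( z_i T_{cm} + w_i T_{cp} \right) = \alpha_{i+1} \left( z_{i+1} T_{cm} + w_{i+1} T_{cp} \right), \qquad i = 1, \dots, m-1,
\]
\[
\sum_{i=0}^{m} \alpha_i = 1 ,
\]

i.e., m recursive equations plus one normalization equation for the m+1 unknowns alpha_0, ..., alpha_m.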

Distribution Solution:
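The solution is also not reproduced in the transcript; under the same assumptions as above, the recursions chain into a product form:

\[
\alpha_i = \alpha_0 \, \frac{w_0 T_{cp}}{z_i T_{cm} + w_i T_{cp}}, \qquad i = 1, \dots, m,
\]
\[
\alpha_0 = \left( 1 + \sum_{i=1}^{m} \frac{w_0 T_{cp}}{z_i T_{cm} + w_i T_{cp}} \right)^{-1},
\]

with common finish time T_f = alpha_0 w_0 T_cp.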

- The load distribution solution is similar to the solution of the state-dependent M/M/1 queueing system.
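For comparison, the state-dependent M/M/1 queue has the same product-plus-normalization structure:

\[
p_n = p_0 \prod_{i=1}^{n} \frac{\lambda_{i-1}}{\mu_i}, \qquad \sum_{n=0}^{\infty} p_n = 1,
\]

which mirrors the chained load-fraction recursions and normalization equation above.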

Similarities to Queueing Theory
- Linear model and tractable solutions
- Schematic Language
- Equivalent Elements
- Infinite-Size Networks

Speedup Analysis
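The speedup expressions from this slide did not survive the transcript. A hedged sketch, under the single-level-tree assumptions above and taking the baseline to be the root processing the entire load alone:

\[
\text{Speedup} = \frac{w_0 T_{cp}}{\alpha_0 w_0 T_{cp}} = \frac{1}{\alpha_0},
\]

i.e., the speedup is the reciprocal of the root's load fraction.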

Speedup Analysis (continued)
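To make the two speedup slides concrete, here is a minimal numerical sketch (not code from the tutorial; the processor and link values are made up) that computes the optimal fractions for a small heterogeneous single-level tree under the concurrent-distribution, no-front-end model sketched above, then reports the speedup:

```python
# Minimal sketch: optimal load fractions and speedup for a single-level tree
# whose children have no front ends, assuming the root transmits to all
# children concurrently over separate links. Standard DLT notation:
# w = inverse compute speeds (root first), z = inverse link speeds,
# Tcp / Tcm = computation / communication intensities.
# All numeric values are illustrative, not from the tutorial.

def star_fractions(w, z, Tcp, Tcm):
    """Load fractions alpha[0..m] that make the root and every child finish together."""
    m = len(w) - 1
    # Child i finishes at alpha[i] * (z[i]*Tcm + w[i]*Tcp); the root at alpha[0]*w[0]*Tcp.
    ratios = [w[0] * Tcp / (z[i] * Tcm + w[i] * Tcp) for i in range(1, m + 1)]
    alpha0 = 1.0 / (1.0 + sum(ratios))
    return [alpha0] + [alpha0 * r for r in ratios]

w = [1.0, 1.0, 2.0, 4.0]   # root plus three heterogeneous children
z = [0.0, 0.2, 0.2, 0.5]   # z[0] unused: the root already holds the load
Tcp, Tcm = 1.0, 1.0

alpha = star_fractions(w, z, Tcp, Tcm)
finish_time = alpha[0] * w[0] * Tcp     # common finish time of all processors
speedup = 1.0 / alpha[0]                # = (w[0] * Tcp) / finish_time
print(alpha, finish_time, speedup)
```

In this sketch adding children keeps increasing 1/alpha_0 because every link is active at once, consistent with the sequential-versus-simultaneous contrast drawn on the scalability slide.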

Tree Network (Children without Front Ends)

Collapsing single-level trees

Bandwidth of a Fat Tree
- Definition: the bandwidth of level j in a fat tree can be defined as p^{j-1} z.

Directions: Sequencing and Installments
- Daisy Chain Surprise
- Efficiency Rule
(Ghose, Mani & Bharadwaj)

Directions: Sequencing and Installments
- Multi-installment for Sequential Distribution
(Ghose, Mani & Bharadwaj)

Directions: Sequencing and Installments
- Diminishing returns in using multi-installment distribution.
(Ghose, Mani & Bharadwaj)

Directions: Sequencing and Installments
(Drozdowski)

Directions: Time-Varying Modeling
- Can be solved with integral calculus.
(Sohn & Robertazzi)
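One hedged way to write the time-varying case (a sketch only; Sohn & Robertazzi's exact formulation may differ): when processor i's effective inverse speed w_i(t) changes over time, the constant-speed computing term alpha_i w_i T_cp is replaced by an integral constraint on the fraction processed between the start of computation t_i and the finish time T_f,

\[
\alpha_i = \int_{t_i}^{T_f} \frac{dt}{w_i(t)\, T_{cp}},
\]

and analogously for communication, so equating finish times leads to integral rather than purely algebraic equations.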

Directions: Monetary Cost Optimization  Min C Total  n c n w n T cp n=1 N Bus Processors Optimal Sequential Distribution if: c n-1 w n-1 less than c n w n for all n Sohn, Luryi & Robertazzi

Directions: Monetary Cost Optimization
- Two US patents:
  - Patent 5,889,989 (1999): processor cost
  - Patent 6,370,560 (2001): processor and link cost
- Enabling technology for an open e-commerce market in leased proprietary computing.
(Sohn, Charcranoon, Luryi & Robertazzi)

Directions: Database Modeling
- Expected time to find multiple records in a flat-file database.
(Ko & Robertazzi)

Directions: Realism
- Finite Buffers (Bharadwaj)
- Job Granularity (Bharadwaj)
- Queueing Model Integration

Directions: Experimental Work
- Database Join (Drozdowski)

Directions: Future Research
- Operating Systems: incorporate divisible load scheduling into (distributed) operating systems.
- Measurement Process Modeling: integrate measurement process modeling into divisible scheduling.

Directions: Future Research
- Pipelining (Dutot)
  - Concept: distribute load to further processors first for speedup improvement.
  - Improvement reported for daisy chains.

Directions: Future Research
- System Parameter Estimation (Ghose)
  - Concept: send small "probing" loads across links and to processors to estimate available effort.
  - Challenge: rapid change in link and processor state.
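A minimal sketch of the probing idea (the helper names and estimation formulas are assumptions for illustration, not Ghose's actual method): time a small probe load on the link and on the worker, then back out the linear-model parameters.

```python
# Sketch: estimate a worker's DLT parameters from one small probe load.
# send_probe / compute_probe are hypothetical user-supplied callables; the
# estimates follow from t_comm = delta*z*Tcm and t_comp = delta*w*Tcp.

import time

def estimate_parameters(send_probe, compute_probe, delta, Tcm=1.0, Tcp=1.0):
    """Return estimated (z, w); a real system would average repeated probes."""
    t0 = time.perf_counter()
    send_probe(delta)                 # ship a probe load of size delta over the link
    t_comm = time.perf_counter() - t0

    t0 = time.perf_counter()
    compute_probe(delta)              # have the worker process the probe load
    t_comp = time.perf_counter() - t0

    z = t_comm / (delta * Tcm)        # estimated inverse link speed
    w = t_comp / (delta * Tcp)        # estimated inverse compute speed
    return z, w
```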

DLT Has a Good Future
- Many applications, including wireless sensor networks
- Tractable (modeling and computation)
- Rich theoretical basis

Thank you! Questions?