Towards a Realistic Scheduling Model
Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes
IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006.

Parallel processing is one of the oldest disciplines in computer science, yet the general problem is far from solved.

Why is parallel processing difficult? "Too many cooks spoil the broth" (Norwegian: "Jo flere kokker, jo mere søl")
–Partitioning and transforming problems
–Load balancing
–Inter-processor communication
–Granularity
–Architecture

Implementing parallel systems
Manually
–MPI
–PVM
–Linda
Automatically
–Parallelising compilers (Fortran)
–Static scheduling

Taskgraph scheduling: Representing static computations

Modelling computations
A = B + C
Data dependencies: A depends on B and C (the slide shows a task graph with edges B → A and C → A).
Valid sequences: CBA, BCA
Invalid sequences: ABC, ACB, CAB, BAC
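The dependency rule on the slide can be checked mechanically. A minimal sketch (not from the paper; the function name is illustrative): an evaluation order for A = B + C is valid exactly when both B and C appear before A.

```c
#include <assert.h>

/* Returns 1 if the order (a string over 'A','B','C') respects A's
   dependencies: both B and C must be evaluated before A. */
int is_valid_order(const char *order) {
    int seen_b = 0, seen_c = 0;
    for (const char *p = order; *p; p++) {
        if (*p == 'B') seen_b = 1;
        else if (*p == 'C') seen_c = 1;
        else if (*p == 'A') return seen_b && seen_c;
    }
    return 0; /* A was never evaluated */
}
```

Running it over all six orderings reproduces the slide's split: CBA and BCA are valid, the other four are not.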

Another example
A = (B - C) / D
F = B + G
(Task graph on the slide: A depends on B, C and D; F depends on B and G.)

Scheduling

Static taskgraph scheduling techniques
The scheduling process: taskgraph → allocation → schedule
(Slide diagram: a taskgraph with tasks A–E is allocated to processors p1 and p2 and laid out on a time axis, with communications c1–c5 between the processors.)

Topological sorting
–to order the vertices of a graph such that the precedence constraints are not violated
All valid schedules represent a topological sort.
Scheduling algorithms differ in how they topologically sort the graph.

The importance of abstraction
Abstraction is important to preserve generality.
Too specific:
    float sum = 0;
    for (int i = 0; i < 8; i++) { sum += a[i]; }
General and flexible:
    float sum = sumArray(a);

Communication

Communication is a major bottleneck
The cost ratio between computation and communication typically ranges from 1:50 to 1:10,000.
Communication cost is not very dependent on data size.
The interconnection network topology affects the overall time.

Scheduling work prior to 1995
Assumptions:
–Zero interprocessor communication costs
–Fully connected processor interconnection networks

Amounts of data transfer
Public transport is a good thing? (Analogy: one bus carrying many passengers beats many single cars, just as one compound message beats many single messages.)

Data size is not the major factor
Multiple single messages: connect + send, connect + send, connect + send, ...
Single compound message: one connect, one send
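This can be made concrete with a linear cost model t(n) = t_startup + n · t_byte (the symbols and numbers below are assumed for illustration, not taken from the paper): k small messages pay the startup (connect) cost k times, one compound message pays it once.

```c
#include <assert.h>

/* Linear message cost model: startup (connect) cost plus per-byte cost. */
double msg_cost(double t_startup, double t_byte, double bytes) {
    return t_startup + bytes * t_byte;
}

/* k separate messages: k connect+send pairs. */
double k_single(double ts, double tb, double bytes, int k) {
    return k * msg_cost(ts, tb, bytes);
}

/* One compound message: a single connect, one large send. */
double one_compound(double ts, double tb, double bytes, int k) {
    return msg_cost(ts, tb, k * bytes);
}
```

With a typical startup cost orders of magnitude above the per-byte cost, the compound message wins by roughly (k − 1) startups, which is why the data size itself is not the major factor.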

Interconnection topology

Fully connected

The ring
(Slide diagram: to send something from one node to another, the message must traverse the intermediate nodes around the ring.)

Interprocessor communication
Zero vs non-zero communication overheads
Direct links vs connecting nodes
(Slide diagrams: a bus-based shared-memory multiprocessor with processors P1–P4, a bus and RAM; and a distributed-memory mesh multiprocessor with a 4×4 grid of processors P11–P44.)

Avoiding communication overheads

Duplication
(Slide diagram: tasks b and c both depend on task a. In the original allocation, a and b run on p1 and c on p2, so a's result must be communicated and the schedule finishes at t = 3. With a duplicated onto both p1 and p2, no communication is needed and both processors finish at t = 2.)
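The benefit of duplication on this slide can be computed directly. A minimal sketch with assumed unit execution times and an assumed communication delay (values are illustrative):

```c
#include <assert.h>

/* Without duplication: a runs on p1; c on p2 must wait for a's result. */
double makespan_no_dup(double exec, double comm) {
    double ft_a = exec;               /* a finishes on p1 */
    double ft_b = ft_a + exec;        /* b follows a on p1 */
    double ft_c = ft_a + comm + exec; /* c on p2 waits for the message */
    return ft_b > ft_c ? ft_b : ft_c;
}

/* With duplication: a is executed on both p1 and p2, so no message
   is needed; each processor runs a and then its own successor. */
double makespan_dup(double exec) {
    return exec + exec;
}
```

With unit times (exec = comm = 1) this reproduces the slide: makespan 3 without duplication, 2 with it. Duplication trades redundant computation for removed communication.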

When considering communication overheads

Classic communication model: assumptions
–Local communications have zero cost
–Communication is conducted by a communication subsystem
–Communications can be performed concurrently
–The network is fully connected
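Under these assumptions, the time at which a task's input data is ready depends only on whether the task shares a processor with its predecessor. A minimal sketch (function and parameter names are mine, not the paper's):

```c
#include <assert.h>

/* Data-ready time of a task on processor p, given one predecessor
   that finishes at time ft on processor pred_p, with edge
   communication cost c. Under the classic model, a local
   communication is free; a remote one costs exactly c, regardless
   of contention or topology. */
double data_ready(double ft, int pred_p, int p, double c) {
    return pred_p == p ? ft : ft + c;
}
```

For several predecessors, the earliest start time is the maximum of these values; scheduling heuristics built on the classic model compute it exactly this way, which is where the later slides' inaccuracies originate.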

Implications
Network contention (not modelled)
–Tasks compete for communication resources
Contention can be modelled:
–Different types of edges
–Switch vertices (in addition to processor vertices)

Processor involvement in communication I Two-sided involvement (TCP/IP PC-cluster)

Processor involvement in communication II One-sided involvement (Shared memory Cray T3E)

Processor involvement in communication III Third party involvement (Dedicated DMA hardware Meiko CS-2)

Problems
–All classic scheduling models assume third-party involvement.
–Very few machines are equipped with dedicated hardware supporting third-party involvement.
–Estimated finish times for tasks are therefore hugely inaccurate.
–Scheduling algorithms are very sub-optimal.

Even more problems

Results
(Slide shows result charts for three platforms: bobcat, Sun E3500 and Cray T3E-900.)

The End