CSIS 4130 - Parallel Architectures and Algorithms, Dr. Hoganson: Speedup Summary

Presentation transcript:

Balance Point
The basis for the argument against "putting all your (speedup) eggs in one basket" is Amdahl's Law. Note the balance point in the denominator, where the serial and parallel terms are equal. Increasing N (the number of processors) beyond this point can at best halve the denominator, and therefore at most double the speedup.
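The slide's equation appears only as an image in the original, so it is not reproduced in the transcript; the standard statement of Amdahl's Law, which is what the description above refers to, is

\[ S(N) = \frac{1}{(1-\alpha) + \frac{\alpha}{N}} \]

where alpha is the fraction of the work that can be distributed across processors. The two terms of the denominator balance when (1 - alpha) = alpha/N, that is, at N = alpha/(1 - alpha). Past that point the parallel term is already the smaller of the two, so even letting N grow without bound can no more than halve the denominator, and hence no more than double the speedup.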

Parallel Speedup Summary
Level 1: Pipeline
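The Level 1 formula is also an image in the original slides. As a reference point (an assumption based on the standard result, not necessarily the slide's exact expression), the speedup of a single S-stage pipeline executing n instructions, ignoring stalls and flushes, is

\[ S_{pipe} = \frac{nS}{S + n - 1} \]

which approaches S, the number of stages, as n grows large.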

Level 2: Superscalar - Multiple Pipelines (Unified Speedup Model)
S = number of pipeline stages
n = number of instructions
M = number of pipelines
s = frequency of pipeline stalls
f = probability that an instruction causes a pipeline flush
P = degree of multi-pipelining (number of concurrent pipes working)
Pr = fraction of the total work that runs on P pipelines
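The Level 2 expression itself is not in the transcript. One plausible reading, consistent with the variables above but offered only as an illustrative assumption (not the slide's exact formula), applies an Amdahl-style split over the P concurrently working pipelines on top of the single-pipeline speedup S_pipe sketched under Level 1:

\[ S_{L2} \approx \frac{S_{pipe}}{(1 - Pr) + \frac{Pr}{P}} \]

with S_pipe further degraded in practice by the stall frequency s and the flush probability f.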

Level 3: Algorithm Parallelism (Unified Speedup Model)
N = number of processors in the architecture
Alpha = fraction of the process that can be distributed across multiple processors
PA = probability of acceptance of requests (by the interconnection network)
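A natural form consistent with these definitions, again an assumption since the slide's equation is an image, is Amdahl's Law with the effective processor count discounted by the interconnection network's probability of acceptance:

\[ S_{L3} \approx \frac{1}{(1-\alpha) + \frac{\alpha}{N \cdot PA}} \]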

Level 3b: Scaled Algorithm Parallelism (Unified Speedup Model)
N = number of processors in the architecture
Alpha = fraction of the process that can be distributed across multiple processors
PA = probability of acceptance of requests (by the interconnection network)
kP = scaling factor on parallel work
kS = scaling factor on serial work
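If the serial and parallel portions of the workload are scaled by kS and kP respectively, a Gustafson-style scaled counterpart of the Level 3 expression (an illustrative assumption, not necessarily the slide's formula) is

\[ S_{L3b} \approx \frac{k_S(1-\alpha) + k_P\,\alpha}{k_S(1-\alpha) + \frac{k_P\,\alpha}{N \cdot PA}} \]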

Level 4: Multi-Process or Clustered Speedup (Unified Speedup Model)
N = number of processors in the architecture
C = number of processors in a cluster
Alpha = fraction of the process that can be distributed across multiple processors
PA = probability of acceptance of requests (by the inter-cluster interconnection network)
kP = scaling factor on parallel work
kS = scaling factor on serial work
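The clustered formula is likewise only an image in the slides. One way to illustrate a hierarchical model with these variables, purely as an assumption, is to treat each cluster of C processors as a single node whose internal speedup S_cluster comes from the Level 3/3b expressions, and then apply the same Amdahl-style form across the N/C clusters:

\[ S_{L4} \approx \frac{1}{(1-\alpha) + \frac{\alpha}{(N/C)\cdot PA \cdot S_{cluster}}} \]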

Level 4b: Scaled Multi-Process or Clustered Speedup (Unified Speedup Model)
N = number of processors in the architecture
C = number of processors in a cluster
Alpha = fraction of the process that can be distributed across multiple processors
PA = probability of acceptance of requests (by the inter-cluster interconnection network)
kP = scaling factor on parallel work
kS = scaling factor on serial work
k2 = workload scaling factor

Level 5: N-tiered Client-Server Distributed Parallel System (Unified Speedup Model)
M1 = number of Tier 1 machines (clients)
M2 = number of Tier 2 machines (Tier 2 servers)
β1 = workload balance: % of the workload on Tier 1 (clients)
β2 = % of the workload on Tier 2 (Tier 2 servers)
Sup1 = speedup of a Tier 1 machine (from Levels 1-4)
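The Level 5 equation is an image in the original slides. A common way to model an n-tier system with these parameters, offered only as an illustrative assumption and writing β_i for the fraction of the workload handled at Tier i, charges each tier its share of the work divided by that tier's aggregate capacity:

\[ S_{L5} \approx \frac{1}{\sum_i \frac{\beta_i}{M_i \cdot Sup_i}} \]

where M_i is the number of machines at Tier i and Sup_i is the speedup of a single Tier i machine.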

Level 5b: Scaled N-tiered Client-Server Distributed Parallel System (Unified Speedup Model)
Mi = number of Tier i machines
βi = % of the workload on Tier i
Supi = speedup of a Tier i machine (from parallel Levels 1-4)
kC/S = client/server scaling factor
Ali = average latency at Tier i
PAi(kC/S) = probability of acceptance at Tier i (a function of kC/S)