Types of Concurrent Events


Types of Concurrent Events

There are three types of concurrent events:

1. Parallel Event (Synchronous Event): the type of concurrency is parallelism. Parallel events may occur in multiple resources during the same time interval. Example: array/vector processors, where a control unit (CU) drives multiple ALU-based processing elements (PEs).

2. Simultaneous Event (Asynchronous Event): the type of concurrency is simultaneity. Simultaneous events may occur in multiple resources at the same time instant. Example: a multiprocessing system.

3. Pipelined Event (Overlapped Event): pipelined events may occur in overlapped time spans. Example: a pipelined processor.

THE STATE OF COMPUTING

System Performance
- Machine capability and program behaviour
- Peak performance and turnaround time
- Cycle time, clock rate and cycles per instruction (CPI)
- Performance factors: instruction count, average CPI, cycle time, memory cycle time and number of memory cycles
- System attributes: instruction set architecture, compiler technology, processor implementation and control, cache and memory hierarchy
- MIPS rate, FLOPS and throughput rate
- Programming environments: implicit and explicit parallelism

THE STATE OF COMPUTING

System Attributes to Performance
- Cycle time (processor): τ
- Clock rate: f = 1/τ
- Average number of cycles per instruction: CPI
- Number of instructions in the program: Ic
- CPU time: T = Ic × CPI × τ
- Memory cycle time: k × τ
- Number of processor cycles needed: p
- Number of memory cycles needed: m
- Effective CPU time: T = Ic × (p + m × k) × τ
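To make the relationship between the two CPU-time formulas concrete, here is a minimal Python sketch; every numeric value below is an assumed example, not data from the slides.

```python
# CPU time from the formulas above; all numbers are assumed example values.
def cpu_time(ic, cpi, tau):
    """T = Ic * CPI * tau"""
    return ic * cpi * tau

def effective_cpu_time(ic, p, m, k, tau):
    """T = Ic * (p + m * k) * tau, i.e. CPI = p + m * k"""
    return ic * (p + m * k) * tau

tau = 25e-9              # 25 ns cycle time (40 MHz clock) -- assumed
ic = 100_000             # assumed instruction count
p, m, k = 1.2, 0.3, 4.0  # assumed processor cycles, memory refs per instruction, latency factor

cpi = p + m * k
print(cpu_time(ic, cpi, tau))                # 0.006 s
print(effective_cpu_time(ic, p, m, k, tau))  # same result, since CPI = p + m * k
```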

THE STATE OF COMPUTING

System Attributes to Performance
- MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) = (f × Ic) / (C × 10^6)
- Throughput rate: Wp = (MIPS × 10^6) / Ic = f / (CPI × Ic)
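As a quick sanity check of these relations, here is a tiny Python sketch; the clock rate, CPI and instruction count are assumed values, not taken from the slides.

```python
# MIPS rate and throughput from clock rate f and average CPI (assumed values).
f = 40e6      # 40 MHz clock -- assumed
cpi = 1.55    # assumed average CPI
ic = 100_000  # assumed instruction count

mips = f / (cpi * 1e6)      # MIPS rate = f / (CPI x 10^6)
wp = (mips * 1e6) / ic      # throughput Wp = (MIPS x 10^6) / Ic
print(mips)                 # ~25.8 MIPS
print(wp, f / (cpi * ic))   # both ~258 programs/second: Wp = f / (CPI x Ic)
```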

System Attributes versus Performance Factors

The ideal performance of a computer system requires a perfect match between machine capability and program behaviour. Machine capability can be enhanced with better hardware technology; program behaviour, however, is difficult to predict because it depends on the application and on run-time conditions. The fundamental factors for projecting the performance of a computer are listed below.

1. Efficiency: measured in terms of time, speed and accuracy.

2. Clock Rate: the CPU is driven by a clock with a constant cycle time τ, where τ = 1/f (ns).

3. CPI (Cycles Per Instruction): since different instructions require different numbers of cycles to execute, CPI is taken as an average value for a given instruction set and a given program mix.

4. Execution Time: let Ic be the instruction count, i.e. the total number of instructions in the program. Then the execution time is

T = Ic × CPI × τ

Now, CPI = instruction cycle = processor cycles + memory cycles, so

instruction cycle = p + m × k

where m = number of memory references,

p = number of processor cycles, and k = latency factor (how much slower memory is relative to the CPU). Now let C be the total number of cycles required to execute a program:

C = Ic × CPI

and the time to execute the program is T = C × τ.

5. MIPS Rate:

MIPS rate = Ic / (T × 10^6)

6. Throughput Rate: the number of programs executed per unit time.

W = 1 / T, or W = (MIPS × 10^6) / Ic

Numerical: A benchmark program is executed on a 40 MHz processor. The benchmark program has the following statistics.

Instruction Type     Instruction Count   Clock Cycle Count
Arithmetic           45000               1
Branch               32000               2
Load/Store           15000               2
Floating Point       8000                2

Calculate the average CPI, MIPS rate and execution time for the above benchmark program.

Average CPI = C / Ic, where C = total number of cycles to execute the whole program and Ic = total instruction count.

C / Ic = (45000 × 1 + 32000 × 2 + 15000 × 2 + 8000 × 2) / (45000 + 32000 + 15000 + 8000) = 155000 / 100000

CPI = 1.55

Execution Time = C / f

T = 155000 / (40 × 10^6) s = 3.875 ms

MIPS rate = Ic / (T × 10^6) = 100000 / (3.875 × 10^-3 × 10^6) ≈ 25.8
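The same arithmetic can be checked with a few lines of Python, using the figures from the benchmark above:

```python
# Verifying the benchmark calculation above (40 MHz processor).
f = 40e6

counts = {"arithmetic": 45_000, "branch": 32_000, "load/store": 15_000, "floating point": 8_000}
cycles = {"arithmetic": 1, "branch": 2, "load/store": 2, "floating point": 2}

ic = sum(counts.values())                       # 100,000 instructions
c = sum(counts[t] * cycles[t] for t in counts)  # 155,000 cycles
cpi = c / ic                                    # 1.55
t = c / f                                       # 0.003875 s = 3.875 ms
mips = ic / (t * 1e6)                           # ~25.8
print(cpi, t, mips)
```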

System Attributes versus Performance Factors (Ic, p, m, k; CPI = p + m × k):
- Instruction-set architecture affects Ic and p.
- Compiler technology affects Ic, p and m.
- CPU implementation and technology affect p.
- Cache and memory hierarchy affect k.

Practice Problem: A benchmark program containing 234,000 instructions is executed on a processor having a cycle time of 0.15 ns. The statistics of the program are given below; each memory reference requires 3 CPU cycles to complete. Calculate the MIPS rate and throughput for the program.

Instruction Type: Arithmetic, Branch, Load/Store
Instruction Mix: 58%, 33%, 9%
Processor Cycles: 2, 3
Memory Cycles: 2, 1

Programmatic Levels of Parallel Processing

Parallel processing can be pursued at four programmatic levels:
1. Job / Program Level
2. Task / Procedure Level
3. Interinstruction Level
4. Intrainstruction Level

1. Job / Program Level: this requires the development of parallel-processable algorithms. The implementation of parallel algorithms depends on the efficient allocation of limited hardware and software resources to multiple programs being used to solve a large computational problem. Examples: weather forecasting, medical consulting, oil exploration, etc.

2. Task / Procedure Level: parallel processing is conducted among procedures/tasks within the same program. This involves the decomposition of the program into multiple tasks for simultaneous execution.

3. Interinstruction Level: the goal is to exploit concurrency among multiple instructions so that they can be executed simultaneously. Data-dependency analysis is often performed to reveal parallelism among instructions; vectorization may also be desired for scalar operations within DO loops.

4. Intrainstruction Level: this level exploits faster and concurrent operations within each instruction, e.g. the use of carry-lookahead and carry-save adders instead of ripple-carry adders.

Key Points:
1. The hardware role increases from the high levels to the low levels, whereas the software role increases from the low levels to the high levels.
2. The highest (job) level is conducted algorithmically, while the lowest level is implemented directly by hardware means.
3. The trade-off between hardware and software approaches to solving a problem is always a very controversial issue.

4. As hardware cost declines and software cost increases, more and more hardware methods are replacing conventional software approaches.

Conclusion: Parallel processing is a combined field of studies which requires broad knowledge of, and experience with, all aspects of algorithms, languages, hardware, software, performance evaluation and computing alternatives.

Parallel Processing in Uniprocessor Systems

A number of parallel processing mechanisms have been developed in uniprocessor computers. We identify them in six categories, described below.

1. Multiplicity of Functional Units: different ALU functions can be distributed to multiple, specialized functional units which can operate in parallel.

2. Parallelism & Pipelining within the CPU :- Use of carry-lookahead & carry-save address instead of ripple-carry adders. Cascade two 4-bit parallel adders to create an 8-bit parallel adder.

Ripple-carry Adder: at each stage the sum bit is not valid until the carry bits in all the preceding stages are valid, so the time required for a valid addition is directly proportional to the number of bits.

Problem: The time required to generate each carry-out bit in the 8-bit parallel adder is 24 ns. Once all inputs to an adder stage are valid, there is a delay of 32 ns until the output sum bit is valid. What is the maximum number of additions per second that the adder can perform?

1 addition = 7 × 24 + 32 = 200 ns

Additions/sec = 1 / (200 × 10^-9) = 0.005 × 10^9 = 5 × 10^6 = 5 million additions/sec
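The same calculation, generalized to an n-bit ripple-carry adder, can be written as a short Python helper (the function name is ours, not from the slides):

```python
# Maximum addition rate of an n-bit ripple-carry adder (same arithmetic as above).
def additions_per_second(n_bits, carry_delay_ns, sum_delay_ns):
    # The most significant sum bit is valid only after (n-1) carries have rippled through.
    time_ns = (n_bits - 1) * carry_delay_ns + sum_delay_ns
    return 1e9 / time_ns

print(additions_per_second(8, 24, 32))  # 5,000,000 additions per second
```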

Practice Problem: Assuming the 32 ns delay in producing a valid sum bit in the 8-bit parallel adder, what maximum delay in generating a carry-out bit is allowed if the adder must be capable of performing 10^7 additions per second?

3. Overlapping CPU and I/O Operations: DMA is conducted on a cycle-stealing basis. The CDC 6600 uses 10 I/O processors for I/O multiprocessing. Simultaneous I/O operations and CPU computations can be achieved using separate I/O controllers and channels.

4. Use of a Hierarchical Memory System: a hierarchical memory system can be used to close the speed gap between the CPU and

main memory, because the CPU is roughly 1000 times faster than a main-memory access.

5. Balancing of Subsystem Bandwidth: consider the relation tp < tm < td (processor cycle time, memory cycle time, device access time).

Bandwidth of a System: the bandwidth of a system is defined as the number of operations performed per unit time.

Bandwidth of a Memory: the memory bandwidth is the number of words that can be accessed per unit time.

Processor Bandwidth:
Bp = maximum CPU computation rate.
Bpᵘ = utilized processor bandwidth, i.e. the number of

output results per second:

Bpᵘ = Rw / Tp (word results per second)

where Rw = number of word results and Tp = total CPU time needed to generate the Rw results.

Bd = bandwidth of devices (assumed to be as provided by the vendor).

The following relationships have been observed between the bandwidths of the major subsystems in a high-performance uniprocessor:

Bm ≥ Bmᵘ ≥ Bp ≥ Bpᵘ ≥ Bd

Because of these unbalanced speeds, we need to match the processing power of the three subsystems. Two major approaches are described below.

1. Bandwidth balancing between CPU and memory: use a fast cache having an access time tc = tp.
2. Bandwidth balancing between memory and I/O: intelligent disk controllers can be used to filter irrelevant data off the tracks, and buffering can be performed by the I/O channels.

6a. Multiprogramming: some computer programs are CPU-bound and some are I/O-bound. Whenever a process P1 is tied up with I/O operations, the system scheduler can switch the CPU to process P2. This allows simultaneous execution of several programs in the system. This interleaving of CPU and I/O operations among several programs is called multiprogramming, and it reduces the total execution time.

6b. Time Sharing: in multiprogramming, a high-priority program may sometimes occupy the CPU for too long to allow others to share it. This problem can be overcome by using a time-sharing operating system. The concept extends multiprogramming by assigning fixed or variable time slices to multiple programs; in other words, all programs competing for the CPU are given equal opportunities. Time sharing is particularly effective when applied to a computer system connected to many interactive terminals.
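As an illustration of the time-slicing idea (a toy sketch written for this note, not part of the original slides), a round-robin scheduler that gives each program an equal quantum can be simulated in a few lines of Python:

```python
# Toy round-robin time-slicing: each program gets an equal quantum in turn.
from collections import deque

def round_robin(burst_times, quantum):
    """burst_times: dict of {program: remaining CPU time}. Returns the order of slices."""
    ready = deque(burst_times.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        schedule.append((name, run))
        if remaining > run:
            ready.append((name, remaining - run))  # not finished: back to the end of the queue
    return schedule

print(round_robin({"P1": 5, "P2": 3, "P3": 1}, quantum=2))
# [('P1', 2), ('P2', 2), ('P3', 1), ('P1', 2), ('P2', 1), ('P1', 1)]
```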