1
Riyadh Philanthropic Society For Science, Prince Sultan College For Woman, Dept. of Computer & Information Sciences. CS 251 Introduction to Computer Organization & Assembly Language. Lecture 4 (Computer System Organization): Processors and Parallelism
2
Outline. From textbook: Chapter 2 (Sections 2.1.3, 2.1.4, 2.1.5, 2.1.6)
CISC vs. RISC
Design Principles for Modern Computers
Instruction-Level Parallelism
Processor-Level Parallelism
3
RISC vs. CISC. The control unit determines the instruction set of the computer, and instruction sets fall into two main categories: RISC (Reduced Instruction Set Computer) and CISC (Complex Instruction Set Computer).
4
RISC vs. CISC (Cont.) A RISC computer has:
A small number of simple instructions, each executing in one cycle of the data path. All instructions are executed directly by hardware. Its instructions run roughly 10 times faster than interpreted CISC instructions. RISC machines had clear performance advantages: all instructions were supported by hardware, and the chip could be designed cleanly, without any backward-compatibility constraints.
5
RISC vs. CISC (Cont.) A CISC computer has:
A large number of complex instructions, all of which require interpretation: a complex instruction is interpreted into many simpler machine instructions, which are then executed by the hardware. Its instructions are roughly 10 times slower than RISC instructions. The chip is designed with backward compatibility in mind. Both RISC and CISC had their fan clubs, and neither was able to drive the other out of the market.
6
RISC vs. CISC (Comparison) Intel combined the RISC and CISC approaches (starting with the Intel 486): the chip has a RISC core that executes the simplest instructions in a single cycle, while the more complex instructions are executed in the usual CISC way. The net result: common instructions are fast, and less common instructions are slow. This is not as fast as a pure RISC design, but it gives competitive overall performance while still allowing old software to run unmodified.
7
Design Principles for Modern Computers
Modern computer design is based on a set of design principles, sometimes called the RISC design principles. They can be summarized in five major points:
All instructions are directly executed by hardware.
Maximize the rate at which instructions are issued.
Instructions should be easy to decode.
Only loads and stores should reference memory.
Provide plenty of registers.
8
Design Principles for Modern Computers
All instructions are directly executed by hardware. This eliminates a level of interpretation and provides high speed for most instructions. For computers implementing a CISC instruction set, the complex instructions can be split into smaller pieces that are executed as a sequence of microinstructions. This extra step slows the machine, but only for the less frequently used instructions, which is acceptable.
9
Design Principles for Modern Computers
Maximize the rate at which instructions are issued. MIPS (millions of instructions per second) measures the number of instructions issued per second, no matter how long those instructions actually take to complete. This principle suggests that parallelism can play a major role in improving performance. Although instructions are always encountered in program order, they are not always issued in program order, and they need not finish in program order. If instruction 1 sets a register and instruction 2 uses that register, great care must be taken to ensure that instruction 2 does not read the register until it contains the correct value. Getting this right requires a lot of bookkeeping, but it has the potential for performance gains by executing multiple instructions at once. A small example of such a dependency is sketched below.
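A minimal C sketch of the register dependency just described (my own illustrative example, not code from the course): the second statement reads the value the first one writes, so it cannot be issued ahead of it, while the third statement is independent and could run in parallel with either.

    #include <stdio.h>

    int main(void) {
        int x = 3, y = 4;
        int a = x + y;   /* instruction 1: writes "register" a            */
        int b = a * 2;   /* instruction 2: reads a, so it must not issue  */
                         /* until a holds the correct value               */
        int c = x - y;   /* independent of a and b: a parallel machine    */
                         /* could execute it alongside either of them     */
        printf("a=%d b=%d c=%d\n", a, b, c);
        return 0;
    }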
10
Design Principles for Modern Computers
Instructions should be easy to decode. Make instructions regular and of fixed length, with a small number of fields; the fewer different instruction formats there are, the better. A sketch of how cheap decoding becomes with one fixed format follows.
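To illustrate, here is a small C program that decodes a hypothetical fixed 32-bit format, with one byte each for the opcode, destination register, and two source registers (the format is my assumption, not one defined in the course): with a single regular layout, decoding is just a few shifts and masks.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical format: | opcode:8 | dst:8 | src1:8 | src2:8 | */
    int main(void) {
        uint32_t instr = 0x01020304;    /* opcode=1, dst=2, src1=3, src2=4 */
        uint8_t opcode = (instr >> 24) & 0xFF;
        uint8_t dst    = (instr >> 16) & 0xFF;
        uint8_t src1   = (instr >>  8) & 0xFF;
        uint8_t src2   =  instr        & 0xFF;
        printf("opcode=%u dst=r%u src1=r%u src2=r%u\n",
               opcode, dst, src1, src2);
        return 0;
    }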
11
Design Principles for Modern Computers
Only loads and stores should reference memory. Operands for most instructions come from, and return to, registers. Access to memory can take a long time; thus only LOAD and STORE instructions should reference memory. The sketch below models this style.
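A toy C model of the load/store discipline (illustrative only; the register and memory names are made up): only the explicit load and store steps touch mem[], and the arithmetic works purely on the register variables.

    #include <stdio.h>

    int mem[3] = {5, 7, 0};          /* memory holds a, b, c          */

    int main(void) {
        int r1 = mem[0];             /* LOAD  r1, a                   */
        int r2 = mem[1];             /* LOAD  r2, b                   */
        int r3 = r1 + r2;            /* ADD   r3, r1, r2  (registers) */
        mem[2] = r3;                 /* STORE r3, c                   */
        printf("c = %d\n", mem[2]);
        return 0;
    }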
12
Design Principles for Modern Computers
Provide plenty of registers. Accessing memory is relatively slow, so many registers (at least 32) should be provided. Once a word is fetched, it can be kept in a register until it is no longer needed.
13
Parallelism There are two types of parallelism:
Instruction-level parallelism: increasing the number of instructions per second issued by the computer, rather than improving the execution speed of any particular instruction.
Processor-level parallelism: multiple processors (CPUs) working together on the same problem.
14
Instruction-Level Parallelism
Pipelining: the biggest bottleneck in the instruction cycle is fetching instructions from memory. To hide this, instructions can be fetched in advance and held in a prefetch buffer, so they are already available when it is time to execute them. Prefetching thus divides instruction execution into two parts: fetching and actual execution. Usually the execution is itself split into several stages, each with its own dedicated piece of hardware, so that all of the stages can work in parallel.
15
Instruction-Level Parallelism
A five-stage pipeline (figure: stages S1-S5, labeled instruction fetch unit, instruction decode unit, operand fetch unit, instruction execution unit, and write-back unit).
S1: fetches the instruction from memory and places it in a buffer until it is needed.
S2: decodes the instruction, determining its type and what operands it needs.
S3: locates and fetches the operands, either from registers or from memory.
S4: actually does the work of carrying out the instruction, typically by running the operands through the data path.
S5: writes the result back to the proper register.
A small simulation of how instructions flow through these stages follows.
16
Instruction-Level Parallelism
Five-stage pipeline: the state of each stage as a function of time (figure: a table showing which instruction occupies each of S1-S5 in successive clock cycles; the simulation above prints the same pattern).
17
Instruction-Level Parallelism
Dual pipeline (figure: a single instruction fetch unit S1 feeding two parallel pipelines, each with its own decode, operand fetch, execution, and write-back stages S2-S5).
18
Instruction-Level Parallelism
Dual pipeline: a single instruction fetch unit fetches pairs of instructions together and puts each one into its own pipeline, complete with its own ALU, for parallel operation. To be able to run in parallel, the two instructions must not conflict over resource usage (e.g., registers), and neither may depend on the result of the other. The sketch below states that rule in code.
19
Instruction-Level Parallelism
Superscalar architecture: a single pipeline with multiple functional units (figure: stages S1-S3 for instruction fetch, decode, and operand fetch feed an S4 containing several functional units in parallel, among them ALU, LOAD, STORE, and floating-point units, followed by the S5 write-back unit).
20
Processor-Level Parallelism
Instruction-level parallelism helps performance, but only by a factor of 5 to 10. Processor-level parallelism can gain a factor of 50, 100, or even more. There are three main forms of processor-level parallelism:
Array computers (array processors and vector processors)
Multiprocessors
Multicomputers
21
Processor-Level Parallelism
Array computers. Many problems in the physical sciences and engineering involve arrays, and often the same calculation is performed on many different sets of data at the same time. The regularity and structure of these programs makes them especially easy targets for speedup through parallel execution. Two methods have been used to execute large scientific programs quickly: array processors and vector processors.
22
Processor-Level Parallelism
Array computers: array processors. An array processor consists of a large number of identical processors under a single control unit, all performing the same sequence of instructions on different sets of data (in parallel).
23
Processor-Level Parallelism
Array computers: array processors (figure: an 8 x 8 processor/memory grid, where each cell is a processor (ALU + registers) with its own local memory, and a single control unit broadcasts instructions to all of the processors).
24
Processor-Level Parallelism
Array computers: array processors. Example: the vector addition C = A + B. The control unit stores the i-th components a_i and b_i of A and B in the local memory m_i of processor i, then broadcasts the add instruction c_i = a_i + b_i to all processors. The additions take place simultaneously, since there is an adder for each element of the vector. A software sketch of this pattern follows.
25
Processor-Level Parallelism
Array computers: vector processors. A vector processor appears to the programmer very much like an array processor, but all of the addition operations are performed in a single, heavily pipelined adder (whereas an array processor has an adder for each element of the vector). It introduces the concept of a vector register: a set of conventional registers that can be loaded from memory in a single instruction, which actually loads them serially. A vector addition instruction then operates on two vector registers, streaming their elements through the pipelined adder. A toy model is sketched below.
26
Processor-Level Parallelism
Array computers: vector processors (figure only).
27
Processor-Level Parallelism
Array computers. Both array processors and vector processors work on arrays of data, and both execute single instructions that, for example, add the elements of two vectors pairwise. The difference is in how they perform the addition. Compared with a vector processor, an array processor can perform some data operations more efficiently, but it requires more hardware and is more difficult to program.
28
Processor-Level Parallelism
Array computers. Array processors are still being made, but they occupy an ever-decreasing niche market, since they work well only on problems that require the same computation to be performed on many data sets simultaneously. A vector processor, by contrast, can be added to a conventional processor: the parts of a program that can be vectorized are executed quickly by the vector unit, while the rest of the program runs on the conventional processor.
29
Processor-Level Parallelism
Multiprocessors. A multiprocessor is made up of a collection of CPUs sharing a common memory. There are various schemes for connecting the CPUs to the memory; the simplest is a single bus with multiple CPUs and one memory all plugged into it (figure: several CPUs and one shared memory attached to a single bus). A threaded sketch of this shared-memory model follows.
30
Processor-Level Parallelism
Multiprocessors. The single bus quickly becomes a bottleneck in this scheme. A second design therefore gives each CPU a local memory of its own, in which it can cache information (figure: the shared memory and several CPUs, each with a local memory, attached to one bus).
31
Processor-Level Parallelism
Multiprocessors with a small number of processors (<= 64) are relatively easy to build; large ones are difficult to construct. The difficulty lies in connecting all the processors to the memory.
32
Processor-Level Parallelism
Multicomputers. A multicomputer is similar to a multiprocessor in that it is made up of a collection of CPUs, but it differs in that there is no shared memory: the individual CPUs communicate by sending each other messages, something like e-mail, but much faster. Multicomputers with nearly 10,000 CPUs have been built and put into operation. A message-passing sketch follows.
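A minimal message-passing sketch of the multicomputer model, using MPI as one possible messaging layer (assumes an MPI installation; compile with mpicc and run with mpirun -np 2): with no shared memory, process 0 must send a message for process 1 to see the value.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;                    /* exists only in CPU 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("CPU 1 received %d by message\n", value);
        }
        MPI_Finalize();
        return 0;
    }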
33
Processor-Level Parallelism
Conclusion: multiprocessors vs. multicomputers. Multiprocessors are easier to program, and multicomputers are easier to build; there is much research on designing hybrid systems that combine the good properties of each.