Prince Sultan College For Woman


Riyadh Philanthropic Society For Science
Prince Sultan College For Woman
Dept. of Computer & Information Sciences
CS 251 Introduction to Computer Organization & Assembly Language
Lecture 4 (Computer System Organization): Processors - Parallelism

Outline (from the textbook, Chapter 2: 2.1.3, 2.1.4, 2.1.5, 2.1.6):
CISC vs. RISC
Design principles for modern computers
Instruction-level parallelism
Processor-level parallelism

RISC vs. CISC
The control unit determines the instruction set of the computer. Instruction sets fall into two main categories:
RISC: Reduced Instruction Set Computer
CISC: Complex Instruction Set Computer

RISC vs. CISC (Cont.)
A RISC computer has:
A small number of simple instructions, each executing in one cycle of the data path.
All instructions executed directly by hardware.
Instructions roughly 10 times faster than interpreted CISC instructions.
RISC machines had performance advantages:
All instructions were supported by hardware.
The chip could be designed cleanly, without backward-compatibility constraints.

RISC vs. CISC (Cont.)
A CISC computer has:
A large number of complex instructions.
Instructions that require interpretation: a complex instruction is interpreted into many simpler machine operations, which are then executed by the hardware.
Instructions roughly 10 times slower than RISC instructions.
A chip designed with backward compatibility in mind.
Both RISC and CISC computers had their fan clubs, and neither was able to drive the other from the market.

RISC vs. CISC (Comparison)
Intel combined the RISC and CISC approaches (starting with the Intel 486):
It has a RISC core that executes the simplest instructions in a single cycle.
The more complex instructions are executed in the usual CISC way.
The net result:
Common instructions are fast; less common instructions are slow.
It is not as fast as a pure RISC design, but it gives competitive overall performance while still allowing old software to run unmodified.

Design Principles for Modern Computers
Modern computer design is based on a set of design principles, sometimes called the RISC design principles. They can be summarized in five major points:
All instructions are directly executed by hardware.
Maximize the rate at which instructions are issued.
Instructions should be easy to decode.
Only loads and stores should reference memory.
Provide plenty of registers.

Design Principles for Modern Computers
All instructions are directly executed by hardware:
This eliminates a level of interpretation and provides high speed for most instructions.
For computers that implement CISC instruction sets, complex instructions can be split into smaller parts that are executed as microinstructions.
This extra step slows the machine, but only for the less frequently used instructions, which is acceptable.

Design Principles for Modern Computers
Maximize the rate at which instructions are issued:
MIPS = millions of instructions per second.
MIPS measures the number of instructions issued per second, no matter how long each instruction actually takes to complete.
This principle suggests that parallelism can play a major role in improving performance.
Although instructions are always encountered in program order, they are not always issued in program order, and they need not finish in program order.
If instruction 1 sets a register and instruction 2 uses that register, great care must be taken so that instruction 2 does not read the register until it contains the correct value.
Getting this right requires a lot of bookkeeping, but it opens the door to performance gains from executing multiple instructions at once.

Design Principles for Modern Computers
Instructions should be easy to decode:
Make instructions regular and of fixed length, with a small number of fields.
The fewer different instruction formats, the better.

Design Principles for Modern Computers
Only loads and stores should reference memory:
Operands for most instructions come from - and return to - registers.
Access to memory can take a long time, so only the LOAD and STORE instructions should reference memory.

Design Principles for Modern Computers
Provide plenty of registers:
Accessing memory is relatively slow, so many registers (at least 32) need to be provided.
Once a word is fetched, it can be kept in a register until it is no longer needed.

Parallelism
There are two types of parallelism:
Instruction-level parallelism: increases the number of instructions per second issued by the computer, rather than improving the execution speed of any particular instruction.
Processor-level parallelism: multiple processors (CPUs) work together on the same problem.

Instruction-Level Parallelism
Pipelining:
The biggest bottleneck in the instruction cycle is fetching instructions from memory.
Instructions can be prefetched and held in a prefetch buffer, so they are already available when it is time to execute them.
Prefetching thus divides instruction execution into two parts: fetching and actual execution.
Usually execution is split further into several stages, each with its own dedicated hardware, so that all stages can work in parallel.

Instruction-Level Parallelism
A five-stage pipeline:
S1: Instruction fetch unit - fetches the instruction from memory and places it in a buffer until it is needed.
S2: Instruction decode unit - decodes the instruction, determining its type and what operands it needs.
S3: Operand fetch unit - locates and fetches the operands, either from registers or from memory.
S4: Instruction execution unit - actually carries out the instruction, typically by running the operands through the data path.
S5: Write back unit - writes the result back to the proper register.

Instruction-Level Parallelism
The five-stage pipeline in operation - which instruction (1-9) each stage is working on during each clock cycle:
S1 (fetch):       1 2 3 4 5 6 7 8 9
S2 (decode):        1 2 3 4 5 6 7 8
S3 (operands):        1 2 3 4 5 6 7
S4 (execute):           1 2 3 4 5 6
S5 (write back):          1 2 3 4 5
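The timing diagram above can be captured in a short calculation (an illustrative sketch, not from the lecture): with k pipeline stages and n instructions, the pipeline needs k + (n - 1) cycles, whereas running each instruction through all five stages one at a time would take k * n cycles.

```python
# Pipelined: the first instruction takes all k stages, then one
# instruction completes per cycle.
def pipeline_cycles(n_instructions, n_stages=5):
    return n_stages + (n_instructions - 1)

# Unpipelined: every instruction occupies all k stages by itself.
def sequential_cycles(n_instructions, n_stages=5):
    return n_stages * n_instructions

n = 9  # the nine instructions shown in the diagram
print(pipeline_cycles(n))     # 13 cycles, pipelined
print(sequential_cycles(n))   # 45 cycles, one instruction at a time
print(sequential_cycles(n) / pipeline_cycles(n))  # speedup of about 3.5x
```

As n grows, the speedup approaches the number of stages (here 5), which is why deeper pipelines are attractive.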

Instruction-Level Parallelism
A dual pipeline: a single instruction fetch unit (S1) feeds two parallel pipelines, each with its own decode (S2), operand fetch (S3), execution (S4), and write back (S5) stages.

Instruction-Level Parallelism
Dual pipeline:
A single instruction fetch unit fetches pairs of instructions and puts each one into its own pipeline, complete with its own ALU, for parallel operation.
To run in parallel, the two instructions must not conflict over resource usage (e.g., registers), and neither may depend on the result of the other.
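The pairing rule can be sketched as a small scheduler. This is an illustrative model with a hypothetical instruction encoding (destination register, source registers), not the lecture's notation: consecutive instructions are issued as a pair only when the second is independent of the first.

```python
# True when b neither reads nor writes the register written by a.
def independent(a, b):
    dest_a, _ = a
    dest_b, srcs_b = b
    return dest_a not in srcs_b and dest_a != dest_b

def pair_for_dual_issue(stream):
    """Greedily group a list of instructions into dual or single issues."""
    groups, i = [], 0
    while i < len(stream):
        if i + 1 < len(stream) and independent(stream[i], stream[i + 1]):
            groups.append((stream[i], stream[i + 1]))  # dual issue
            i += 2
        else:
            groups.append((stream[i],))                # single issue
            i += 1
    return groups

stream = [("R1", ("R2", "R3")),   # R1 <- R2 op R3
          ("R4", ("R1", "R5")),   # reads R1: cannot pair with the above
          ("R6", ("R7", "R8"))]   # independent of the R4 instruction
print(pair_for_dual_issue(stream))
```

The first instruction issues alone because the second depends on R1; the second and third then issue as a pair.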

Instruction-Level Parallelism
Superscalar architecture: a single pipeline with multiple functional units at the execution stage (S4) - for example a LOAD unit, a STORE unit, a floating-point unit, and an ALU - fed by one fetch (S1), decode (S2), and operand fetch (S3) front end, followed by write back (S5).
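A minimal sketch of the superscalar idea (illustrative only; the opcode names and unit names are assumptions, not from the lecture): the execution stage routes each decoded instruction to the functional unit that can handle it, and different units can be busy with different instructions at the same time.

```python
# Hypothetical opcode -> functional unit routing table for stage S4.
FUNCTIONAL_UNITS = {
    "ADD": "ALU",
    "SUB": "ALU",
    "LOAD": "load unit",
    "STORE": "store unit",
    "FADD": "floating-point unit",
}

def dispatch(opcodes):
    """Map each decoded opcode to the functional unit that executes it."""
    return [(op, FUNCTIONAL_UNITS[op]) for op in opcodes]

print(dispatch(["LOAD", "FADD", "ADD", "STORE"]))
```

Because the four instructions above land on four different units, all of them can be in execution simultaneously even though there is only one pipeline front end.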

Processor-Level Parallelism
Instruction-level parallelism helps performance, but only by a factor of 5 to 10.
Processor-level parallelism can gain a factor of 50, 100, or even more.
There are three main forms of processor-level parallelism:
Array computers: array processors and vector processors
Multiprocessors
Multicomputers

Processor-Level Parallelism
Array computers:
Many problems in the physical sciences and engineering involve arrays.
Often the same calculations are performed on many different sets of data at the same time.
The regularity and structure of these programs make them especially easy targets for speedup through parallel execution.
Two kinds of machine have been used to execute large scientific programs quickly: the array processor and the vector processor.

Processor-Level Parallelism
Array computers - array processors:
An array processor consists of a large number of identical processors with a single control unit controlling them all.
All processors perform the same sequence of instructions on different sets of data, in parallel.

Processor-Level Parallelism
Array computers - array processors
[Figure: an 8 x 8 processor/memory grid. Each cell holds a processor (ALU + registers) with its own local memory; a single control unit broadcasts instructions to all cells.]

Processor-Level Parallelism
Array computers - array processors
Example: the vector addition C = A + B
The control unit stores the ith components ai and bi of A and B in local memory mi.
The control unit broadcasts the add instruction ci = ai + bi to all processors.
The additions take place simultaneously, since there is an adder for each element of the vector.
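The two steps above can be modelled in a few lines (an illustrative sketch, not from the lecture). The loop over local memories only stands in for the per-element view; a real array processor performs all the additions at the same instant, one adder per element.

```python
# C = A + B on a simulated array processor.
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]

# Step 1: the control unit distributes component i of A and B
# into processor i's local memory m_i.
local_memories = [{"a": a, "b": b} for a, b in zip(A, B)]

# Step 2: the control unit broadcasts "c_i = a_i + b_i"; every
# processor executes the same instruction on its own data.
C = [m["a"] + m["b"] for m in local_memories]
print(C)  # [11, 22, 33, 44]
```

This "one instruction, many data items" pattern is exactly what makes array-style problems easy to speed up.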

Processor-Level Parallelism
Array computers - vector processors:
To the programmer, a vector processor looks very much like an array processor.
However, all of the addition operations are performed in a single, heavily pipelined adder (an array processor has an adder for each element of the vector).
A vector processor introduces the vector register: a set of conventional registers that can be loaded from memory in a single instruction, although the load itself happens serially.
A vector addition instruction operates on vector registers by streaming their elements through the pipelined adder.
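A sketch of the vector-register view (illustrative, not from the lecture; the register names V1-V3 and the 3-stage adder are assumptions): the elements stream through one pipelined adder, so with a p-stage adder and n elements the whole vector add finishes in p + n - 1 cycles instead of the p * n an unpipelined adder would need.

```python
# First result emerges after `adder_stages` cycles, then one per cycle.
def pipelined_add_cycles(n_elements, adder_stages):
    return adder_stages + n_elements - 1

# Vector registers, each loaded from memory by a single instruction.
V1 = [1, 2, 3, 4]
V2 = [10, 20, 30, 40]

# One vector add instruction: element pairs flow through the adder.
V3 = [a + b for a, b in zip(V1, V2)]
print(V3)                           # [11, 22, 33, 44]
print(pipelined_add_cycles(4, 3))   # 6 cycles, vs. 12 unpipelined
```

Same answer as the array processor, but with one shared adder instead of one adder per element - which is why a vector unit is so much cheaper to build.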

Processor Level Parallelism Array computers – Vector processors

Processor-Level Parallelism
Array computers:
Both array processors and vector processors work on arrays of data, and both execute single instructions that, for example, add the elements of two vectors pairwise.
The difference is in how they perform the addition.
Compared with a vector processor, an array processor:
can perform some data operations more efficiently
requires more hardware
is more difficult to program

Processor-Level Parallelism
Array computers:
Array processors are still being made, but they occupy an ever-decreasing niche market, since they only work well on problems requiring the same computation on many data sets simultaneously.
A vector unit can be added to a conventional processor. The parts of a program that can be vectorized are then executed quickly by the vector unit, while the rest of the program runs on the conventional processor.

Processor-Level Parallelism
Multiprocessors:
A multiprocessor is a collection of CPUs sharing a common memory.
There are various schemes for connecting the CPUs to memory; the simplest is a single bus with multiple CPUs and one shared memory all plugged into it.
[Figure: several CPUs and one shared memory attached to a single bus.]

Processor-Level Parallelism
Multiprocessors:
The single bus quickly becomes a bottleneck in the scheme above.
One solution is to give each CPU a local memory of its own, where it can cache information, reducing bus traffic.
[Figure: the single-bus multiprocessor, with a local memory added to each CPU.]
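The shared-memory model can be sketched with threads (an illustrative analogy, not from the lecture): each thread plays the role of a CPU, a dictionary plays the role of the shared memory, and a lock stands in for bus arbitration so that concurrent updates are not lost.

```python
import threading

shared_memory = {"counter": 0}
bus_lock = threading.Lock()

def cpu(n_increments):
    for _ in range(n_increments):
        with bus_lock:  # one CPU at a time gets the "bus" to memory
            shared_memory["counter"] += 1

# Four "CPUs" all updating the same shared word.
threads = [threading.Thread(target=cpu, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_memory["counter"])  # 4000: no updates were lost
```

The lock is also a small demonstration of why the bus becomes a bottleneck: every access to shared memory serializes on it.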

Processor-Level Parallelism
Multiprocessors:
Multiprocessors with a small number of processors (up to about 64) are relatively easy to build.
Large ones are difficult to construct; the difficulty lies in connecting all the processors to the memory.

Processor-Level Parallelism
Multicomputers:
A multicomputer is similar to a multiprocessor in that it is made up of a collection of CPUs, but it differs in that there is no shared memory.
The individual CPUs communicate by sending each other messages, something like e-mail but much faster.
Multicomputers with nearly 10,000 CPUs have been built and put into operation.
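Message passing can be sketched with two threads that share no data at all (an illustrative analogy, not from the lecture): each "node" only sees the messages that arrive on its queue, which models the multicomputer's lack of shared memory.

```python
import threading
import queue

# The only channels between the two nodes: message queues.
to_worker, to_master = queue.Queue(), queue.Queue()

def worker():
    while True:
        msg = to_worker.get()      # receive a message
        if msg is None:            # an agreed-upon shutdown message
            break
        to_master.put(msg * msg)   # reply with the computed result

node = threading.Thread(target=worker)
node.start()
for n in [2, 3, 4]:
    to_worker.put(n)               # send work as messages
results = [to_master.get() for _ in range(3)]
to_worker.put(None)
node.join()
print(results)  # [4, 9, 16]
```

Notice that the worker never touches the master's variables; every interaction is an explicit send or receive, which is what makes multicomputers easier to build but harder to program.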

Processor-Level Parallelism
Conclusion: multiprocessors vs. multicomputers:
Multiprocessors are easier to program; multicomputers are easier to build.
There is much research on hybrid designs that combine the good properties of each.