Advanced Processor Technology
Architectural families of modern computers are CISC, RISC, superscalar, VLIW, super-pipelined, vector, and symbolic processors.
Design space of processors
The various processor families can be mapped onto a coordinate space of clock rate versus cycles per instruction (CPI). As implementation technology evolves rapidly, the clock rates of various processors are gradually moving from low to higher speeds toward the right of the design space. Manufacturers are also trying to lower the CPI using both hardware and software approaches.
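The trade-off between the two axes of this design space can be made concrete with the usual performance estimate: a processor's native MIPS rate is its clock rate (in MHz) divided by its average CPI. The example values below are illustrative, chosen from the ranges quoted for the families that follow; they are not figures from the text.

```python
# Native MIPS rate = clock rate (MHz) / average CPI.
# Example values are illustrative, drawn from the ranges quoted
# for each processor family in these notes.
def mips_rate(clock_mhz, cpi):
    return clock_mhz / cpi

# A 50 MHz CISC processor averaging 10 cycles per instruction:
cisc = mips_rate(50, 10)            # 5 MIPS
# A 100 MHz superscalar processor averaging 0.5 cycles per instruction:
superscalar = mips_rate(100, 0.5)   # 200 MIPS
print(cisc, superscalar)
```

This shows why lowering CPI matters as much as raising the clock rate: the superscalar example above gains a factor of 2 from clock rate but a factor of 20 from CPI.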
CISC processors: Intel i486, M68040, VAX 8600, IBM 390. Clock rate: 33 to 50 MHz. CPI: varies from 1 to 20 cycles. Microprogrammed control.
RISC processors: Intel i860, SPARC, MIPS R3000, IBM RS/6000. Clock rate: 20 to 120 MHz. CPI: 1 to 2 cycles. Hardwired control.
Superscalar processors: Intel i960CA, IBM RS/6000, DEC 21064. Multiple instructions are issued simultaneously during each cycle. Clock rate: 20 to 120 MHz. CPI: 0.2 to 0.5 cycles. A subclass of RISC processors.
Very long instruction word (VLIW) processors: Use more functional units than a superscalar processor. Clock rate: 5 to 50 MHz. CPI: 0.1 to 0.2 cycles. Use very long instructions (256 to 1024 bits per instruction). Implemented with microprogrammed control.
Superpipelined processors: Use multiphase clocks with a much increased clock rate. Clock rate: 100 to 500 MHz. CPI: 1 to 5 cycles.
Vector supercomputers: Use multiple functional units for concurrent scalar and vector operations. The processors are superpipelined. Very high clock rate: 100 to 1000 MHz. Very low CPI: 0.1 to 0.2 cycles.
Instruction pipelines
The execution cycle of a typical instruction includes four phases: fetch, decode, execute, and write-back. Pipeline cycle: the time required for each phase to complete its operation, assuming equal delay in all phases.
Instruction pipeline cycle: the clock period of the instruction pipeline. Instruction issue latency: the time (in cycles) required between the issuing of two adjacent instructions. Instruction issue rate: the number of instructions issued per cycle. Simple operation latency: the latency of simple operations such as integer adds, loads, and stores; complex operations include divides and cache misses. Resource conflicts: two or more instructions demand use of the same functional unit at the same time.
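The benefit of pipelining these four phases can be quantified with a standard calculation (a sketch of the definitions above, not a specific machine): in an ideal m-stage pipeline issuing one instruction per cycle, k instructions finish in m + (k - 1) pipeline cycles, versus m * k cycles without pipelining.

```python
# Ideal pipeline timing for the four-phase instruction cycle above:
# fetch, decode, execute, write-back (m = 4 stages), with one
# instruction issued per cycle and no resource conflicts.
def pipelined_cycles(m, k):
    # first instruction fills the pipeline (m cycles),
    # then one instruction completes every cycle
    return m + (k - 1)

def unpipelined_cycles(m, k):
    return m * k

m, k = 4, 100
speedup = unpipelined_cycles(m, k) / pipelined_cycles(m, k)
print(pipelined_cycles(m, k))   # 103 cycles
print(round(speedup, 2))        # approaches m = 4 as k grows
```

As k grows large the speedup approaches m, which is why resource conflicts and non-unit issue latency, which break this ideal, matter so much in practice.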
Processors and coprocessors
The central processor of a computer is called the CPU. The CPU is a scalar processor consisting of multiple functional units, such as an integer arithmetic and logic unit and a floating-point accelerator. The floating-point unit can be built into a coprocessor attached to the CPU. The coprocessor executes instructions dispatched from the CPU.
A coprocessor may be a floating-point accelerator executing scalar data, a vector processor executing vector operands, a digital signal processor, or a Lisp processor executing AI programs. Coprocessors cannot handle I/O operations and cannot be used alone. The processor and coprocessor operate in a host/back-end relationship. A coprocessor may be more powerful than its host.
Digital Equipment VAX 8600
Introduced by Digital Equipment Corporation, the VAX 8600 implements a typical CISC architecture with microprogrammed control. Its instruction set contains about 300 instructions with 20 addressing modes. It consists of two functional units for concurrent execution of integer and floating-point instructions.
A unified cache is used for holding both instructions and data. There are 16 GPRs in the instruction unit. A translation lookaside buffer (TLB) is used in the memory unit for fast generation of a physical address from a virtual address. Both the integer and floating-point units are pipelined. The CPI varies from 2 cycles to 20 cycles.
Intel i860 processor architecture
Introduced by Intel Corporation, the i860 is a 64-bit RISC processor fabricated on a single chip containing more than one million transistors. There are nine functional units interconnected by multiple data paths with widths ranging from 32 to 128 bits. All external or internal address buses are 32 bits wide; all external or internal data buses are 64 bits wide.
The instruction cache holds 4 Kbytes, organized as a two-way set-associative memory with 32 bytes per cache block; it transfers 64 bits per clock cycle. The data cache is a two-way set-associative memory of 8 Kbytes; it transfers 128 bits per clock cycle, and a write-back policy is used. The bus control unit coordinates 64-bit data transfers between the chip and the outside world.
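The geometry of these caches follows from the standard relation for a set-associative cache: a C-byte, w-way cache with B-byte blocks has C / (w * B) sets. The block size of the data cache is not stated above; the sketch below assumes it matches the 32-byte blocks of the instruction cache.

```python
# Number of sets in a C-byte, w-way set-associative cache
# with B-byte blocks: C / (w * B).
def cache_sets(capacity, ways, block_size):
    return capacity // (ways * block_size)

# i860 instruction cache: 4 KB, 2-way, 32-byte blocks
print(cache_sets(4 * 1024, 2, 32))   # 64 sets
# i860 data cache: 8 KB, 2-way, assuming the same 32-byte blocks
print(cache_sets(8 * 1024, 2, 32))   # 128 sets
```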
The MMU implements protected 4-Kbyte paged virtual memory of 2^32 bytes via a TLB. There are two floating-point units, a multiplier unit and an adder unit, which can be used separately or simultaneously under coordination of the floating-point control unit. Both the integer unit and the floating-point control unit can execute concurrently. The graphics unit executes integer operations corresponding to 8-, 16-, and 32-bit pixel data types.
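The address arithmetic behind this paged scheme is worth making explicit: with 4-Kbyte pages, a 32-bit virtual address splits into a 20-bit virtual page number (the part the TLB translates) and a 12-bit page offset. A small sketch of that split, with a hypothetical example address:

```python
# Virtual-address split for 4 KB pages in a 32-bit address space:
# 20-bit virtual page number (translated via the TLB) + 12-bit offset.
PAGE_SIZE = 4 * 1024
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits, since 4096 = 2**12

def split_vaddr(vaddr):
    vpn = vaddr >> OFFSET_BITS        # looked up in the TLB
    offset = vaddr & (PAGE_SIZE - 1)  # passed through unchanged
    return vpn, offset

# Hypothetical example address, for illustration only:
vpn, offset = split_vaddr(0x00402ABC)
print(hex(vpn), hex(offset))   # 0x402 0xabc
```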
This unit supports three-dimensional drawing in a graphics frame buffer, with color intensity, shading, and hidden-surface elimination. The merge register is used only by vector integer instructions. The i860 executes 82 instructions, including 42 RISC integer, 24 floating-point, 10 graphics, and 6 assembler pseudo-operations. All these instructions execute in one cycle.
Superscalar processors
These are designed to exploit more instruction-level parallelism in user programs. Only independent instructions can be executed in parallel without causing a wait state. The instruction-issue degree is 2 to 5 in practice. A superscalar processor depends more on an optimizing compiler to exploit parallelism.
Multiple instruction pipelines are used. The instruction cache supplies multiple instructions per fetch. Multiple functional units are built into the integer unit and into the floating-point unit, and multiple data buses exist among the functional units. The IBM RS/6000, DEC 21064, and Intel i960CA are examples of superscalar processors.
Due to the reduced CPI and higher clock rates used, most superscalar processors outperform scalar processors. The maximum number of instructions issued per cycle ranges from two to five in these superscalar processors.
The register files in the IU and FPU each have 32 registers, and both the IU and FPU are implemented on the same chip. The superscalar degree is low due to the limited instruction parallelism that can be exploited in ordinary programs.
Reservation stations and reorder buffers can be used to establish instruction windows. The purpose is to support instruction lookahead and internal data forwarding, which are needed to schedule multiple instructions through the multiple pipelines simultaneously.
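The scheduling constraint at the heart of this (only independent instructions issue together) can be sketched with a toy in-order issue model. This is a hypothetical simplification, not the design of any processor named above: it packs instructions into issue groups of up to `degree` per cycle, starting a new group whenever an instruction reads a register written earlier in the same group.

```python
# Toy in-order dual-issue model: group instructions into issue cycles,
# up to `degree` per cycle, breaking the group on a read-after-write
# dependence within the group. Hypothetical sketch, not a real design.
def issue_groups(instrs, degree):
    """instrs: list of (dest, src1, src2) register-name tuples."""
    groups, current, written = [], [], set()
    for dest, s1, s2 in instrs:
        # start a new cycle if the group is full or a source register
        # is produced by an instruction already in this group
        if len(current) == degree or s1 in written or s2 in written:
            groups.append(current)
            current, written = [], set()
        current.append((dest, s1, s2))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

prog = [("r1", "r2", "r3"),   # independent
        ("r4", "r5", "r6"),   # independent -> issues in the same cycle
        ("r7", "r1", "r4")]   # needs r1 and r4 -> must wait a cycle
print(len(issue_groups(prog, 2)))   # 2 issue cycles for 3 instructions
```

Real hardware does this look-ahead dynamically over a window of fetched instructions, with forwarding paths to shorten the wait; the grouping idea is the same.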
VLIW architecture
This architecture is generalized from two well-established concepts: horizontal microcoding and superscalar processing. The instruction word is hundreds of bits in length. Multiple functional units are used concurrently, and all functional units share the use of a common large register file.
The VLIW concept is borrowed from horizontal microcoding: different fields of the long instruction word carry the opcodes to be dispatched to different functional units. Programs written in conventional short instruction words (32 bits) must be compacted into VLIW instructions by a compiler which can predict branch outcomes using elaborate heuristics or run-time statistics.
VLIW machines behave much like superscalar machines, with three differences. First, decoding of VLIW instructions is easier than that of superscalar instructions. Second, the code density of the superscalar machine is better when the available instruction-level parallelism is less than that exploitable by the VLIW machine.
This is because the fixed VLIW format includes bits for non-executable operations, while the superscalar processor issues only executable instructions. Third, a superscalar machine can be object-code-compatible with a large family of nonparallel machines, but a VLIW machine exploiting different amounts of parallelism would require different instruction sets.
Instruction parallelism and data movement in a VLIW architecture are specified at compile time, so run-time scheduling and synchronization are completely eliminated. One can view a VLIW processor as an extreme of a superscalar processor in which all independent or unrelated operations are already synchronously compacted together in advance.
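The compile-time compaction described above can be sketched as packing independent operations into slots of one long word, one slot per functional unit, with unused slots padded by no-ops (the non-executable bits that hurt code density). The unit names and operation strings below are hypothetical, for illustration only.

```python
# Compile-time VLIW compaction sketch: one slot per functional unit,
# unused slots filled with "nop". Hypothetical slot layout, not a
# real VLIW encoding.
UNITS = ["alu", "fpu", "load", "branch"]

def compact(ops):
    """ops: list of (unit, operation) pairs already proven independent
    by the compiler; returns one long instruction word as a slot list."""
    word = {u: "nop" for u in UNITS}
    for unit, op in ops:
        word[unit] = op
    return [word[u] for u in UNITS]

word = compact([("alu", "add r1,r2,r3"), ("load", "ld r4,0(r5)")])
print(word)   # ['add r1,r2,r3', 'nop', 'ld r4,0(r5)', 'nop']
```

Note that half the slots here are no-ops: when a program offers little parallelism, most of each long word is wasted, which is exactly the code-density disadvantage noted above.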
VLIW opportunities: Random parallelism among scalar operations is exploited instead of the regular or synchronous parallelism found in a vectorized supercomputer or an SIMD computer. Success depends heavily on the efficiency of code compaction. The VLIW architecture is totally incompatible with that of any conventional general-purpose processor.
Instruction parallelism embedded in the compacted code may require different latencies to be executed by different functional units, even though the instructions are issued at the same time. Therefore, different implementations of the same VLIW architecture may not be binary-compatible with each other. By explicitly encoding parallelism in the long instruction, a VLIW processor can eliminate the hardware or software needed to detect parallelism.
The advantage of VLIW is simplicity in hardware structure and instruction set. It can perform well in scientific applications, where program behaviour is more predictable; in general-purpose applications, the architecture may not perform as well. Due to the lack of compatibility with conventional hardware and software, the architecture has not entered the mainstream of computers. The dependence on trace scheduling and code compaction has prevented it from gaining acceptance in the commercial world.