COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept.


1 COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept. stasys.maciulevicius@ktu.lt

2 ©S.Maciulevičius 2009-2014 von Neumann architecture
The term "von Neumann architecture" derives from a computer architecture proposal by the mathematician and early computer scientist John von Neumann and others (1945), entitled First Draft of a Report on the EDVAC. It describes a design for an electronic digital computer made up of:
- a processing unit consisting of an arithmetic logic unit and processor registers
- a control unit containing an instruction register and a program counter
- a memory that stores both data and instructions
- external mass storage
- input and output mechanisms
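The fetch, decode and execute cycle implied by this description can be illustrated with a toy machine. This is only a sketch: the opcodes (LOAD, ADD, STORE, HALT), the tuple encoding and the single accumulator are hypothetical and are not taken from the EDVAC report.

```python
# Toy von Neumann machine: ONE memory holds both instructions and data.
# The instruction encoding below is hypothetical, chosen for illustration.

def run(memory):
    pc = 0    # program counter (part of the control unit)
    acc = 0   # accumulator register (part of the processing unit)
    while True:
        ir = memory[pc]        # fetch into the instruction register
        pc += 1
        op, arg = ir           # decode
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc = acc + memory[arg]   # arithmetic logic unit operation
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")

program = [
    ("LOAD", 5),    # address 0
    ("ADD", 6),     # address 1
    ("STORE", 7),   # address 2
    ("HALT", 0),    # address 3
    None,           # address 4: unused
    2, 3, 0,        # addresses 5, 6, 7: data
]

result = run(list(program))
print(result[7])   # 2 + 3 stored at address 7
```

Because instructions and data share one memory, the same `memory` list is both read as code and written as data.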

3 von Neumann architecture
The structure of such a computer (diagram): arithmetic logic unit, control unit, memory and input/output, all connected by a communication unit (bus).

4 Processor
The control unit and the arithmetic logic unit (sometimes called the instruction processor and the data processor, respectively) are now integrated into one unit, the central processing unit (CPU).

5 Processor
The arithmetic logic unit and control unit are now integrated into one unit, the central processing unit (CPU). Diagram labels:
- Control unit: receives instructions (from memory), sends control signals to the arithmetic logic unit
- Arithmetic logic unit: receives data (from memory), sends results (to memory), sends information about the operation flow to the control unit

6 Processor
- The control unit fetches instructions from memory, analyzes them and controls the operations in the functional unit.
- The arithmetic logic unit executes operations according to the current instruction.
- These two units work together: the control unit generates control signals according to the operation code, and the arithmetic logic unit sends condition signals back to the control unit describing the running operation (e.g., the sign of an operand or the value of some bit). These condition signals may affect the generation of subsequent control signals.
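The feedback loop between the two units can be sketched as follows. The operation names, the 8-bit width and the flag names `zero` and `sign` are assumptions made for this illustration, not details from the slides.

```python
# Sketch: the ALU returns condition signals (flags) that the control
# logic uses to decide its next step. Names and width are hypothetical.

def alu(op, a, b, width=8):
    mask = (1 << width) - 1
    if op == "ADD":
        result = (a + b) & mask
    elif op == "SUB":
        result = (a - b) & mask
    elif op == "AND":
        result = a & b & mask
    else:
        raise ValueError(f"unknown operation {op!r}")
    flags = {
        "zero": result == 0,                    # result is all zero bits
        "sign": bool(result >> (width - 1)),    # most significant bit set
    }
    return result, flags

def count_down(start):
    """Control loop: issue SUB 1 repeatedly until the ALU reports zero."""
    value, steps = start, 0
    while True:
        value, flags = alu("SUB", value, 1)
        steps += 1
        if flags["zero"]:   # condition signal steers the next control step
            return steps

print(count_down(3))
```

Here `count_down` plays the role of the control unit: it does not inspect the data itself, only the condition signal reported by `alu`.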

7 Functional unit of processor
Diagram labels:
- Internal memory (registers, cache): supplies operands to the operation-performing circuits and receives their results
- Inputs: control signals (from control unit), data (from memory)
- Outputs: information about the running operation (to control unit), results (to memory)

8 Functional unit of processor
Internally, the functional unit can be divided into two groups of elements:
- Internal memory, which holds the data to be processed (operands); it consists of registers, separate flip-flops (triggers), cache memory and, in some cases, a stack.
- Circuits performing the operations; they carry out all actions needed to process the information: addition, logic operations, shifts, etc.

9 Processor market
- Intel: the leader (about 80% market share for computers)
- AMD: the main competitor (about 19% market share)
- The remaining producers: about 1% of the market
- New player in the mobile processor market: ARM

10 Intel processors
Family diagram showing: 4004, 8008, 8080, 8085, 8086, 8088, 80286, 80386, 80386SX, 80486, 80486SX, 80486DX2, 80486DX4

11 Intel 4004
- Intel 4004, the first microprocessor (November 1971)
- Designed for a calculator
- Data word: 4 bits; 16 registers (4 bits each)
- Instruction length: 8 bits; number of instructions: 46
- Separate memories: 1 KB for data, 4 KB for program
- PC length: 12 bits; 4-level stack for subprogram calls
- Frequency: 108 kHz
- 2300 transistors (fabrication process: 10 µm)
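The listed figures are consistent with one another, as a short calculation shows. The sketch assumes that each program address selects one 8-bit word; the constant names are made up for this example.

```python
# Relating the 4004 figures from the slide to each other.
# Assumption: one 8-bit instruction word per program address.

PC_BITS = 12
INSTRUCTION_BITS = 8
DATA_WORD_BITS = 4

program_addresses = 2 ** PC_BITS                            # addresses the PC can reach
program_bytes = program_addresses * INSTRUCTION_BITS // 8   # size of program memory
data_values = 2 ** DATA_WORD_BITS                           # distinct values of a data word

print(program_addresses, program_bytes // 1024, data_values)
```

A 12-bit program counter reaches 4096 addresses, which matches the 4 KB program memory, and a 4-bit data word can hold 16 distinct values.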

12 Intel 8086 (1978)
- Compatible with the Intel 8080 at the assembly-source level (not binary compatible); has a similar register set
- Data word: 16 bits
- Instruction prefetch buffer length: 6 bytes
- Four 16-bit general registers
- Four 16-bit registers for addresses
- Segment registers
- Addressable memory: 1 MB
- 29 000 transistors (fabrication process: 3 µm)
- Frequency: 4.77 MHz; price: $360
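The link between the 16-bit segment registers and the 1 MB address space can be sketched as follows. The slide does not give the formula; the rule used here (segment shifted left by 4 bits plus offset, truncated to 20 bits) is the standard 8086 real-mode address calculation, and the function name is made up for this example.

```python
# Sketch of 8086 address formation: two 16-bit values give a 20-bit address.
# The formula is standard 8086 behaviour, not stated on the slide.

ADDRESS_BITS = 20

def physical_address(segment, offset):
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # segment * 16 + offset, wrapped to the 20 address lines
    return ((segment << 4) + offset) & ((1 << ADDRESS_BITS) - 1)

print(hex(physical_address(0x1234, 0x0010)))
print(2 ** ADDRESS_BITS == 1024 * 1024)   # 20 address bits cover 1 MB
```

Different segment:offset pairs can name the same physical byte, for example 0x1234:0x0010 and 0x1235:0x0000.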

13 Intel 8086
Block diagram showing:
- Segment registers ES, CS, SS, DS and the instruction pointer IP
- Instruction queue
- Address summator
- General registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL)
- Registers SP, BP, SI, DI
- ALU and flags register F
- Control and synchronisation unit
- Address/data buffer (AD15-AD0) and address/status buffer (A19-A16 (ST6-ST2))
8088: 8-bit data bus, 4-byte instruction queue

14 Intel 80386 (1985)
- Extended addressing capability, adding an index multiplier: base reg + index reg × multiplier (1, 2, 4 or 8) + displacement (8- or 32-bit constant)
- Added memory management unit (MMU) and privilege levels (using protection rings)
- Addressable memory: 4 GB
- Virtual memory: 64 TB
- Transistor count: 275 000 (1.5 µm)
- Frequency: 16 MHz; price: $299
- 80386SX, with 16-bit data bus:
  - addressable memory: 16 MB
  - virtual memory: 256 GB
- 80386SL (1990), the first microprocessor for notebooks:
  - addressable memory: 4 GB; virtual memory: 64 TB
  - transistor count: 855 000 (1 µm); frequency: 20 MHz
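The scaled-index addressing formula above can be tried out directly. This is a sketch: the function name and the example values are hypothetical, and the 64 TB check assumes the usual explanation that a 14-bit segment selection (16 384 segments) is combined with 4 GB per segment.

```python
# Sketch of the 80386 effective-address formula:
#   base + index * scale + displacement, truncated to 32 bits.

def effective_address(base, index, scale, displacement=0):
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

# Element 3 of an array of 4-byte integers that starts 8 bytes into a
# structure located at 0x1000 (all values are made up for illustration).
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))

GB = 2 ** 30
TB = 2 ** 40
print(2 ** 32 == 4 * GB)              # 32-bit addresses cover 4 GB
print(2 ** 14 * 2 ** 32 == 64 * TB)   # assumed: 16 384 segments of 4 GB each
```

The scale factors 1, 2, 4 and 8 match common element sizes, so indexing an array needs no separate multiply instruction.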

15 Intel 80486 (1989)
- Has an instruction pipeline
- Internal 8 KB cache for both data and instructions
- Integrated FPU
- Addressable memory: 4 GB; virtual memory: 64 TB
- Transistor count: 1.2 million (1 µm; 50 MHz version: 0.8 µm)
- Frequency: 25 MHz; price: $900
- Along with the basic variant (DX), the 80486DX2 (with clock doubling) and 80486DX4 (with clock tripling) were developed
- 80486SX (1991): without FPU
- 80486SL (1992): for notebooks

16 Intel processors (2)
Family diagram showing:
- P5: Pentium, Pentium MMX
- P6: Pentium Pro, Pentium II, Pentium III
- P7: Pentium 4
- IA-64: Itanium, Itanium 2
- P8: Core, Core Duo, Core ix

17 Intel Pentium (1993)
- The first superscalar x86 architecture processor (with dual integer pipelines and a faster FPU)
- 5-stage pipeline
- Branch prediction
- Separate 8 KB instruction and data caches
- 64-bit external data bus
- Addressable memory: 4 GB
- Virtual memory: 64 TB
- 3.1 million transistors, fabricated in a 0.8 µm process
- Frequency: 60 MHz; price: $878

18 Pentium 4
- 7th generation processor (P7)
- NetBurst microarchitecture
- Oriented toward high clock frequency (1.4 to 1.5 times higher than in other processors)
- This significantly increased the length of the pipeline, made the circuitry more complex and therefore increased energy consumption
- New variant of this processor: Prescott (P4-E); it supports 64-bit integer operations and EM64T addressing

19 Intel Pentium pipelines (figure)

20 AMD K5 (5k86)
- The K5 was AMD's first x86 processor to be developed entirely in-house, introduced in March 1996
- Its primary competition was Intel's Pentium
- The branch target buffer was four times the size of the Pentium's, and register renaming improved the parallel performance of the pipelines
- It has a 16 KB instruction cache, double that of the Pentium

21 AMD K7 (Athlon, Duron)
- The original Athlon was the first 7th-generation x86 processor and retained its initial performance lead over Intel's competing processors for a significant period of time
- Superpipelined superscalar processor
- Has x86 → RISC86 decoders
- Three out-of-order (o-o-o) superscalar superpipelined FPUs, executing all x87 (floating point), MMX and 3DNow! instructions
- Three o-o-o superscalar superpipelined integer ALUs
- Three o-o-o superscalar superpipelined address generation units
- Larger L1 caches (64 KB + 64 KB)
- 200 MHz system bus
- Enhanced dynamic branch prediction
- 37 million transistors

22 AMD K7 microarchitecture (figure)

23 AMD K7 pipelines
Pipeline diagram:
- Common stages 1-6: Fetch, Scan, Align 1, Align 2, EDec, IDec
- Integer pipeline, stages 7-10: Sched, Ex, Addr, DC
- Floating-point pipeline, stages 7-15: Stack, Name, WSch, Sched, FReg, FX 0, FX 1, FX 2, FX 3
- The diagram also shows a Load path
In case of a branch misprediction, 10 clocks are lost.
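The cost of the 10-clock misprediction penalty can be estimated with the usual average cycles-per-instruction model. Only the penalty of 10 clocks comes from the slide; the base CPI, branch fraction and misprediction rate below are hypothetical inputs chosen for illustration.

```python
# Sketch: average cycles per instruction (CPI) including misprediction stalls.
# Only penalty_clocks=10 comes from the slide; other numbers are made up.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_clocks=10):
    # Each mispredicted branch adds penalty_clocks stall cycles.
    return base_cpi + branch_fraction * mispredict_rate * penalty_clocks

# Example: 20% of instructions are branches and 5% of those are mispredicted.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20, mispredict_rate=0.05)
print(round(cpi, 3))
```

With these assumed inputs the stalls add roughly a tenth of a cycle per instruction, which shows why deeper pipelines depend on accurate branch prediction.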

24 Transmeta Crusoe
- For mobile systems
- Original VLIW instruction set
- 1 FPU, 2 ALU, 1 LSU, 1 BU
- 64 registers
- Decoding of x86 instructions
- Code Morphing Software
- Enhanced power management
- Models:
  - TM3200 (only one 96 KB L1 cache)
  - TM5400 (256 KB L1 cache, 256 KB L2 cache)
  - TM5600 (512 KB L1 cache, 512 KB L2 cache)
  - TM5800 (512 KB L1 cache, 512 KB L2 cache)
- In 2002: TM6000
- In 2003: TM8000 (Efficeon)

25 Transmeta Crusoe
Diagram: a 128-bit bundle of instructions (molecule), for example FADD, ADD, LD, BRCC, with one instruction going to each unit:
- FPU (floating-point unit)
- ALU (integer ALU)
- LSU (load/store unit)
- BU (branch unit)
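The idea of a fixed-width bundle with one slot per functional unit can be sketched as bit packing. This is an illustration only: the slot width of 32 bits is assumed from dividing 128 bits by four slots, and the slot order, the use of 0 as a no-op and the numeric encodings are hypothetical, not the real Crusoe format.

```python
# Sketch of a VLIW bundle: four fixed-width slots packed into 128 bits.
# Slot width, slot order and encodings are assumptions for illustration.

ATOM_BITS = 32
SLOTS = ("FPU", "ALU", "LSU", "BU")

def pack_molecule(atoms):
    """Pack a {slot: encoded instruction} mapping into one integer."""
    molecule = 0
    for i, slot in enumerate(SLOTS):
        word = atoms.get(slot, 0)   # 0 stands for a no-op in this sketch
        if not 0 <= word < (1 << ATOM_BITS):
            raise ValueError(f"{slot} atom does not fit in {ATOM_BITS} bits")
        molecule |= word << (ATOM_BITS * i)
    return molecule

def unpack_molecule(molecule):
    """Split a packed bundle back into its per-unit slots."""
    mask = (1 << ATOM_BITS) - 1
    return {slot: (molecule >> (ATOM_BITS * i)) & mask
            for i, slot in enumerate(SLOTS)}

bundle = pack_molecule({"FPU": 0x11, "ALU": 0x22, "LSU": 0x33, "BU": 0x44})
print(unpack_molecule(bundle))
```

Because each slot has a fixed position, the hardware can route every instruction to its unit without inspecting the others, which is the main simplification a VLIW format offers.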

26 Transmeta Crusoe TM8000 (figure)

27 PowerPC processors
- The IBM 801 minicomputer, which had a RISC instruction set, served as the prototype
- The superscalar RISC System/6000 was introduced in early 1990
- Its architecture soon became known as POWER (Performance Optimization With Enhanced RISC)
- Thereafter, IBM formed an alliance with Motorola (maker of the 68000) and Apple, which used a Motorola processor in its Macintosh PCs
- Thus the PowerPC architecture was born

28 PowerPC processors
- 601: the first implementation of the PowerPC architecture, released in 1992
- 603: 32-bit processor for low-end desktops and notebooks
- 604: 32-bit processor for desktops and low-end servers
- 620: first 64-bit processor, for high-end servers

29 PowerPC processor 604e
- RISC instruction set
- For servers and workstations
- Core supply voltage: 1.9 V
- Dispatch Unit:
  - Issues up to 4 instructions per clock
  - Issue buffer contains 8 instructions
- Completion Unit:
  - In one clock completes up to 4 instructions plus 1 store plus 1 branch instruction

30 PowerPC processor 604e
- Load/Store Unit:
  - Hardware support for unaligned little-endian access
  - Hardware-controlled parallel access to several registers (reads and stores)
  - Out-of-order (o-o-o) reads and stores
- Three integer units (IU):
  - Two single-cycle integer units (SCIU)
  - One multiple-cycle integer unit (MCIU)
- FPU: IEEE-754 standard support
- Branch prediction:
  - 512-entry branch history table
  - 64-entry branch target address cache
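How a branch history table produces predictions can be sketched with a common textbook scheme. Only the table size of 512 entries comes from the slide; the 2-bit saturating counters, the indexing by low address bits, the initial counter state and the class name are assumptions and may differ from the real 604e design.

```python
# Sketch of a branch history table with 2-bit saturating counters.
# Only entries=512 is from the slide; everything else is assumed.

class BranchHistoryTable:
    def __init__(self, entries=512):
        self.entries = entries
        self.counters = [1] * entries   # 0-1 predict not taken, 2-3 predict taken

    def _index(self, address):
        # Assumes 4-byte instructions, so the two low address bits are dropped.
        return (address >> 2) % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
branch = 0x100
for outcome in (True, True, False):     # a loop branch that exits once
    bht.update(branch, outcome)
print(bht.predict(branch))              # one exit does not flip the prediction
```

With only 512 entries, two branches whose addresses differ by 512 × 4 bytes share a counter, so they can disturb each other's predictions.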

31 PowerPC 604e pipeline
Pipeline diagram: Fetch, Decode, Dispatch, Execute (in SCIU1, SCIU2, MCIU, FPU, BPU or LSU), Complete, Writeback

32 PowerPC processor 7400 (G4)
- Superscalar RISC processor
- Issues up to 4 instructions per clock
- Executes up to 8 instructions in parallel
- Has 8 functional units and 3 register files:
  - IU1 and IU2: two integer units and the general register file
  - FPU: floating point unit and register file
  - VPU and VALU: two vector units and the vector register file
  - BPU: branch processing unit
  - SRU: system register unit
  - LSU: load/store unit
- Separate 32 KB instruction and data caches
- L2 cache controller
- For servers and workstations

33 PowerPC processor 7400 (G4)
Figure: VPU and VALU, the vector units and vector register file

34 PowerPC processor 7400 (G4)
Figure: IU1 and IU2, the two integer units and general register file

35 7400 (G4) pipeline (figure)

36 Why are RISC processors not so popular?
- Incompatibility with the x86 instruction set. x86-based programs can be executed only through emulation, which reduces the advantages of RISC by several tens of percent.
- Software. Initially, the traditional PC operating system was DOS, and many popular and effective programs were written for DOS and for 16-bit versions of Windows. Meanwhile, the various RISC platforms used different, incompatible versions of Unix, for which few popular and effective programs were written; most programs were developed for workstations and servers.

37 Why are RISC processors not so popular?
- A higher price for RISC processors. Although the original idea was that RISC processors would be simpler, RISC chips were in fact more expensive than Intel x86 chips. A wider RISC bus (128 or even 256 bits) requires more expensive and more complex control circuits, chipsets and boards. Solutions oriented toward workstations and servers were too expensive for a PC.
- Passivity of RISC system manufacturers. "Serious" companies (Sun, DEC) felt that there was no need to reduce the cost of RISC workstations because of their indisputable advantages.

38 Comparing RISC and CISC processors
Table: performance of some processors in SPEC2000 (columns: Frequency, MHz; SPECINT2000; SPECFP2000)

