7-Aug-15 (1) CSC Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32
IA-32 Intel Architecture
IA-32 processors
– 386 & 486 processors
– Pentium processors
– P6 family processors (Pentium Pro, Pentium II, Pentium III): based on the P6 family microarchitecture
– Pentium 4 processors, Intel Xeon processors, Pentium D processors, Pentium processor Extreme Editions: based on the Intel NetBurst microarchitecture
IA-32 Intel Architecture: A Brief History of the IA-32 Architecture
Coming from … 16-bit processors
– 8086 processor: 16-bit registers, 16-bit external data bus, 20-bit addressing (1-MByte address space)
– 8088 processor: 8-bit external data bus
– The 8086/8088 introduced ‘segmentation’ to the IA-32 architecture: four 16-bit segment registers point to memory segments of up to 64 KBytes
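The slide gives the register and address widths but not the arithmetic that joins them. As a minimal sketch, assuming the standard real-mode rule (segment value shifted left by 4 bits, then added to a 16-bit offset), the 20-bit address can be computed as below. The function name `physical_address` is illustrative only.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Assumption: real-mode rule, segment * 16 + offset, truncated to the
    # 20 address lines of the 8086 (addresses above 1 MByte wrap around).
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Because the offset is 16 bits wide, one segment register value reaches at most 64 KBytes, which matches the segment size on the slide.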
Internal architecture of 8086
Intel 8085 architecture : 8-bit data, 16-bit address
Intel 286 processor (1982)
Provides two programming modes:
1) Real mode
– Functions exactly the same as the 8086
– Uses only the 20 least significant address lines (max. 1 MB)
– Faster than the 8086 due to a redesigned core and a higher clock rate
2) Protected mode
– 16 new instructions are added
– Supports a multi-program environment by giving each program a predetermined amount of memory within the 16-MB physical address space
– Programs no longer use physical addresses directly; memory is addressed through a segment selector
– Several programs can be loaded into memory at the same time, protected from each other
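To make "addressed by a segment selector" concrete, here is a minimal sketch of how a 16-bit protected-mode selector breaks into fields. The field layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0) is the standard one and is not stated on the slide. The names `Selector` and `decode_selector` are illustrative.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # descriptor table entry number (bits 15..3)
    table: str  # "GDT" or "LDT", from the table-indicator bit (bit 2)
    rpl: int    # requested privilege level (bits 1..0)


def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    table = "LDT" if value & 0x4 else "GDT"
    return Selector(index=value >> 3, table=table, rpl=value & 0x3)


if __name__ == "__main__":
    print(decode_selector(0x002B))
```

The selector only names a descriptor. The base address and limit of the segment live in the descriptor table, which is what lets the processor keep programs apart.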
The 8086 and microprocessors. John Uffenbeck The 80x86 Family: Design, Programming, and Interfacing, 3e Copyright ©2002 by Pearson Education, Inc. Upper Saddle River, New Jersey All rights reserved.
Intel 386 processor (1985)
– First 32-bit processor in the IA-32 architecture family
– 32-bit registers used both for holding operands and for addressing
– 32-bit address bus that supports up to 4 GBytes of physical memory
– Segmented-memory model and flat memory model
– Paging (fixed 4-KByte pages) for virtual memory management
– Variants: 386SX and 386DX (neither has an FPU inside; floating point uses an external 387 coprocessor)
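The fixed 4-KByte page size determines how a 32-bit linear address is divided. The sketch below assumes the classic two-level 386 scheme (10-bit page-directory index, 10-bit page-table index, 12-bit offset), which the slide does not spell out. The function name is illustrative.

```python
PAGE_SIZE = 4096  # fixed 4-KByte pages, so the offset needs 12 bits


def split_linear_address(addr: int) -> tuple:
    """Return (directory index, table index, page offset) for a 32-bit address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("linear address must fit in 32 bits")
    directory = (addr >> 22) & 0x3FF  # top 10 bits select a page-directory entry
    table = (addr >> 12) & 0x3FF      # next 10 bits select a page-table entry
    offset = addr & (PAGE_SIZE - 1)   # low 12 bits locate the byte in the page
    return directory, table, offset


if __name__ == "__main__":
    print(split_linear_address(0x12345678))  # (72, 837, 1656)
```

With 1024 directory entries, 1024 table entries each, and 4096 bytes per page, the three fields together cover the full 4-GByte address space.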
Internal architecture of 80386
Internal registers of 80386
Intel 486 processor (1989)
– Added more parallel execution capability by using a five-stage pipeline
– 8-KByte on-chip first-level cache
– Integrated x87 FPU
– Power saving and system management capabilities
Intel Pentium processor (1993)
– Added a second execution pipeline to achieve superscalar performance (u and v pipelines, together executing up to two instructions per clock)
– Split on-chip caches (8-KByte code cache and 8-KByte data cache)
– Data cache uses the MESI (coherence) protocol
– Branch prediction with an on-chip branch table
– Internal data paths: 128 and 256 bits
– External data bus: 64 bits
– Enhanced by MMX technology, which uses the SIMD execution model
FIGURE 3-28 Processor model for the Pentium. The BIU supplies instructions to the CPU via two pipelines called the u and v pipes. In addition, two separate 8K data and code caches are provided.
The U and V Pipes
– U and V pipes: dual five-stage pipelines
– Prefetcher and queue units provide paired instructions for the U and V pipes
– U pipe: executes all Pentium instructions
– V pipe: executes only simple integer instructions (data is already in the CPU registers); sorting of instructions is performed by the prefetcher
– With two pipelines and two ALUs, the Pentium can execute two instructions simultaneously (in one clock cycle)
– Condition: both instructions are simple and do not depend on each other (no data dependency)
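The pairing condition can be sketched as a small check. This is a deliberately simplified model, not the full Pentium pairing rule set: it assumes an instruction is described by one destination register and a tuple of source registers, and it treats read-after-write and write-after-write on the U-pipe destination as the only dependencies. `Instr` and `can_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str             # register written by the instruction
    sources: tuple = ()   # registers read by the instruction
    simple: bool = True   # True for simple integer instructions


def can_pair(u: Instr, v: Instr) -> bool:
    """Return True if u (U pipe) and v (V pipe) may issue in the same clock."""
    if not (u.simple and v.simple):
        return False                    # both instructions must be simple
    read_after_write = u.dest in v.sources
    write_after_write = u.dest == v.dest
    return not (read_after_write or write_after_write)


if __name__ == "__main__":
    mov = Instr("mov eax, ebx", dest="eax", sources=("ebx",))
    add = Instr("add ecx, edx", dest="ecx", sources=("ecx", "edx"))
    inc = Instr("inc eax", dest="eax", sources=("eax",))
    print(can_pair(mov, add))  # True: no shared registers
    print(can_pair(mov, inc))  # False: inc reads and writes eax
```

When the check fails, the second instruction simply waits and issues in the next clock, so a dependent pair costs two clocks instead of one.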
Superpipelined vs. Superscalar
– Superpipelining: divide the instruction execution pipeline into smaller stages. [ex] 5-stage pipeline (80486, Pentium), 12-stage pipeline (P6 processors)
– Superscalar: execute two or more instructions per clock cycle by using multiple execution units (including ALUs). [ex] Pentium executes two instructions simultaneously = 2-way superscalar; Pentium II, III & Celeron: 3-way superscalar
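The difference between the two ideas shows up in a back-of-the-envelope cycle count. The sketch below assumes an idealized pipeline with no stalls, hazards, or mispredictions, so the numbers are upper bounds on throughput rather than measurements. `ideal_cycles` is a hypothetical helper.

```python
import math


def ideal_cycles(instructions: int, stages: int, width: int = 1) -> int:
    """Clock cycles to finish a run on an ideal, stall-free pipeline.

    The first group of `width` instructions needs `stages` cycles to drain;
    every later group completes one cycle after the previous one.
    """
    if instructions <= 0:
        return 0
    groups = math.ceil(instructions / width)
    return stages + groups - 1


if __name__ == "__main__":
    n = 100
    print(ideal_cycles(n, stages=5, width=1))   # 104: 80486-style scalar pipeline
    print(ideal_cycles(n, stages=5, width=2))   # 54: Pentium-style 2-way superscalar
    print(ideal_cycles(n, stages=12, width=3))  # 45: P6-style 12 stages, 3-way
```

In this model a deeper pipeline only adds fill cycles. The benefit of superpipelining is that shorter stages allow a higher clock frequency, which a cycle count alone does not capture.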
MMX (Multimedia Extension): provides 2 architectural enhancements over the non-MMX Pentium
① 57 instructions are added for multimedia (audio, video, and graphic data) applications.
② SIMD (Single-Instruction stream, Multiple-Data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications require large blocks of data to be manipulated, SIMD provides a significant performance enhancement.
For general applications, performance improves by 10~20%; for multimedia applications, by nearly 70%.
SIMD Execution Model
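The SIMD execution model can be imitated in software to show what "one operation on multiple data items" means. The sketch below models a packed add of four 16-bit words held in one 64-bit value, with wraparound in each lane. That is the behaviour of a packed word add such as MMX's PADDW, but the function itself is only an illustration and `padd_words` is a hypothetical name.

```python
LANES = 4        # four 16-bit words fit in one 64-bit MMX-sized value
LANE_BITS = 16
LANE_MASK = 0xFFFF


def padd_words(a: int, b: int) -> int:
    """Add four packed 16-bit lanes independently, wrapping on overflow."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        x = (a >> shift) & LANE_MASK
        y = (b >> shift) & LANE_MASK
        # The mask drops the carry, so a lane never spills into its neighbour.
        result |= ((x + y) & LANE_MASK) << shift
    return result


if __name__ == "__main__":
    a = 0x0001_0002_0003_FFFF
    b = 0x0010_0020_0030_0001
    print(hex(padd_words(a, b)))  # 0x11002200330000
```

The loop here runs four times; the point of the hardware is that all four lane additions happen in a single instruction.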
P6 family processors ( )
Intel Pentium Pro processor
– Three-way superscalar: decodes, dispatches, and completes execution of (retires) three instructions per clock cycle on average
– Introduced dynamic execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation
– Enhanced by caches: two on-chip 8-KByte 1st-level caches and a 256-KByte 2nd-level cache in the same package (two chips in one package)
– 36 address lines: max. 64 GB of memory
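The memory limits quoted throughout these slides all follow from the number of address lines: n lines address 2^n bytes. A minimal sketch of that arithmetic follows. The 24-line figure for the 286 is an assumption added here, inferred from the 16-MB limit on its slide, and the helper names are illustrative.

```python
def address_space_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given lines."""
    return 1 << address_lines


def human_readable(size: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    index = 0
    while size >= 1024 and size % 1024 == 0 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


if __name__ == "__main__":
    # The 286 entry (24 lines) is an assumption derived from its 16-MB limit.
    for name, lines in [("8086", 20), ("286", 24), ("386", 32), ("Pentium Pro", 36)]:
        print(name, lines, human_readable(address_space_bytes(lines)))
```

Each extra address line doubles the reachable memory, so the 4 extra lines of the Pentium Pro raise the 386's 4-GByte limit by a factor of 16.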
FIGURE 1-14 The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)
Dynamic Execution: a new approach to processing S/W instructions that reduces idle processor time
① Multiple Branch Prediction: the Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches, which reduces wasted pipeline clocks
② Data Flow Analysis: looks at upcoming S/W instructions to find the optimal sequence of processing
③ Speculative Execution: allows instructions to be executed in a different order from that in which they entered the processor (“out-of-order execution”). The results of these instructions are stored as speculative results until their final states can be determined
P6 family processors (cont’d)
Pentium II processor
– Added Intel MMX technology
– Processor core is packaged in the single edge contact cartridge (SECC)
– 1st-level (L1) caches are enlarged (16 KBytes each)
– 2nd-level (L2) cache sizes of 256 KB, 512 KB, and 1 MB are supported
– A half-clock speed backside bus connects the 2nd-level cache and the processor
– Multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep are supported to conserve power when idle
P6 family processors (cont’d)
Pentium II Xeon processor
– Includes 4-way and 8-way multiprocessor scalability and a 2-MByte 2nd-level cache running on a full-clock speed backside bus
Intel Celeron processor
– Focused on the PC market
– Pentium II without L2 cache
– Uses the Slot 1 connector without the plastic cover (the so-called “naked CPU”)
Celeron Board
P6 family processors (cont’d)
Celeron A: includes a 128-KB L2 cache on the same die as the processor
– Drawback: 66 MHz bus cycle
– 370-pin PGA package (called Socket 370)
P6 family processors (cont’d)
Pentium III processor
– Introduced Streaming SIMD Extensions (SSE): expands the SIMD execution model by providing a new set of 128-bit registers and the ability to perform SIMD operations on packed single-precision floating-point values
Pentium III Xeon processor
– Enhanced with a full-speed, on-die Advanced Transfer Cache
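A 128-bit register holds four 32-bit single-precision values, and a packed operation applies to all four at once. The sketch below models that layout with Python's `struct` module. It mimics the lane-wise effect of a packed single-precision add but is not how the hardware is programmed, and `pack_ps` and `add_ps` are hypothetical names.

```python
import struct

FORMAT = "<4f"  # four little-endian 32-bit floats = 16 bytes = 128 bits


def pack_ps(values) -> bytes:
    """Pack four Python floats into a 128-bit register image."""
    return struct.pack(FORMAT, *values)


def add_ps(a: bytes, b: bytes) -> bytes:
    """Lane-wise single-precision add of two 128-bit register images."""
    xs = struct.unpack(FORMAT, a)
    ys = struct.unpack(FORMAT, b)
    # Repacking rounds each sum back to single precision, one lane at a time.
    return struct.pack(FORMAT, *(x + y for x, y in zip(xs, ys)))


if __name__ == "__main__":
    a = pack_ps([1.0, 2.0, 3.0, 4.0])
    b = pack_ps([0.5, 0.5, 0.5, 0.5])
    print(len(a) * 8)                        # 128
    print(struct.unpack(FORMAT, add_ps(a, b)))  # (1.5, 2.5, 3.5, 4.5)
```

MMX packed integers into 64 bits; SSE widens the register to 128 bits and makes the lanes floating point, which is what the slide means by expanding the SIMD model.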
Pentium III with integrated L2 cache (more than 22 million transistors)
2.1.7 Pentium 4 Processor Family ( )
– Based on Intel NetBurst microarchitecture
– Introduced Streaming SIMD Extensions 2 (SSE2)
– Pentium 4 processor 3.40 GHz supports Hyper-Threading Technology and Streaming SIMD Extensions 3 (SSE3)
– Pentium 4 Processor Extreme Edition supports Intel Extended Memory 64 Technology and Hyper-Threading Technology
– Pentium 4 Processor 6xx series supports Intel Extended Memory 64 Technology
Streaming SIMD Extensions 2 (SSE2)
Horizontal Data Movement in ADDSUBPD
2.1.8 Intel Xeon Processor ( )
– Based on Intel NetBurst microarchitecture
– As a family, this group of IA-32 processors is designed for use in multiprocessor server systems and high-performance workstations
– Intel Xeon processor MP supports Hyper-Threading Technology
– 64-bit Intel Xeon processor 3.60 GHz with 800 MHz System Bus introduced Intel Extended Memory 64 Technology
2.1.9 Intel Pentium M Processor ( )
Low-power mobile processor family
Designed for extending battery life and seamless integration
Its extended microarchitecture includes:
– Support for Dynamic Execution
– Low-power core with copper interconnect
– On-die, primary 32-KB instruction cache and 32-KB write-back data cache, and second-level 2-MB cache with Advanced Transfer Cache Architecture
– Advanced Branch Prediction and Data Prefetch Logic
– Support for MMX technology, Streaming SIMD instructions, and the SSE2 instruction set
Intel Pentium Processor Extreme Edition (2005)
– Introduced dual-core technology that provides advanced H/W multi-threading support
– Based on Intel NetBurst microarchitecture
– Supports SSE, SSE2, SSE3, Hyper-Threading Technology, and Intel Extended Memory 64 Technology
The Processor War