2. A New Era in Processor Evolution Dezső Sima Fall 2006  D. Sima, 2006.



Contents
1. Processor performance
2. Efficiency of processors
3. Addressing the leveling off of processor efficiency
4. Aggressively raising clock frequency
5. The efficiency wall
6. The thermal wall
7. The skew wall
8. EPIC architectures/processors
9. The end of an era in processor evolution

1. Processor performance
1.1. Introduction (1)
Absolute performance: number of successfully executed instructions/sec, or number of successfully executed operations/sec (SIMD)
Relative performance: relates the execution times of a benchmark program on the tested system to those on a reference system. E.g.: SPECint92, SPECint_base
f_c: clock frequency
IPC: instructions/cycle
OPI: operations/instruction

1.1. Introduction (2)
In general purpose applications, where:
IPC: issued instructions per cycle
η: ratio of successfully executed to issued instructions (efficiency of the speculative execution)
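The definitions above can be combined into a small calculation. This is a minimal sketch, assuming the usual reading that absolute performance is the product f_c × IPC × η (times OPI when counting SIMD operations); the function name and all sample values are hypothetical and not taken from the slides.

```python
def absolute_performance(f_c_hz, ipc, eta=1.0, opi=1.0):
    """Successfully executed instructions (or operations) per second.

    Assumed model: P_a = f_c * IPC * eta * OPI, where
      f_c_hz : clock frequency in Hz
      ipc    : issued instructions per cycle
      eta    : ratio of successfully executed to issued instructions
      opi    : operations per instruction (1.0 for non-SIMD code)
    """
    return f_c_hz * ipc * eta * opi


# Hypothetical example: a 3 GHz core issuing 2 instructions per cycle,
# of which half are executed successfully.
print(absolute_performance(3e9, 2.0, eta=0.5))
```

With these illustrative numbers, the speculative losses cancel the benefit of issuing two instructions per cycle, which is why η matters as much as issue width.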

1.1. Introduction (3)
In performance/efficiency studies:
Theoretical interpretation: P_a
Practical measurement: P_r

1.1. Introduction (4)
If the following were true:
In that case:
I: number of instructions in the application considered

1.1. Introduction (5)
However:
Figure 1.1: Runtime ratios of the component programs of SPECint2000

1.1. Introduction (6)
When comparing the performance of two systems:
This estimate is usable in trend considerations.

1.1. Introduction (7)
Comparing the efficiency of two systems:
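One way to picture such a comparison is sketched below. It assumes that efficiency is taken as measured benchmark performance per unit of clock frequency, so that the efficiency ratio of two systems is the ratio of their benchmark scores divided by the ratio of their clock frequencies. The function name and the sample scores are hypothetical.

```python
def efficiency_ratio(score_a, f_c_a_ghz, score_b, f_c_b_ghz):
    """Efficiency of system A relative to system B.

    Assumption: efficiency ~ benchmark score / clock frequency,
    so the ratio is (score_a / f_c_a) / (score_b / f_c_b).
    """
    return (score_a / f_c_a_ghz) / (score_b / f_c_b_ghz)


# Hypothetical example: both systems reach the same benchmark score,
# but system A needs only half the clock frequency of system B.
print(efficiency_ratio(1000, 2.0, 1000, 4.0))
```

A result above 1 means system A does more work per clock cycle than system B.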

1.2. Evolution of processor performance (1)
Figure 1.2: Integer performance growth of Intel’s x86 processors

1.2. Evolution of processor performance (2)
Figure 1.3: Integer performance growth (in general - 1)
Source: X86-64 Technology White Paper, AMD Inc., Sunnyvale, CA

1.2. Evolution of processor performance (3)
Figure 1.4: Integer performance growth (in general - 2)
Source: F. Labonte, www-vlsi.stanford.edu/group/chart/specInf2000.pdf

2. Efficiency of processors
2.1. Introduction

2.2. Growth of processor efficiency (1)
Figure 2.1: Efficiency of Intel processors

2.2. Growth of processor efficiency (2)
Figure 2.2: Growth of processor performance/efficiency (in general)
Source: J. Birnbaum, "Architecture at HP: Two Decades of Innovation", Microprocessor Forum, October 14

2.3. Contribution of raising processor efficiency to the growth of processor performance (up to the 2nd generation of superscalars)
Up to the second generation, raising the clock frequency and raising efficiency contributed in equal proportion to the growth of performance.

2.4. Sources of raising processor efficiency
Increasing the word length: 8/16 → 32 bit (286 → 386DX)
Introducing and increasing temporal parallelism: 1st and 2nd generation pipelined processors (386DX, 486DX)
Introducing and increasing issue parallelism: 1st and 2nd generation superscalars (Pentium, Pentium Pro)

2.5. Limit of raising processor efficiency (1)
2nd generation superscalars (wide superscalars)
Processing width: 4 RISC instructions/cycle, ~3 CISC instructions/cycle
Figure 2.3: Processing width of 2nd generation (wide) superscalars vs. extent of parallelism available in general purpose applications
Source: Wall: Limits of ILP, WRL TN-15, Dec. 1990

2.5. Limit of raising processor efficiency (2)
Figure 2.4: Growth of processor efficiency (in general)

2.5. Limit of raising processor efficiency (3)
In general purpose applications: the width of 2nd generation superscalars already approaches the extent of available parallelism (ILP).
Beginning with 2nd generation (wide) superscalars, the sources of extensively raising processor efficiency became exhausted.

3. Addressing the leveling off of processor efficiency
Main road of evolution: aggressively raising clock frequency (Sections 4 to 7)
Essentially widening the core by introducing EPIC architectures (Section 8)

4. Aggressively raising clock frequency
4.1. Sources of raising clock frequencies (1)
Raising clock frequency:
By scaling down the feature size in the manufacturing process
By reducing the logic depth of pipeline stages

4.1. Sources of raising clock frequencies (2)
Figure 4.1: Evolution of Intel’s process technology
Source: D. Bhandarkar: "The Dawn of a New Era", 11. EMEA, May

4.1. Sources of raising clock frequencies (3)
Figure 4.2: Number of pipeline stages in Intel’s and AMD’s processors (x axis: year, 1995 to 2005; y axis: number of pipeline stages)
Intel: Pentium (5), Pentium Pro (~12), Pentium 4 (~20), P4 Prescott (~30), Core Duo, Conroe (14)
AMD: K6 (6), Athlon (6), Athlon-64 (12)

4.1. Sources of raising clock frequencies (4)
Figure 4.3: Max. logic depth of pipeline stages in processors (in terms of FO4)
Source: F. Labonte, www-vlsi.stanford.edu/group/chart/CycleFO4.pdf
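The link between logic depth and clock frequency can be sketched as follows. This is a simplified model, assuming the cycle time equals the logic depth of the slowest pipeline stage (in FO4 delays) multiplied by the FO4 delay of the manufacturing process, with latch and clock overheads folded into the depth figure. The function name and the 25 ps FO4 delay are hypothetical illustration values.

```python
def clock_frequency_hz(logic_depth_fo4, fo4_delay_s):
    """Clock frequency when one pipeline stage takes logic_depth_fo4 FO4 delays.

    Assumption: cycle time = logic depth (in FO4) * FO4 delay (in seconds).
    """
    cycle_time_s = logic_depth_fo4 * fo4_delay_s
    return 1.0 / cycle_time_s


FO4_DELAY_S = 25e-12  # hypothetical FO4 delay of 25 ps

# Halving the logic depth per stage doubles the clock frequency ...
for depth in (40, 20, 10):
    print(depth, "FO4 ->", clock_frequency_hz(depth, FO4_DELAY_S) / 1e9, "GHz")

# ... and so does halving the FO4 delay by scaling down the feature size.
print(clock_frequency_hz(20, FO4_DELAY_S / 2) / 1e9, "GHz")
```

Both sources of raising clock frequency named in Section 4.1 appear here as the two factors of the cycle time.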

4.2. Growth rate of clock frequencies (1)
Figure 4.4: Growth of clock frequencies in Intel’s x86 line of processors

4.2. Growth rate of clock frequencies (2)
Figure 4.5: Growth of clock frequencies (in general)

4.3. Implications of aggressively raising clock frequencies
4.3.1. Overview
Ousting of major RISC families (4.3.2)
Emerging limits of evolution (4.3.3)

4.3.2. Ousting of major RISC families (1)
Figure 4.6: The shift in performance leadership between RISC and x86 lines

4.3.2. Ousting of major RISC families (2)
CISCs overtook the performance leadership. From then on, raising f_c at the same rate is a more demanding task from a higher value than from a lower one.
1997: Intel and HP unveiled IA-64/Merced as the next generation architecture/processor line.
Cancellation of most major RISC lines, such as MIPS’s R lines, HP’s Alpha and PA lines, and the PowerPC Consortium’s PowerPC line.

4.3.3. Emerging limits of evolution
The efficiency wall (Section 5)
The thermal wall (Section 6)
The skew wall (Section 7)

5. The efficiency wall
5.1. Overview (1)
Basic reason: speed gap between the processor and the memory (widens at higher frequencies)

5.1. Overview (2)
Main appearances of the speed gap between the processor and the memory:
DRAM latencies
Memory transfer rates
L2 cache latencies
Transfer rates of processor buses

5.2. Speed gap between processor and memory (1)
Figure 5.1: Latency of DRAM chips (in clock cycles)
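Why the gap widens at higher frequencies can be illustrated with a short calculation. The sketch assumes a DRAM access latency that stays fixed in nanoseconds while the processor clock rises, so the same latency costs more clock cycles. The function name and the 50 ns figure are hypothetical.

```python
def latency_in_cycles(latency_ns, f_c_ghz):
    """Convert a latency in nanoseconds to processor clock cycles.

    One cycle lasts 1 / f_c_ghz nanoseconds, so cycles = latency_ns * f_c_ghz.
    """
    return latency_ns * f_c_ghz


DRAM_LATENCY_NS = 50  # hypothetical, kept constant across processor generations

for f_c_ghz in (0.5, 1.0, 2.0, 3.0):
    cycles = latency_in_cycles(DRAM_LATENCY_NS, f_c_ghz)
    print(f_c_ghz, "GHz:", cycles, "cycles per DRAM access")
```

Under this assumption, a core that stalls on a memory access loses six times as many cycles at 3 GHz as at 0.5 GHz.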

5.2. Speed gap between processor and memory (2)
Figure 5.2: Relative transfer rate of memories (D: dual channel)

5.2. Speed gap between processor and memory (3)
Figure 5.3: Latency of L2 caches (table for Willamette, Northwood and Prescott with the columns: max. f_c at introduction (GHz), L2 size (Kbyte), L2 latency (clock cycles))

5.2. Speed gap between processor and memory (4)
Figure 5.4: Relative transfer rates of processor buses

5.3. Efficiency of 3rd generation superscalars (1)
Figure 5.5: Efficiency of Intel’s Pentium III and Pentium 4 processors in general purpose applications

5.3. Efficiency of 3rd generation superscalars (2)
Figure 5.6: Efficiency of AMD’s Athlon, Athlon XP and Athlon 64 processors in general purpose applications

5.3. Efficiency of 3rd generation superscalars (3)
Figure 5.7: Main aspects of the memory subsystem affecting core efficiency

5.3. Efficiency of 3rd generation superscalars (4)
Figure 5.8: Contrasting the efficiency of Intel’s and AMD’s processors

5.3. Efficiency of 3rd generation superscalars (5)
Figure 5.9: Contrasting Intel’s and AMD’s processor design philosophies

5.3. Efficiency of 3rd generation superscalars (6)
Implication of the emerging efficiency wall: diminishing returns on higher clock frequencies

6. The thermal wall (1)
Dissipation (D):
Dynamic: $D_d = A \cdot C \cdot V^2 \cdot f_c$
Static: $D_s = V \cdot I_{leak}$
with
A: ratio of the active gates
C: effective capacity of the gates
V: supply voltage
f_c: clock frequency
I_leak: leakage current
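The two dissipation formulas above translate directly into code. The sketch below uses the slide's symbols as parameters; the function names and all numeric values are hypothetical and chosen only to give plausible orders of magnitude.

```python
def dynamic_dissipation_w(activity, capacitance_f, voltage_v, f_c_hz):
    """Dynamic dissipation D_d = A * C * V^2 * f_c, in watts."""
    return activity * capacitance_f * voltage_v ** 2 * f_c_hz


def static_dissipation_w(voltage_v, leakage_current_a):
    """Static dissipation D_s = V * I_leak, in watts."""
    return voltage_v * leakage_current_a


# Hypothetical chip: 10 % of gates active, 100 nF effective capacity,
# 1.2 V supply, 3 GHz clock, 10 A leakage current.
d_dynamic = dynamic_dissipation_w(0.1, 100e-9, 1.2, 3e9)
d_static = static_dissipation_w(1.2, 10.0)
print("dynamic:", d_dynamic, "W  static:", d_static, "W")

# Doubling f_c at unchanged voltage doubles the dynamic dissipation.
print(dynamic_dissipation_w(0.1, 100e-9, 1.2, 6e9) / d_dynamic)
```

Because dynamic dissipation grows linearly with f_c and quadratically with V, raising the clock frequency is affordable only as long as the supply voltage can be lowered at the same time.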

6. The thermal wall (2)
Figure 6.1: Chip dynamic and static power dissipation trends
Source: N. S. Kim et al., "Leakage Current: Moore’s Law Meets Static Power", Computer, Dec. 2003

6. The thermal wall (3)
Figure 6.2: Relative dissipation of Intel’s x86 family of processors

6. The thermal wall (4)
Figure 6.3: Contrasting the evolution of Intel’s and AMD’s processor lines with the thermal wall

6. The thermal wall (5)
Figure 6.4: Intel’s P4 processor family (Netburst architecture)

6. The thermal wall (6)
Figure 6.5: The growth of relative dissipation of processors (in general)
Source: R. Hetherington, "The UltraSPARC T1 Processor", White Paper, Sun Inc.

6. The thermal wall (7)
Implications of the thermal wall:
The approach of increasing performance by aggressively raising clock frequency met the thermal wall.
Processor designs now focus more and more on power-aware techniques.

7. The skew wall (1)
Reason:
Figure 7.1: Skew between lines of parallel buses

7. The skew wall (2)
Figure 7.2: Equalizing skews among different bit lines of the processor bus on the MSI 915G Combo motherboard

7. The skew wall (3)
Implication of emerging skews between bit lines of parallel buses: introducing sequential (serial) buses (also in slow peripheral buses, due to impressive cost savings)
Figure 7.3: Signal transfer over a sequential bus

Implication of emerging limits of evolution
The approach of aggressively raising clock frequencies met the efficiency, thermal and skew walls and thus hit a dead end.

8. EPIC architectures/processors (1)
Main road of evolution: aggressively raising clock frequency (Sections 4 to 7)
Essentially widening the core by introducing EPIC architectures (Section 8)

8. EPIC architectures/processors (2)
Principle of superscalar processing: the processor receives dependent instructions and performs dynamic dependency resolution before passing them to its execution units (FE).
Principle of VLIW processing: the processor receives independent instructions (static dependency resolution) and passes them directly to its execution units (FE).
VLIW: Very Long Instruction Word
Figure 8.1: Contrasting the principles of operation of superscalar and VLIW processors
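Static dependency resolution can be illustrated with a toy bundler. This is only a sketch of the general idea, not the scheme of any real compiler or of the Itanium: it assumes in-order packing, treats each instruction as a hypothetical (destination, sources) pair, and closes a bundle when the next instruction reads or rewrites a register written earlier in the same bundle, or when the bundle is full.

```python
def pack_bundles(instructions, width):
    """Greedily pack instructions into bundles of mutually independent ones.

    instructions: list of (dest_register, [source_registers]) in program order.
    width: maximum number of instructions per bundle (hypothetical issue width).
    Only read-after-write and write-after-write conflicts within a bundle
    are checked; this is a deliberate simplification.
    """
    bundles = []
    current = []
    written = set()  # registers written by instructions in the current bundle
    for dest, sources in instructions:
        conflict = dest in written or any(src in written for src in sources)
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


# Hypothetical instruction stream: the third instruction reads r1,
# which the first one writes, so it must start a new bundle.
program = [
    ("r1", ["r2", "r3"]),
    ("r4", ["r5"]),
    ("r6", ["r1"]),
    ("r7", ["r8"]),
]
for bundle in pack_bundles(program, width=3):
    print([dest for dest, _ in bundle])
```

In a VLIW design this work is done once at compile time, whereas a superscalar processor repeats the equivalent dependency check in hardware on every execution.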

8. EPIC architectures/processors (3)
VLIW → EPIC
EPIC: Explicitly Parallel Instruction Computing
EPIC is an enhanced VLIW (integration of advanced superscalar features): branch prediction, explicit cache control
1994: Intel, HP
1997: EPIC designation
2001: IA-64 → Itanium

8. EPIC architectures/processors (4)
Figure 8.2: Overview of Itanium cores

8. EPIC architectures/processors (5)
Figure 8.3: The efficiency of Itanium processors

8. EPIC architectures/processors (6)
Figure 8.4: Expected spreading of the IA-64 architecture (Itanium processors)
Source: L. Gwennap: Intel’s Itanium and IA-64: Technology and Market Forecast, MDR

8. EPIC architectures/processors (7)
Figure 8.5: Revenue expectations concerning Intel’s Itanium line

8. EPIC architectures/processors (8)
In general purpose applications, EPIC architectures/processors play a decreasing role.

9. The end of an era in processor evolution (1)
In general purpose applications, processor efficiency leveled off beginning with the 2nd generation superscalars, and both approaches to addressing this met limits of evolution and thus hit a dead end.
Single-core complex superscalars: at the end of an era

9. The end of an era in processor evolution (2)
Available hardware complexity continues to increase exponentially (Moore’s law): complexity doubles every ~24 months.
A new era in processor evolution: the dawn of multicore, multithreaded processors.
The number of processors will also double every ~24 months.
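The doubling rule can be turned into a simple projection. This is a minimal sketch, assuming smooth exponential growth with a fixed doubling period of 24 months; the function name and the starting count of two cores are hypothetical.

```python
def projected_count(initial_count, months, doubling_period_months=24):
    """Project a quantity that doubles every doubling_period_months."""
    return initial_count * 2 ** (months / doubling_period_months)


# Hypothetical starting point: a dual-core processor.
for years in (0, 2, 4, 6, 8):
    print(years, "years:", projected_count(2, years * 12), "cores")
```

The same function applies to transistor counts: under the stated assumption, both complexity and core count grow by a factor of 32 over ten years.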

9. The end of an era in processor evolution (3)
Figure 9.1: Rapid spreading of multi core processors revealed by Intel