Processor Performance & Parallelism
Yashwant Malaiya, Colorado State University
(with some material from Patterson & Hennessy)
Processor Execution Time

The time taken by a program to execute is the product of:
– the number of machine instructions executed
– the average number of clock cycles per instruction (CPI)
– the duration of a single clock period

Example: 10,000 instructions, CPI = 2, clock period = 250 ps. The product is worked out below.
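Written out, this relationship and the example numbers give:

```latex
\[
T_{\text{exec}} = \text{IC} \times \text{CPI} \times T_c
  = 10{,}000 \times 2 \times 250\,\text{ps}
  = 5{,}000{,}000\,\text{ps} = 5\,\mu\text{s}
\]
```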
Processor Execution Time (continued)

Instruction count for a program
– determined by the program, the ISA, and the compiler
Average cycles per instruction (CPI)
– determined by the CPU hardware
– if different instructions have different CPIs, the average CPI is affected by the instruction mix (see the weighted sum below)
Clock cycle time (the inverse of clock frequency)
– determined by the number of logic levels per stage and the fabrication technology
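When instruction classes have different cycle counts, the average CPI is the mix-weighted sum. As a sketch, with a purely hypothetical mix of 60% ALU operations (1 cycle), 30% loads (2 cycles), and 10% branches (3 cycles):

```latex
\[
\text{CPI}_{\text{avg}} = \sum_i f_i \cdot \text{CPI}_i
  = 0.6 \times 1 + 0.3 \times 2 + 0.1 \times 3 = 1.5
\]
```

where \(f_i\) is the fraction of executed instructions in class \(i\).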
Reducing Clock Cycle Time

This worked well for decades: smaller transistor dimensions meant smaller gate delays, and hence a shorter clock cycle. Not any more.
CPI (Cycles per Instruction)

What is the LC-3's CPI? Instructions take 5-9 cycles (p. 568), assuming a memory access takes one clock period.
– LC-3 CPI may be about 6 (the ideal case).
With no cache and a memory access time of 100 cycles?
– LC-3 CPI would be very high.
A cache reduces the access time to 2 cycles.
– LC-3 CPI is then higher than 6, but still reasonable; a rough model is sketched below.
Load/store instructions are about 20-30% of the instruction mix.
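A minimal back-of-the-envelope model of that last case in C. The 25% memory-instruction fraction and the one-extra-cycle-per-access accounting are illustrative assumptions, not measurements:

```c
#include <stdio.h>

int main(void) {
    double base_cpi = 6.0;   /* average LC-3 CPI assuming 1-cycle memory (slide estimate) */
    double mem_frac = 0.25;  /* assumed fraction of load/store instructions (20-30%) */
    double extra    = 2.0 - 1.0;  /* a 2-cycle cache adds 1 cycle per memory access */

    /* every instruction fetches once; loads/stores make one extra data access */
    double accesses_per_instr = 1.0 + mem_frac;
    printf("Effective CPI with a 2-cycle cache: %.2f\n",
           base_cpi + accesses_per_instr * extra);
    return 0;
}
```

This prints an effective CPI of 7.25: higher than 6, but still reasonable, as the slide says.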
Parallelism to Save Time

Do things in parallel to save time. Example: pipelining.
– Divide the instruction flow into stages.
– Let instructions flow into the pipeline.
– At any given time, multiple instructions are under execution.
Pipelining Analogy

Pipelined laundry: overlapping execution.
– Parallelism improves performance.
Four loads, done sequentially: time = 4 × 2 = 8 hours.
Pipelined (this example): time = 7 × 0.5 = 3.5 hours.
Non-stop (steady-state) rate: 4 × 0.5 = 2 hours.
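In general, for n tasks flowing through a k-stage pipeline whose stages each take time t, the timing works out as below; here the laundry has k = 4 stages of t = 0.5 h each and n = 4 loads:

```latex
\[
T_{\text{pipelined}} = (k + n - 1)\,t = (4 + 4 - 1) \times 0.5\,\text{h} = 3.5\,\text{h}
\qquad
T_{\text{sequential}} = n\,k\,t = 4 \times 4 \times 0.5\,\text{h} = 8\,\text{h}
\]
```

As n grows large, the (k - 1)t fill time becomes negligible and the time approaches the non-stop rate of one task per t, i.e. 4 × 0.5 = 2 hours for four loads.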
Pipeline Processor Performance

Single-cycle datapath: Tc = 800 ps. Pipelined datapath: Tc = 200 ps.
[Timing figure comparing single-cycle and pipelined instruction execution]
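Once the pipeline is full, one instruction completes every clock period, so the steady-state speedup under these numbers is a 4× sketch (not 5×, because the pipelined clock is set by the slowest stage):

```latex
\[
\text{Speedup} \approx \frac{T_{c,\text{single}}}{T_{c,\text{pipelined}}}
  = \frac{800\,\text{ps}}{200\,\text{ps}} = 4\times
\]
```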
Pipelining: Issues

The hardware cannot predict with certainty which way a branch will be taken.
– Actually, it may be able to make a good guess (branch prediction).
– There is some performance penalty for bad guesses.
Instructions may depend on the results of previous instructions.
– There may be a way to get around that problem in some cases.
Both issues are sketched below.
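A sketch in C of both issues; the comments describe what a pipeline sees, and no particular ISA is assumed:

```c
/* Hazards a pipeline must cope with; purely illustrative. */
int f(int a, int b) {
    int x = a + b;   /* produces x                                      */
    int y = x * 2;   /* consumes x immediately: a data dependence on    */
                     /* the previous instruction's result               */
    if (y > 100)     /* branch: which instruction to fetch next is      */
        return 0;    /* unknown until y has been compared, so the       */
    return y;        /* hardware must guess (and pay for bad guesses)   */
}
```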
Instruction-Level Parallelism (ILP)

Pipelining is one example. Multiple issue: provide multiple copies of resources.
– Multiple instructions start at the same time.
– Needs careful scheduling:
  compiler-assisted scheduling, or
  hardware-assisted ("superscalar") dynamic scheduling.
– Example: AMD Opteron X4.
– CPI can be less than 1! An example of issue-friendly code follows.
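A sketch of the kind of code that benefits: the two statements in the loop body are independent, so a two-issue machine (or a scheduling compiler) can start both in the same cycle. The unroll-by-two form is illustrative and assumes n is even:

```c
/* Illustrative: independent operations expose instruction-level parallelism. */
void add_arrays(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i += 2) {    /* assumes n is even */
        c[i]     = a[i]     + b[i];     /* no dependence between these two   */
        c[i + 1] = a[i + 1] + b[i + 1]; /* adds: both can issue in one cycle */
    }
}
```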
Flynn's Taxonomy (Michael J. Flynn, 1966)

                                      Data Streams
                                      Single                     Multiple
Instruction Streams   Single          SISD: Intel Pentium 4      SIMD: SSE instructions of x86
                      Multiple        MISD: no examples today    MIMD: Intel Xeon e5345

Instruction-level parallelism is still SISD.
SSE (Streaming SIMD Extensions): vector operations.
Intel Xeon e5345: 4 cores.
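A minimal SIMD sketch using the x86 SSE intrinsics: the single _mm_add_ps instruction adds four floats at once.

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

int main(void) {
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* {1, 2, 3, 4} */
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);  /* {5, 6, 7, 8} */
    float r[4];
    _mm_storeu_ps(r, _mm_add_ps(a, b));  /* one instruction, four additions */
    printf("%.0f %.0f %.0f %.0f\n", r[0], r[1], r[2], r[3]);  /* 6 8 10 12 */
    return 0;
}
```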
Multi-what?

Multitasking: tasks share a processor.
Multithreading: threads share a processor.
Multiprocessors: using multiple processors.
– For example, multi-core processors (multiple processors on the same chip).
– Scheduling of tasks/subtasks is needed.
Thread-level parallelism:
– multiple threads on one or more processors (see the sketch below).
Simultaneous multithreading (SMT):
– multiple threads in parallel (using multiple copies of the processor state).
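A minimal sketch of thread-level parallelism in C with POSIX threads: the two threads run concurrently, time-sliced on one core or in parallel on multiple cores, as the OS schedules them (compile with -pthread).

```c
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    printf("thread %ld running\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    return 0;
}
```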
Multi-core Processors

Power consumption has become a limiting factor.
Key advantage: lower power consumption for the same performance.
– Example: 20% lower clock frequency gives roughly 87% of the performance at 51% of the power (the power arithmetic is sketched below).
A processor can switch to a lower frequency to reduce power.
With n cores, a chip can run n or more threads.
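The arithmetic behind the 51% figure, assuming dynamic power scales as P ∝ C·V²·f and that the supply voltage can be scaled down along with the frequency (both reduced by 20%); the 87% performance figure reflects that execution time does not scale purely with clock frequency:

```latex
\[
\frac{P_{\text{new}}}{P_{\text{old}}}
  = \frac{C\,(0.8V)^2\,(0.8f)}{C\,V^2 f}
  = 0.8^3 \approx 0.51
\]
```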
Multi-core Processors (continued)

Cores may be identical or specialized.
Higher-level caches are shared; coherency is required for the lower-level (per-core) caches.
Individual cores may use superscalar or simultaneous-multithreading techniques.
LC-3 States

Instruction             Cycles
ADD, AND, NOT, JMP      5
TRAP                    8
LD, LDR, ST, STR        7
LDI, STI                9
BR                      5 or 6
JSR                     6
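As a sanity check on the "about 6" CPI estimate, a sketch that weights the table by an instruction mix; the frequencies below are made up for illustration, not measured:

```c
#include <stdio.h>

int main(void) {
    /* cycle counts from the LC-3 state table above;
       classes: ALU/JMP, TRAP, LD/ST, LDI/STI, BR, JSR */
    double cycles[] = {5, 8, 7, 9, 5.5, 6};          /* BR averaged as 5.5 */
    double freq[]   = {0.45, 0.02, 0.25, 0.05, 0.18, 0.05};  /* hypothetical mix */

    double cpi = 0.0;
    for (int i = 0; i < 6; i++)
        cpi += freq[i] * cycles[i];
    printf("Average CPI for this mix: %.2f\n", cpi);  /* prints 5.90 */
    return 0;
}
```

For this particular mix the weighted average comes out near 6, consistent with the earlier estimate.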