Instructor: Erol Sahin Hyperthreading, Multi-core Architectures, Amdahl's Law and Review CENG331: Introduction to Computer Systems 14th Lecture Acknowledgement: Some of the slides are adapted from the ones prepared by R.E. Bryant and D.R. O'Hallaron of Carnegie-Mellon Univ.

– 2 – Computation power is never enough Our machines are a million times faster than ENIAC, yet we ask for more: to make sense of the universe, to build bombs, to understand the climate, to unravel the genome. Up to now, the solution was easy: make the clock run faster.

– 3 – Hitting the limits… No more: we have hit the fundamental limits on clock speed. The speed of light is 30 cm/nsec in vacuum and 20 cm/nsec in copper. In a 10 GHz machine, signals cannot travel more than 2 cm in total; in a 100 GHz machine, at most 2 mm, just to get the signal from one end to the other and back. We could make computers that small, but then we face other problems: they are more difficult to design and verify, and there are fundamental problems such as heat dissipation (coolers are already bigger than the CPUs). Going from 1 MHz to 1 GHz was simply a matter of engineering the chip fabrication process; going from 1 GHz to 1 THz is not.
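To see where those numbers come from: at 10 GHz a clock cycle lasts 0.1 nsec, and in 0.1 nsec a signal in copper covers 20 cm/nsec × 0.1 nsec = 2 cm; at 100 GHz the cycle shrinks to 0.01 nsec and the budget to 2 mm.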

– 4 – Parallel computers New approach: build parallel computers, each CPU running at normal speed. Systems with 1000 CPUs are already available. The big challenge: how to utilize parallel computers to achieve speedups. Programming has mostly been a "sequential business", and parallel programming has remained exotic. How do we share information among different computers? How do we coordinate processing? How do we create or adapt operating systems?

Instructor: Erol Sahin Superscalar Architectures, Simultaneous Multi-threading CENG331: Introduction to Computer Systems 14th Lecture Acknowledgement: Some of the slides are adapted from the ones prepared by Neil Chakrabarty and William May

– 6 – Threading Algorithms Time-slicing: the processor switches between threads at fixed time intervals. High expense, especially if one of the processes is in the wait state. Switch-on-event: task switching in case of long pauses; while waiting for data coming from a relatively slow source, CPU resources are given to other processes.

– 7 – Threading Algorithms (cont.) Multiprocessing: distribute the load over many processors; adds extra cost. Simultaneous multi-threading: multiple threads execute on a single processor without switching; the basis of Intel's Hyper-Threading technology.

– 8 – Hyper-Threading Concept At any point in time, only part of the processor's resources is used to execute program code. The unused resources can be put to work as well, for example by executing another thread/application in parallel. This is extremely useful in desktop and server applications where many threads are used.

– 10 – Hyper-Threading Architecture First used in the Intel Xeon MP processor. It makes a single physical processor appear as multiple logical processors. Each logical processor keeps its own copy of the architecture state, while the logical processors share a single set of physical execution resources.

– 11 – Hyper-Threading Architecture Operating systems and user programs can schedule processes or threads to logical processors as if they were physical processors in a multiprocessing system. From an architecture perspective, we have to worry about the logical processors using shared resources: caches, execution units, branch predictors, control logic, and buses.

– 12 – Advantages The extra architecture adds only about 5% to the total die area. There is no performance loss if only one thread is active, increased performance with multiple threads, and better resource utilization.

– 13 – Disadvantages To take advantage of hyper-threading performance, serial execution cannot be used. Threads are non-deterministic and involve extra design effort, and threads have increased overhead.

Instructor: Erol Sahin Multi-Core Processors CENG331: Introduction to Computer Systems 14th Lecture Acknowledgement: Some of the slides are adapted from the ones prepared by Neil Chakrabarty and William May

– 15 – Multicore chips Recall Moore's law: the number of transistors that can be put on a chip doubles every 18 months, and it still holds: there are 300 million transistors on an Intel Core 2 Duo class chip. Question: what do you do with all those transistors? One option is to increase cache, but we already have 4 MB caches and the performance gain is little. Another option is to put two or more cores on the same chip (technically, the same die). Dual-core and quad-core chips are already on the market, 80-core chips have been manufactured, and more will follow.

– 16 – Multicore Chips (a) A quad-core chip with a shared L2 cache (Intel style). (b) A quad-core chip with separate L2 caches (AMD style). A shared L2 is good for sharing resources, but greedy cores may hurt the performance of others. [Tanenbaum, Modern Operating Systems 3e, (c) 2008 Prentice-Hall, Inc. All rights reserved.]

– 18 – Comparison: SMT versus Multi-core Multi-core: several cores, each designed to be smaller and not too powerful; great thread-level parallelism. SMT: one large, powerful superscalar core; great performance when running a single thread; exploits instruction-level parallelism.

– 19 – Cloud Computing and Server Farms

Instructor: Erol Sahin Amdahl's Law and Review CENG331: Introduction to Computer Systems 14th Lecture Acknowledgement: Some of the slides are adapted from the ones prepared by R.E. Bryant and D.R. O'Hallaron of Carnegie-Mellon Univ.

– 21 – Performance Evaluation This week: performance evaluation, Amdahl's law, and review.

– 22 – Problem You plan to visit a friend in Cannes, France, and must decide whether it is worth it to take the Concorde SST or a 747 from NY (New York) to Paris, assuming it will take 4 hours LA (Los Angeles) to NY and 4 hours Paris to Cannes.

– 23 – Amdahl's Law You plan to visit a friend in Cannes, France, and must decide whether it is worth it to take the Concorde SST or a 747 from NY (New York) to Paris, assuming it will take 4 hours LA (Los Angeles) to NY and 4 hours Paris to Cannes.

            time NY -> Paris   total trip time   speedup over 747
    747     8.5 hours          16.5 hours        1.0
    SST     3.75 hours         11.75 hours       1.4

Taking the SST (which is 2.2 times faster) speeds up the overall trip by only a factor of 1.4!
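The speedup column is simply the ratio of total trip times: 16.5 / 11.75 ≈ 1.4, even though the enhanced leg alone is 8.5 / 3.75 ≈ 2.2 times faster; the 8 hours of ground travel dilute the gain.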

– 24 – Speedup Old program (unenhanced): T1 = time that can NOT be enhanced, T2 = time that can be enhanced. Old time: T = T1 + T2. New program (enhanced): T1' = T1, and T2' ≤ T2 is the time after the enhancement. New time: T' = T1' + T2'. Speedup: S_overall = T / T'.

– 25 – Computing Speedup Two key parameters: F_enhanced = T2 / T (fraction of original time that can be improved), and S_enhanced = T2 / T2' (speedup of the enhanced part).

    T' = T1 + T2'
       = T(1 - F_enhanced) + T2'
       = T(1 - F_enhanced) + T2 / S_enhanced              [by def. of S_enhanced]
       = T(1 - F_enhanced) + T (F_enhanced / S_enhanced)  [by def. of F_enhanced]
       = T ((1 - F_enhanced) + F_enhanced / S_enhanced)

Amdahl's Law: S_overall = T / T' = 1 / ((1 - F_enhanced) + F_enhanced / S_enhanced). Key idea: Amdahl's Law quantifies the general notion of diminishing returns. It applies to any activity, not just computer programs.
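To make the formula concrete, here is a minimal sketch in C (the function name amdahl_speedup and the driver are ours); it plugs in the trip numbers from the earlier slide, where F_enhanced = 8.5 / 16.5 and S_enhanced = 8.5 / 3.75:

    #include <stdio.h>

    /* Amdahl's Law: overall speedup, given the fraction f of the
       original time that can be enhanced and the speedup s of the
       enhanced part. */
    double amdahl_speedup(double f, double s) {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void) {
        double f = 8.5 / 16.5;   /* flight time / total trip time */
        double s = 8.5 / 3.75;   /* SST flies the leg ~2.27x faster */
        printf("overall speedup = %.2f\n", amdahl_speedup(f, s));
        return 0;
    }

It prints an overall speedup of 1.40, matching the SST row of the table.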

– 26 – Amdahl's Law Example Trip example: suppose that for the New York to Paris leg, we now consider the possibility of taking a rocket ship (15 minutes) or a handy rip in the fabric of space-time (0 minutes):

            time NY -> Paris   total trip time   speedup over 747
    747     8.5 hours          16.5 hours        1.0
    SST     3.75 hours         11.75 hours       1.4
    rocket  0.25 hours         8.25 hours        2.0
    rip     0.0 hours          8 hours           2.1
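The ceiling is already visible in these numbers: even the zero-time rip achieves only 16.5 / 8 ≈ 2.06 (rounded to 2.1 in the table), because the 8 hours of ground travel cannot be enhanced.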

– 27 – Lesson from Amdahl's Law Useful corollary of Amdahl's law: 1 ≤ S_overall ≤ 1 / (1 - F_enhanced). Moral: it is hard to speed up a program. Moral++: it is easy to make premature optimizations.
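For example, if 90% of the running time can be enhanced (F_enhanced = 0.9), then S_overall ≤ 1 / (1 - 0.9) = 10: no enhancement of that part, however dramatic, can speed the whole program up more than tenfold.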

– 28 – Other Maxims Second corollary of Amdahl's law: when you identify and eliminate one bottleneck in a system, something else will become the bottleneck. Beware of optimizing on small benchmarks: it is easy to cut corners in ways that lead to asymptotic inefficiencies.

– 29 – Review: Time to look back!

– 30 – Take-home message: Nothing is real!

– 31 – Great Reality #1: Int's are not Integers, Float's are not Reals Example 1: Is x² ≥ 0? Float's: yes! Int's: 40000 * 40000 --> 1600000000, but 50000 * 50000 --> ?? Example 2: Is (x + y) + z = x + (y + z)? Unsigned & signed int's: yes! Float's: (1e20 + -1e20) + 3.14 --> 3.14, but 1e20 + (-1e20 + 3.14) --> ??
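Both examples are easy to check in C; a minimal sketch (assuming a typical machine with 32-bit int, where the signed overflow, formally undefined behavior, wraps around in practice):

    #include <stdio.h>

    int main(void) {
        /* Example 1: squaring an int can overflow to a negative value. */
        int x = 50000;
        printf("50000 * 50000 = %d\n", x * x);   /* not >= 0 with 32-bit int */

        /* Example 2: float addition is not associative. */
        float big = 1e20f;
        printf("(1e20 + -1e20) + 3.14 = %f\n", (big + -big) + 3.14f);
        printf("1e20 + (-1e20 + 3.14) = %f\n", big + (-big + 3.14f));
        return 0;
    }

The second printf prints 3.140000; the third prints 0.000000, because the 3.14 is absorbed by the huge -1e20 before the cancellation happens.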

– 32 – Great Reality #2: You've Got to Know Assembly Chances are, you'll never write programs in assembly: compilers are much better and more patient than you are. But understanding assembly is key to the machine-level execution model: the behavior of programs in the presence of bugs, where the high-level language model breaks down; tuning program performance, by understanding which optimizations are done or not done by the compiler and where program inefficiency comes from; implementing system software, since the compiler has machine code as its target and operating systems must manage process state; and creating / fighting malware, for which x86 assembly is the language of choice!

– 33 – Great Reality #3: Memory Matters Random access memory is an unphysical abstraction. Memory is not unbounded: it must be allocated and managed, and many applications are memory dominated. Memory referencing bugs are especially pernicious because their effects are distant in both time and space. Memory performance is not uniform: cache and virtual memory effects can greatly affect program performance, and adapting a program to the characteristics of the memory system can lead to major speed improvements.
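As an illustration (a minimal sketch; the struct layout and constants are ours, not taken from the slides), here is a memory referencing bug whose effect is distant in space from the faulty line: an out-of-bounds array write silently corrupts the neighboring field.

    #include <stdio.h>

    typedef struct {
        int a[2];
        double d;   /* on typical layouts, sits in memory right after a[] */
    } struct_t;

    double fun(int i) {
        volatile struct_t s;
        s.d = 3.14;
        s.a[i] = 1073741824;   /* out of bounds when i >= 2 */
        return s.d;
    }

    int main(void) {
        printf("fun(0) = %f\n", fun(0));   /* 3.140000 */
        printf("fun(2) = %f\n", fun(2));   /* may print a corrupted value */
        return 0;
    }

fun(0) behaves; fun(2) writes past the array and, depending on the layout, clobbers part of d, yet the line that returns the wrong value looks perfectly innocent.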

– 34 – Great Reality #4: There's more to performance than asymptotic complexity Constant factors matter too! Even an exact operation count does not predict performance: you can easily see a 10:1 performance range depending on how the code is written. You must optimize at multiple levels: algorithm, data representations, procedures, and loops. And you must understand the system to optimize performance: how programs are compiled and executed, how to measure program performance and identify bottlenecks, and how to improve performance without destroying code modularity and generality.
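As a small illustration of constant factors (a minimal sketch; the array size and function names are ours), the two loops below perform exactly the same additions, yet on typical hardware the row-wise version runs several times faster because it walks memory with stride 1 instead of stride N:

    #include <stdio.h>

    #define N 2048
    static double m[N][N];

    double sum_rowwise(void) {   /* stride-1 access: cache friendly */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }

    double sum_colwise(void) {   /* stride-N access: cache hostile */
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_rowwise(), sum_colwise());
        return 0;
    }

Both functions are O(N²) with identical operation counts; the difference is entirely in how they use the memory hierarchy.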

– 35 – Great Reality #5: Computers do more than execute programs They need to get data in and out, and the I/O system is critical to program reliability and performance. They communicate with each other over networks, and many system-level issues arise in the presence of a network: concurrent operations by autonomous processes, coping with unreliable media, cross-platform compatibility, and complex performance issues.

– 36 – What waits for you in the future?

– 37 – New tricks to learn..

– 38 – Coming to a classroom next to you in the Spring semester: CENG334 Introduction to Operating Systems. Processes and threads; synchronization, semaphores, and monitors (Assignment 1: synchronization); CPU scheduling policies (Assignment 2: system calls and scheduling); virtual memory, paging, and TLBs (Assignment 3: virtual memory); filesystems, FFS, and LFS (Assignment 4: file systems); RAID and NFS filesystems.