Computer Architecture Recitation 1


18-447 Computer Architecture Recitation 1 Kevin Chang Carnegie Mellon University Spring 2015, 1/23/2015

Agenda for Today
- Quick recap of the previous lectures
- Practice questions
- Q&A on HW1, Lab 1, and lecture materials
Important deadlines:
- Lab 1 due tonight at 11:59:59 PM. Hand in through AFS.
- Wednesday (1/28): HW 1 due
Notes: We start where we left off last week on ISA tradeoffs, and we will finish that topic today.

Quick Review
DRAM-based memory system
- Cells, banks, refresh, performance hog, row hammer
Notes: Modern main memory is predominantly built with DRAM cells, which store data in capacitors.

DRAM in the System
[Figure: die photo of a multi-core chip showing Cores 0-3, L2 Caches 0-3, the shared L3 cache, the DRAM interface, and the DRAM memory controller, connected to off-chip DRAM banks.]
*Die photo credit: AMD Barcelona

DRAM in the System: Refresh
[Figure: the memory controller sends refresh commands over the memory bus to multiple DRAM banks; an inset shows a DRAM cell, made of a capacitor and an access transistor, connected to a wordline and a bitline.]
Downsides of refresh:
1. Energy consumption
2. Performance degradation
3. QoS/predictability impact
4. Refresh rate limits DRAM capacity scaling
Notes: A modern DRAM system is subdivided into multiple DRAM banks, where a bank is a two-dimensional array of capacitor-based DRAM cells, organized in rows and columns, along with some peripheral circuitry. The reason for having multiple banks is that DRAM can serve requests to different banks in parallel and independently. (Talk about DRAM cells.) Sense amplifiers sense the charge in a cell and convert it to a digital value of either 1 or 0. A row of sense amplifiers is also referred to as a row buffer. One major issue with DRAM cells is that they leak charge over time. The minimum amount of time that a cell can retain enough charge is called the retention time. To prevent data loss, the memory controller periodically sends a refresh command to DRAM to trigger a refresh operation. Each refresh operation refreshes at least one row and up to 8 rows. For simplicity, we will assume that a refresh works on only one row at a time. I’ve omitted some of the details in terms of retention time and refresh intervals. You can go back to
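The refresh behaviour described in the notes above can be sketched as a small C model. This is only an illustration under stated assumptions: the row count, the retention time (measured in abstract ticks), and all function names are hypothetical and do not come from the slides. As in the notes, one refresh command restores exactly one row.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ROWS 8          /* assumed bank size, for illustration only */
#define RETENTION_TICKS 64  /* assumed retention time, in abstract ticks */

typedef struct {
    int age[NUM_ROWS];      /* ticks since each row was last refreshed */
    int next_row;           /* next row the controller will refresh */
} Bank;

void bank_init(Bank *b) {
    for (int r = 0; r < NUM_ROWS; r++) b->age[r] = 0;
    b->next_row = 0;
}

/* Advance time by one tick: every cell leaks a little more charge. */
void bank_tick(Bank *b) {
    for (int r = 0; r < NUM_ROWS; r++) b->age[r]++;
}

/* One refresh command restores the charge of exactly one row. */
void bank_refresh(Bank *b) {
    b->age[b->next_row] = 0;
    b->next_row = (b->next_row + 1) % NUM_ROWS;
}

/* A row has lost its data once its age exceeds the retention time. */
bool bank_data_lost(const Bank *b) {
    for (int r = 0; r < NUM_ROWS; r++)
        if (b->age[r] > RETENTION_TICKS) return true;
    return false;
}

/* Run for `ticks` ticks, issuing one refresh every `interval` ticks.
   Returns true if every row kept its data for the whole run. */
bool simulate(int ticks, int interval) {
    assert(interval > 0);
    Bank b;
    bank_init(&b);
    for (int t = 1; t <= ticks; t++) {
        bank_tick(&b);
        if (t % interval == 0) bank_refresh(&b);
        if (bank_data_lost(&b)) return false;
    }
    return true;
}
```

In this model the controller must issue a refresh at least every RETENTION_TICKS / NUM_ROWS ticks so that each row is revisited within its retention time; with a longer interval, some row ages past the limit. This also hints at the capacity-scaling downside listed on the slide: more rows means more refresh commands in the same time window.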

DRAM in the System: Performance Hog
[Figure: matlab and gcc, running on different cores, send requests through the memory controller and over the memory bus to DRAM Banks 0-3.]
Memory performance hog: applications are unfairly slowed down because the DRAM controller is designed to maximize throughput.
Notes: In a multi-core chip, different cores share some hardware resources. In particular, they share the DRAM memory system. When we run matlab on one core and gcc on another core, both cores generate memory requests to access the DRAM banks. When these requests arrive at the DRAM controller, the controller favors matlab’s requests over gcc’s requests. As a result, matlab makes progress and continues generating memory requests, which the DRAM controller again favors over gcc’s requests. Therefore, gcc starves waiting for its requests to be serviced in DRAM, whereas matlab makes very quick progress, as if it were running alone. Why does this happen? Because the algorithms employed by the DRAM controller are unfair. But why are these algorithms unfair? Why do they unfairly prioritize matlab’s accesses? To understand this, we need to understand how a DRAM bank operates.
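One way a throughput-oriented controller can starve an application is sketched below. The slides do not name the scheduling policy, so this is an assumption: the sketch uses a row-hit-first, then oldest-first rule (requests to the row already held in the row buffer are served before all others). The single-bank model, the `App` structure, and the function names are all hypothetical.

```c
#include <assert.h>

typedef struct {
    int row;          /* row targeted by the app's pending request */
    long arrival;     /* time the pending request was issued */
    int served;       /* requests serviced so far */
    int left_in_row;  /* requests remaining to the current row */
} App;

/* After a request is served, the app immediately issues its next one.
   Each app sends `locality` consecutive requests to a row before moving on.
   Rows advance by 2 so the two apps never share a row. */
static void next_request(App *a, int locality, long now) {
    if (--a->left_in_row == 0) {
        a->row += 2;
        a->left_in_row = locality;
    }
    a->arrival = now;
}

/* Serve one request per cycle from two apps that always have a request
   pending. Policy (assumed): row hit first, otherwise oldest first. */
void run_row_hit_first(int cycles, int locality0, int locality1, int served[2]) {
    assert(locality0 > 0 && locality1 > 0);
    App app[2] = {
        { .row = 0, .arrival = 0, .served = 0, .left_in_row = locality0 },
        { .row = 1, .arrival = 0, .served = 0, .left_in_row = locality1 },
    };
    int locality[2] = { locality0, locality1 };
    int open_row = -1;  /* nothing in the row buffer yet */

    for (long t = 1; t <= cycles; t++) {
        int pick;
        if (app[0].row == open_row)      pick = 0;  /* row hit wins */
        else if (app[1].row == open_row) pick = 1;
        else pick = (app[0].arrival <= app[1].arrival) ? 0 : 1;  /* oldest */

        open_row = app[pick].row;
        app[pick].served++;
        next_request(&app[pick], locality[pick], t);
    }
    served[0] = app[0].served;
    served[1] = app[1].served;
}
```

In this model, an app that sends 8 consecutive requests to each row gets eight of every nine service slots when paired with an app that changes rows on every request, even though neither app was given a higher priority. With equal locality the two apps alternate.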

Unexpected Slowdowns in Multi-Core
[Figure: slowdown of two applications, one running on Core 0 and one on Core 1; one of them is unfairly slowed down.]
Moscibroda and Mutlu, “Memory performance attacks: Denial of memory service in multi-core systems,” USENIX Security 2007.
Notes: What kind of performance do we expect when we run two applications on a multi-core system? To answer this question, we performed an experiment. We took two applications we cared about, ran them together on different cores in a dual-core system, and measured their slowdown compared to when each is run alone on the same system. This graph shows the slowdown each app experienced. (DATA explanation…) Why do we get such a large disparity in the slowdowns? Is it the priorities? No. We went back and gave high priority to gcc and low priority to matlab. The slowdowns did not change at all. Neither the software nor the hardware enforced the priorities. Is it contention in the disk? We checked for this possibility, but found that these applications did not have any disk accesses in the steady state. They both fit in physical memory and therefore did not interfere in the disk. Is it that other applications or the OS are interfering with gcc, stealing its time quanta? No. What is it then? Why do we get such a large disparity in slowdowns in a dual-core system? I will call an application that causes such slowdowns a “memory performance hog.” Now, let me tell you why this disparity in slowdowns happens.

Disturbance Errors in Modern DRAM
[Figure: a row of cells on a wordline. The aggressor row is repeatedly opened (wordline at VHIGH) and closed (wordline at VLOW); the adjacent rows above and below it are the victim rows.]
Repeatedly opening and closing a row enough times within a refresh interval induces disturbance errors in adjacent rows in most real DRAM chips you can buy today.
Kim+, “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” ISCA 2014.
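The condition on the slide (enough open/close cycles of one row within a refresh interval) can be expressed as a toy C model. This is a sketch under loud assumptions: the threshold of 1000 activations is an arbitrary illustrative number, not a measured value, real thresholds vary by chip, and the `Dram` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ROWS 16
#define HAMMER_THRESHOLD 1000  /* assumed value, for illustration only */

typedef struct {
    int disturb[ROWS];   /* neighbour activations since this row's last refresh */
    bool flipped[ROWS];  /* true once the row has suffered a disturbance error */
} Dram;

void dram_init(Dram *d) { memset(d, 0, sizeof *d); }

/* One open/close cycle of `row`: restores that row's own charge and
   disturbs its two physical neighbours. */
void dram_open_close(Dram *d, int row) {
    assert(row >= 0 && row < ROWS);
    d->disturb[row] = 0;
    for (int v = row - 1; v <= row + 1; v += 2) {
        if (v < 0 || v >= ROWS) continue;       /* edge rows have one neighbour */
        if (++d->disturb[v] >= HAMMER_THRESHOLD) d->flipped[v] = true;
    }
}

/* A refresh restores charge everywhere; errors that already happened stay. */
void dram_refresh_all(Dram *d) {
    for (int r = 0; r < ROWS; r++) d->disturb[r] = 0;
}

/* Hammer one aggressor row; returns true if any row has flipped bits. */
bool hammer(Dram *d, int row, int times) {
    for (int i = 0; i < times; i++) dram_open_close(d, row);
    for (int r = 0; r < ROWS; r++)
        if (d->flipped[r]) return true;
    return false;
}
```

In this model the aggressor row itself is never corrupted, only its neighbours are, and a refresh that arrives before the threshold is reached resets the count. That mirrors the "within a refresh interval" qualifier on the slide.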

Quick Review
- DRAM-based memory system: cells, banks, refresh, row hammer, performance hog
- Key components of a computer
- The von Neumann vs. dataflow model
- ISA vs. microarchitecture
- Elements of an ISA
  - Instructions: opcodes, data types, registers, formats, etc.
  - Memory: address space, addressing modes, alignment, etc.
- ISA tradeoffs
  - CISC vs. RISC
  - Semantic gap
Notes:
Von Neumann model: An instruction is fetched and executed in control-flow order. The program is stored in memory and instructions are processed sequentially. This is what major machines use today.
Dataflow model: An instruction is fetched and executed in data-flow order, without a PC. The program executes based on the inputs feeding into dataflow nodes, which perform the computation. A dataflow node fires (is fetched and executed) when all its inputs are ready.
ISA: Specifies how the programmer sees instructions being executed.
Microarchitecture: How the underlying implementation actually executes instructions. The microarchitecture can execute instructions in any order, as long as it obeys the semantics specified by the ISA when making the instruction results visible to software. The programmer should see the order specified by the ISA. The implementation (uarch) can vary as long as it satisfies the specification (ISA).
Data types: Simple: int. Complex: linked list, string, bit vector.
Memory organization: Is it byte addressable? How big is the address space?
CISC: An instruction does a lot of work, such as inserting a node into a linked list.
RISC: An instruction does little, primitive work, such as add or xor.
Semantic gap: Where to place your ISA: closer to the high-level language (HLL) or closer to the hardware control signals. This is a tradeoff between compilers and hardware. Small gap: rep movs.
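The dataflow firing rule in the notes above (a node fires when all its inputs are ready, with no program counter) can be sketched in C. This is a minimal illustration, not a description of any real dataflow machine: the `Node` layout, the token-passing helpers, and the example graph for (a + b) * (c - d) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_ADD, OP_SUB, OP_MUL } Op;

typedef struct {
    Op op;
    int in[2];       /* operand values */
    bool ready[2];   /* which operands have arrived */
    int dest_node;   /* consumer node, or -1 for the program output */
    int dest_port;   /* which input of the consumer to fill */
    bool fired;
} Node;

/* Deliver a value (token) to one input port of a node. */
void send_token(Node *nodes, int node, int port, int value) {
    nodes[node].in[port] = value;
    nodes[node].ready[port] = true;
}

static int apply(Op op, int a, int b) {
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    default:     return a * b;
    }
}

/* Fire any node whose inputs are all ready, until nothing can fire.
   Node order in the array does not matter; only data availability does. */
int run_dataflow(Node *nodes, int n) {
    assert(n > 0);
    int result = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < n; i++) {
            Node *nd = &nodes[i];
            if (nd->fired || !nd->ready[0] || !nd->ready[1]) continue;
            int v = apply(nd->op, nd->in[0], nd->in[1]);
            nd->fired = true;
            progress = true;
            if (nd->dest_node < 0) result = v;
            else send_token(nodes, nd->dest_node, nd->dest_port, v);
        }
    }
    return result;
}

/* Example graph: (a + b) * (c - d). The multiply is listed first on
   purpose, to show that it still waits for both of its inputs. */
int example(int a, int b, int c, int d) {
    Node g[3] = {
        { .op = OP_MUL, .dest_node = -1 },
        { .op = OP_ADD, .dest_node = 0, .dest_port = 0 },
        { .op = OP_SUB, .dest_node = 0, .dest_port = 1 },
    };
    send_token(g, 1, 0, a); send_token(g, 1, 1, b);
    send_token(g, 2, 0, c); send_token(g, 2, 1, d);
    return run_dataflow(g, 3);
}
```

The add and subtract nodes have no dependence on each other, so a dataflow machine could fire them in either order or at the same time; a von Neumann machine would instead execute them in the order the program lists them.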

Practice Questions

Practice Question 1: Dataflow

Practice Question 2: MIPS ISA
C code:
    int foo(int *A, int n) {
      int s;
      if (n >= 2) {
        s = foo(A, n-1);
        s = s + A[n-2];
      }
      else {
        s = 1;
      }
      A[n] = s + 1;
      return A[n];
    }
MIPS assembly:
    _foo:    // TODO
    _branch:
    _true:
    _false:
    _join:
    _done:
1. A and n are passed in r4 and r5
2. The result should be returned in r2, and r31 stores the return address
3. r29 (stack pointer), r8-r15 (caller saved), r16-r23 (callee saved)
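Before writing the assembly, it can help to have expected return values to check against. The sketch below is the C function from the question, assuming that the store to A[n] and the return follow the if/else (which is what the _join label suggests). The `foo_on_zeros` helper and the `MAX_N` bound are hypothetical additions for testing and are not part of the question.

```c
#include <assert.h>

/* The recursive function from the question. A must have at least n + 1
   elements, because A[n] is written. */
int foo(int *A, int n) {
    int s;
    assert(n >= 0);
    if (n >= 2) {
        s = foo(A, n - 1);   /* recursive call: r31 and live values must be saved */
        s = s + A[n - 2];
    } else {
        s = 1;
    }
    A[n] = s + 1;
    return A[n];
}

#define MAX_N 16  /* hypothetical bound for the helper below */

/* Hypothetical helper: run foo on a zero-filled array. */
int foo_on_zeros(int n) {
    int A[MAX_N + 1] = {0};
    assert(n <= MAX_N);
    return foo(A, n);
}
```

On a zero-filled array, foo returns 2, 3, 6, and 10 for n = 1, 2, 3, and 4. Note that A and n are still needed after the recursive call, so an assembly version has to keep them somewhere that survives the call, such as the stack or callee-saved registers.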

Q & A