Computer Architecture Lectures 1 & 2 Study Report (second report), 亢吉男 (Kang Jinan), 2015-04-11


1. What is Computer Architecture?

The science and art of designing, selecting, and interconnecting hardware components, and designing the hardware/software interface, to create a computing system that meets functional, performance, energy-consumption, cost, and other specific goals.

2. Our Purpose

- Enable better systems: make computers faster, cheaper, smaller, more reliable, ...
- Enable new applications.
- Enable better solutions to problems: software innovation builds on trends and changes in computer architecture; the >50% per-year performance improvement has enabled this innovation.

3. Problems Are Solved by Electrons: the Levels of Transformation

Problem -> Algorithm -> Program/Language -> Runtime System (VM, OS, MM) -> ISA (Architecture) -> Microarchitecture -> Logic -> Circuits -> Electrons

The ISA (instruction set architecture) is the interface between hardware and software: a contract that the hardware promises to satisfy. The microarchitecture is a specific implementation of an ISA and is not visible to software. A microprocessor comprises ISA, microarchitecture, and circuits; "architecture" = ISA + microarchitecture. The implementation (uarch) can vary as long as it satisfies the specification (ISA), and the microarchitecture usually changes faster than the ISA: there are few ISAs (x86, ARM, SPARC, MIPS, Alpha) but many uarchs.

4. The Von Neumann Model/Architecture

Also called the stored-program computer (instructions in memory). Two key properties:

(1) Stored program: instructions are stored in a linear memory array; memory is unified between instructions and data; the interpretation of a stored value depends on the control signals.

(2) Sequential instruction processing: one instruction is processed (fetched, executed, and completed) at a time; the program counter (instruction pointer) identifies the current instruction; the program counter advances sequentially except for control-transfer instructions.

[Figure: The Von Neumann Model (of a Computer). Control Unit (IP, Instruction Register), Processing Unit (ALU, TEMP), Memory (Memory Address Register, Memory Data Register), Input, Output.]
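The stored-program and sequential-processing properties above can be sketched as a toy interpreter (the three-field instruction format and opcode names are my own illustrative choices, not from the lecture):

```python
# Toy stored-program machine: instructions and data share one memory
# array; a program counter advances sequentially except on control
# transfer; a stored value's interpretation depends on how it is used.

def run(mem, pc=0):
    while True:
        op, a, b = mem[pc]              # fetch and decode the current word
        if op == "HALT":
            return mem
        if op == "ADD":                 # mem[a] <- mem[a] + mem[b]
            mem[a] = mem[a] + mem[b]
        elif op == "JZ" and mem[a] == 0:
            pc = b                      # control transfer: PC not sequential
            continue
        pc += 1                         # sequential instruction processing

mem = [
    ("ADD", 3, 4),   # 0: mem[3] += mem[4]
    ("HALT", 0, 0),  # 1: stop
    0,               # 2: (unused data)
    2,               # 3: operand A
    3,               # 4: operand B
]
result = run(mem)    # result[3] now holds 2 + 3
```

Note how the same array holds tuples used as instructions and integers used as data; only the fetch/execute logic decides which is which.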

5. The Dataflow Model (of a Computer)

Von Neumann model: an instruction is fetched and executed in control-flow order, as specified by the instruction pointer; execution is sequential unless an explicit control-flow instruction changes it.

Dataflow model: an instruction is fetched and executed in data-flow order, i.e., when its operands are ready. Properties: there is no instruction pointer; instruction ordering is specified by data-flow dependences; each instruction specifies "who" should receive its result; an instruction can "fire" whenever all of its operands have been received; potentially many instructions can execute at the same time. In a dataflow machine, a program consists of dataflow nodes, and a node fires (is fetched and executed) when all its inputs are ready.
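The firing rule can be illustrated with a tiny software simulation (the node graph computing (2+3)*(7-4) is my own example, not from the lecture):

```python
# Toy dataflow execution: no program counter; a node fires as soon as
# all of its operands have arrived, then forwards its result to the
# consumer nodes named in the instruction.

import operator

def run_dataflow(nodes, operands):
    """nodes: name -> (binary fn, tuple of consumer names);
    operands: name -> list of operand values received so far."""
    fired, results = set(), {}
    while len(fired) < len(nodes):
        ready = [n for n in nodes if n not in fired and len(operands[n]) == 2]
        if not ready:
            break                        # no node can fire: graph is stuck
        for n in ready:                  # every ready node may fire at once
            fn, consumers = nodes[n]
            results[n] = fn(*operands[n])
            fired.add(n)
            for c in consumers:          # send the result to "who" needs it
                operands[c].append(results[n])
    return results

nodes = {
    "add": (operator.add, ("mul",)),
    "sub": (operator.sub, ("mul",)),
    "mul": (operator.mul, ()),
}
tokens = {"add": [2, 3], "sub": [7, 4], "mul": []}
out = run_dataflow(nodes, tokens)        # out["mul"] == (2+3) * (7-4)
```

Here "add" and "sub" are both ready in the first pass and could fire in parallel; "mul" fires only after both results arrive, with no instruction pointer involved.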

6. The Distinction Between ISA and Uarch

ISA: specifies how the programmer sees instructions being executed. Under a von Neumann ISA the programmer sees a sequential, control-flow execution order; under a dataflow ISA the programmer sees a data-flow execution order.

Microarchitecture: how the underlying implementation actually executes instructions. The microarchitecture can execute instructions in any order, as long as it obeys the semantics specified by the ISA when making instruction results visible to software; the programmer should see the order specified by the ISA.

All major instruction set architectures today use this von Neumann model: x86, ARM, MIPS, SPARC, Alpha, POWER. Underneath, at the microarchitecture level, the execution model of almost all implementations is very different: pipelined instruction execution (Intel 80486 uarch); multiple instructions at a time (Intel Pentium uarch); out-of-order execution (Intel Pentium Pro uarch); separate instruction and data caches. But whatever happens underneath that is not consistent with the von Neumann model is not exposed to software.

ISA

(1) Instructions: opcodes, addressing modes, data types; instruction types and formats; registers, condition codes.
(2) Memory: address space, addressability, alignment; virtual memory management.
(3) Call and interrupt/exception handling.
(4) Access control, priority/privilege.
(5) I/O: memory-mapped vs. instruction-based.
(6) Task/thread management.
(7) Power and thermal management.
(8) Multi-threading support, multiprocessor support.

Microarchitecture

Implementation of the ISA under specific design constraints and goals; anything done in hardware without exposure to software:
(1) Pipelining
(2) In-order versus out-of-order instruction execution
(3) Memory access scheduling policy
(4) Speculative execution
(5) Superscalar processing (multiple instruction issue)
(6) Clock gating
(7) Caching: levels, size, associativity, replacement policy
(8) Prefetching
(9) Voltage/frequency scaling
(10) Error correction

7. Three Examples in Lecture 1

Lecture 1 presents three example problems: (1) denial of memory service in multi-core systems; (2) DRAM refresh; (3) DRAM row hammer (DRAM disturbance errors). Because I mainly studied the second example, DRAM refresh, I will discuss it in more detail and describe the others briefly.

(1) Denial of Memory Service in a Multi-Core System

In a multi-core chip, different cores share some hardware resources; in particular, they share the DRAM memory system. Multiple applications share the DRAM controller, and DRAM controllers are designed to maximize DRAM data throughput. As a result, DRAM scheduling policies are unfair to some applications: row-hit-first unfairly prioritizes applications with high row-buffer locality (threads that keep accessing the same row), and oldest-first unfairly prioritizes memory-intensive applications.

(3) DRAM Row Hammer (DRAM Disturbance Errors)

Repeatedly opening and closing a row enough times within a refresh interval induces disturbance errors in adjacent rows in most real DRAM chips.
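A toy scheduling simulation shows the starvation effect of row-hit-first prioritization (the request streams, thread names, and row numbers are made up for illustration):

```python
# Toy row-hit-first DRAM scheduling: the controller prefers the oldest
# request that hits the currently open row, falling back to the oldest
# request overall. A thread streaming over one row can starve others.

def row_hit_first(queue, slots):
    open_row, served = None, {}
    for _ in range(slots):
        if not queue:
            break
        # index of the oldest request hitting the open row, else the oldest
        idx = next((i for i, (_, row) in enumerate(queue)
                    if row == open_row), 0)
        thread, row = queue.pop(idx)
        open_row = row
        served[thread] = served.get(thread, 0) + 1
    return served

queue = []
for i in range(8):
    queue.append(("A", 7))        # thread A: high row-buffer locality
    queue.append(("B", 100 + i))  # thread B: a new row on every request
served = row_hit_first(queue, 8)  # A's row hits win every service slot
```

Even though A's and B's requests arrive interleaved, every slot goes to A: its requests always hit the open row, so B's oldest request never gets picked.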

(2) DRAM Refresh

A DRAM cell consists of a capacitor and an access transistor; it stores data as charge on the capacitor. DRAM capacitor charge leaks over time, so the memory controller needs to refresh each row periodically to restore the charge, typically activating each row every 64 ms. Downsides of refresh:
-- Energy consumption: each refresh consumes energy.
-- Performance degradation: the DRAM rank/bank is unavailable while being refreshed.
-- Predictability impact: (long) pause times during refresh.
-- Refresh rate limits DRAM capacity scaling.
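A back-of-the-envelope sketch of the performance cost (the row count and per-row refresh latency below are assumed example values, not figures from the lecture):

```python
# Within each 64 ms window, every row must be refreshed once, so the
# fraction of time a bank is busy refreshing scales with the row count.

def refresh_overhead(num_rows, t_refresh_per_row_ns, window_ms=64):
    busy_ns = num_rows * t_refresh_per_row_ns
    return busy_ns / (window_ms * 1e6)   # fraction of time unavailable

# Example (assumed): 2**15 rows per bank, 100 ns to refresh one row.
frac = refresh_overhead(2**15, 100)      # about 5% of all time
```

This is why the downsides compound as DRAM capacity scales: doubling the number of rows doubles the refresh busy time inside the same 64 ms window.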

Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer, so many of these refreshes are unnecessary. Solution: RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that identifies and skips unnecessary refreshes using knowledge of cell retention times. The key idea is to group DRAM rows into retention-time bins and apply a different refresh rate to each bin. As a result, rows containing leaky cells are refreshed as frequently as before, while most rows are refreshed less frequently. RAIDR requires no modification to DRAM and only minimal modification to the memory controller, with a modest storage overhead of 1.25 KB in the memory controller.
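The binning idea can be sketched as follows (a sketch of my own; the candidate intervals 64/128/256 ms reflect my reading of the RAIDR paper, and the retention times in the example are invented):

```python
# RAIDR-style retention binning: give each row the longest refresh
# interval that is still no longer than its measured retention time,
# so leaky rows keep the short base interval and most rows get a
# longer one.

def refresh_interval_ms(retention_ms, intervals=(256, 128, 64)):
    """Pick the longest candidate interval not exceeding the row's
    retention time (retention is assumed to be at least 64 ms)."""
    for interval in intervals:
        if interval <= retention_ms:
            return interval
    return intervals[-1]   # leakiest bin: refresh at the base 64 ms rate

# Invented examples: a leaky row, a middling row, a long-retention row.
assignments = [refresh_interval_ms(r) for r in (100, 200, 1000)]
```

The safety property is that the chosen interval never exceeds the row's retention time, so no row is ever refreshed less often than it needs.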

More details about RAIDR: a retention-time profiling step determines each row's retention time ((1) in Figure 4 of the paper). For each row, if the row's retention time is less than the new default refresh interval, the memory controller inserts it into the appropriate bin (2). During system operation (3), the memory controller ensures that each row is chosen as a refresh candidate every 64 ms.

Three key components: (1) retention-time profiling; (2) storing rows into retention-time bins; (3) issuing refreshes to rows when necessary.

(1) Retention-Time Profiling

The straightforward method of conducting these measurements is to write a small number of static patterns (such as "all 1s" or "all 0s"), turn off refreshes, and observe when the first bit changes. Before the row retention times for a system have been collected, the memory controller performs refreshes using the baseline auto-refresh mechanism. After the row retention times have been measured, the operating system can save the results in a file.

(2) Storing Retention-Time Bins: Bloom Filters

A Bloom filter is a structure that provides a compact way of representing set membership and can be implemented efficiently in hardware. It consists of a bit array of length m and k distinct hash functions that map each element to positions in the array. A Bloom filter can contain any number of elements; the probability of a false positive gradually increases with the number of inserted elements, but false negatives never occur. For RAIDR this means that rows may be refreshed more frequently than necessary, but a row is never refreshed less frequently than necessary.
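A minimal software sketch of such a filter (the sizes m=1024, k=3 and the inserted row addresses are arbitrary illustration values; hardware versions would use cheap hash circuits rather than SHA-256):

```python
# Minimal Bloom filter: an m-bit array plus k hash functions.
# False positives are possible; false negatives are not, which is
# exactly the safety property RAIDR relies on.

import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, 0   # bits: m-bit array as an int

    def _positions(self, item):
        for i in range(self.k):               # k distinct (salted) hashes
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p               # set all k positions

    def __contains__(self, item):
        # present only if every one of the k positions is set
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter(m=1024, k=3)
for row in (3, 17, 42):   # e.g. row addresses placed in a short-interval bin
    bf.add(row)
```

Querying a row that was inserted always answers "present"; a row that was not inserted usually answers "absent", and a rare false positive merely causes an extra (harmless) refresh.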

(3) Performing Refresh Operations

Selecting a refresh candidate row: all refresh intervals are chosen to be multiples of 64 ms. This is implemented with a row counter that counts through every row address sequentially.

Determining time since last refresh: deciding whether a row needs to be refreshed requires determining how many 64 ms intervals have elapsed since its last refresh.

Issuing refreshes: to refresh a specific row, the memory controller simply activates that row, essentially performing a RAS-only refresh.
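Putting these steps together, the per-candidate refresh decision might look like this (the 64 ms candidate period and multiple-of-64 intervals come from the text above; the function itself is my own sketch):

```python
# Sketch of the refresh decision: every 64 ms each row becomes a
# refresh candidate, but it is actually refreshed only when the number
# of elapsed 64 ms periods is a multiple of its bin's interval.

def needs_refresh(period, interval_ms):
    """period: count of elapsed 64 ms windows since start;
    interval_ms: the row's bin interval, a multiple of 64 ms."""
    return period % (interval_ms // 64) == 0

# A row in a 256 ms bin is refreshed on every 4th candidate pass:
hits = [p for p in range(8) if needs_refresh(p, 256)]
```

Rows in the 64 ms bin refresh on every pass, so the leakiest rows behave exactly as under baseline refresh while others are skipped most of the time.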

Auto-Refresh

An auto-refresh operation occupies all banks on the rank simultaneously (preventing the rank from servicing any requests) for a length of time tRFC (the refresh cycle time), where tRFC depends on the number of rows being refreshed. Previous DRAM generations also allowed the memory controller to perform refreshes by opening rows one by one (called RAS-only refresh), but this method has been deprecated due to the additional power required to send row addresses on the bus.

8. New Material in "Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution"

When a microprocessor processes instructions, it has to do three things: 1) supply instructions to the core of the processor, where each instruction can do its job; 2) supply the data needed by each instruction; 3) perform the operations required by each instruction.

(1) Instruction Supply

The number of instructions that can be fetched at one time has grown from one to four, and shows signs of soon growing to six or eight. Three things can get in the way of fully supplying the core with instructions to process: (a) instruction cache misses, (b) fetch breaks, and (c) conditional branch mispredictions.

(2) Data Supply

To supply the data needed by an instruction, one would ideally have an infinite supply of needed data, available in zero time, at reasonable cost. The best we can do is a storage hierarchy: a small amount of data can be accessed (on chip) in one to three cycles, a lot more data can be accessed (also on chip) in ten to 16 cycles, and still more data can be accessed (off chip) in hundreds of cycles.

(3) Instruction Processing

To perform the operations required by these instructions, one needs a sufficient number of functional units to process the data as soon as it is available, and sufficient interconnections to instantly supply a result produced by one functional unit to the functional unit that needs it as a source.
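The cost of such a hierarchy is commonly summarized as average memory access time (AMAT); the latencies below follow the cycle counts in the text, while the miss rates are assumed example values:

```python
# Average memory access time for a three-level storage hierarchy:
# AMAT = t_L1 + miss_L1 * (t_L2 + miss_L2 * t_mem)

def amat(t_l1, miss_l1, t_l2, miss_l2, t_mem):
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_mem)

# 2-cycle on-chip L1, 16-cycle on-chip L2, 200-cycle off-chip memory;
# 10% L1 and 20% L2 miss rates are illustrative assumptions.
cycles = amat(2, 0.10, 16, 0.20, 200)
```

Even with these modest miss rates, the hundreds-of-cycles off-chip access dominates the average, which is why the data-supply bottleneck gets so much architectural attention.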