ECE-3056-B Exam Topic Areas John Copeland Friday May 2, 2014 11:30-2:20.


09b Virtual Memory System Every page of a process's virtual address space has a home location on the disk(s). Part of main memory (RAM) is dedicated to acting as a cache for active pages (a fraction of all virtual pages). Programs access instructions and data by "Virtual Addresses". If the page size is 4096 bytes, the rightmost 12 bits are the "Byte Offset." The Physical Address is the Physical Page number || Byte Offset. The Virtual Address is the Virtual Page number || Byte Offset.
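The address decomposition above can be sketched in a few lines of Python. This is an illustrative example only (the address values are made up); it assumes the 4096-byte page size from the slide, giving a 12-bit byte offset.

```python
PAGE_SIZE = 4096        # 4096-byte pages, as on the slide
OFFSET_BITS = 12        # log2(4096) = 12 byte-offset bits

def split_virtual_address(va: int):
    """Split a virtual address into (virtual page number, byte offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

def physical_address(ppn: int, offset: int) -> int:
    """Concatenate a physical page number with the byte offset."""
    return (ppn << OFFSET_BITS) | offset

vpn, off = split_virtual_address(0x12345678)
print(hex(vpn), hex(off))   # 0x12345 0x678
```

Translation replaces only the page-number bits; the byte offset passes through unchanged.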

Virtual Memory Use main memory as a “cache” for secondary (disk) storage – Managed jointly by CPU hardware and the operating system (OS). Programs share main memory – Each gets a private virtual address space holding its frequently used code and data – Protected from other programs (translations can be tagged with process-ID bits so one process cannot use another's). CPU and OS translate virtual addresses to physical addresses – A VM “block” is called a page – A VM page “miss” (page not in DRAM) is called a "page fault".

Cache-state exercise (slides 09a-20/09a-21): given the previous cache state below, what is the new state after each access? (Addresses in the "Binary Virtual addr / Hit-miss / Cache block" worksheet were lost in the transcript; answer on slide 09a-21.)

Index  V  Tag (Physical MSBs)  Data (32 bytes)
000    N
001    N
010    Y                       Mem[11010]
011    N
100    N
101    N
110    Y                       Mem[10110]
111    N

[Figure: the Virtual Page address is presented to the TLB (Translation Look-aside Buffer); on a hit, the Physical Page address plus the page-offset bits go to the cache; on a miss, the Page Table must be accessed.]

Address Translation Fixed-size pages (e.g., 4 KB). [Figure: the "Page Table", kept in DRAM, maps virtual pages to physical pages; some pages reside in DRAM, all pages reside on disk.]

TLB Operation TLB size is typically a function of the target domain – High-end machines will have large, fully associative TLBs. PTE entries are replaced on a demand-driven basis. The TLB is in the critical path. [Figure: the virtual address from the processor is translated by the TLB into a physical address for the cache and memory; on a TLB miss, translate and update the TLB.]
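The demand-driven TLB behavior can be sketched as a tiny software model. This is a minimal sketch, not real hardware: it assumes a fully associative TLB with FIFO replacement (real TLBs typically use LRU-like policies), and it does not model page faults.

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB: a small cache of VPN -> PPN translations."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries = OrderedDict()          # VPN -> PPN

    def translate(self, vpn: int, page_table: dict):
        """Return (PPN, hit?). On a miss, walk the page table and refill."""
        if vpn in self.entries:
            return self.entries[vpn], True
        ppn = page_table[vpn]                 # page-table walk (no fault modeled)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # demand-driven FIFO replacement
        self.entries[vpn] = ppn
        return ppn, False

tlb = TLB()
pt = {0x1: 0x8, 0x2: 0x9}                     # hypothetical page table
print(tlb.translate(0x1, pt))                 # (8, False): compulsory miss
print(tlb.translate(0x1, pt))                 # (8, True): hit
```

The first access misses and fills the TLB from the page table; the second hits without a walk, which is why the TLB sits in the critical path.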

Memory Protection Different tasks can share parts of their virtual address spaces – But need to protect against errant access – Requires OS assistance. Hardware support for OS protection – Privileged supervisor mode (aka kernel mode) – Privileged instructions – Page tables and other state information accessible only in supervisor mode – System call exception (e.g., syscall in MIPS). Distinguish between a TLB miss*, a data cache miss, and a page fault. * The TLB may also contain translations for recently used pages whose data is not present in the cache.

09b Glossary: Page Table; Page Table Entry (PTE); Page fault; Physical address; Physical page; Translation lookaside buffer (TLB); Virtual address; Virtual page.

Input/Output "I/O" I/O devices can be characterized by – Behavior: input, output, storage – Partner: human or machine – Data rate: bytes/sec, transfers/sec – I/O bus connections. An interrupt (signal) is sent to the OS when requested input data is ready for retrieval by a process (or thread) that is "blocked" (halted); the OS then puts the process on the list of "Ready to Run" processes.

Typical x86 PC I/O System [Figure: network interface, GPU, and disks attached through the chipset interconnect, with software interaction/control paths; the front-side bus was replaced with the QuickPath Interconnect (QPI).] Note the flow of data (and control) in this system! Modern disk drives contain internal SRAM buffers to reduce latency.

Disk Performance The actuator moves the correct read/write head over the correct sector (seek time – maximum when it must move from the innermost cylinder to the outermost) – Under the control of the disk controller. Disk latency = controller overhead + seek time + rotational delay + transfer delay – Seek time and rotational delay are limited by mechanical parts. [Figure: actuator, arm, head, platters.] Transfer rate = (bytes per track) × RPM / (60 sec per min); transfer delay = bytes per sector / transfer rate. RAID – Redundant Array of Inexpensive (Independent) Disks: use multiple smaller disks (cf. one large disk) – Parallelism improves performance – Plus extra disk(s) for redundant data storage provide a fault-tolerant storage system – Especially if failed disks can be “hot swapped".
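The latency model above can be worked through numerically. The drive parameters below (7200 RPM, 8 ms average seek, 1 MB per track, 512 B sectors, 0.2 ms controller overhead) are hypothetical numbers chosen for illustration, not from the slides.

```python
def disk_access_time_ms(seek_ms, rpm, bytes_per_track, sector_bytes,
                        controller_ms=0.2):
    """Average time to read one sector, per the slide's latency model."""
    rotational_ms = 0.5 * 60_000 / rpm          # half a revolution on average
    transfer_rate = bytes_per_track * rpm / 60  # bytes per second
    transfer_ms = sector_bytes / transfer_rate * 1000
    return controller_ms + seek_ms + rotational_ms + transfer_ms

t = disk_access_time_ms(seek_ms=8, rpm=7200,
                        bytes_per_track=1_000_000, sector_bytes=512)
print(f"{t:.2f} ms")    # 12.37 ms
```

Note how the mechanical terms (seek plus rotation, about 12 ms here) dwarf the transfer delay (a few microseconds), which is what motivates the RAID parallelism discussed above.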

Disk Dependability Measures Reliability: mean time to failure (MTTF). Service interruption: mean time to repair (MTTR). Mean time between failures – MTBF = MTTF + MTTR. Availability = MTTF / (MTTF + MTTR). Improving availability – Increase MTTF: fault avoidance, fault tolerance, fault forecasting – Reduce MTTR: improved tools and processes for diagnosis and repair.
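The availability formula is a one-liner; the MTTF and MTTR figures below are illustrative values, not vendor data.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical disk: 1,000,000-hour MTTF, 24-hour MTTR.
a = availability(1_000_000, 24)
print(f"{a:.6f}")   # 0.999976
```

Both levers on the slide show up directly here: raising MTTF or cutting MTTR pushes the ratio toward 1.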

Bus Types, Signals, and Synchronization Data lines – Carry address and data – Multiplexed or separate. Control lines – Indicate data type, synchronize transactions. Synchronous – Uses a bus clock. Asynchronous – Uses request/acknowledge control lines for handshaking. Processor-memory buses – Short, high speed – Design is matched to the memory organization. I/O buses – Longer, allowing multiple connections – Specified by standards for interoperability – Connect to the processor-memory bus through a bridge.

Study Guide Provide a step-by-step example of how each of the following works – Polling, DMA, interrupts, read/write accesses in a RAID configuration, memory-mapped I/O. Compute the bandwidth for data transfers to/from a disk. How is the I/O system of a desktop or laptop different from that of a server?

Energy and Delay [Figure: energy and delay vs. supply voltage V_DD. Delay decreases with supply voltage, but energy and power increase; the minimum of the Energy-Delay Product (EDP) gives the lowest energy per operation.] Historically, performance scaling was accompanied by scaling down feature sizes. This is no longer true: we have reached a point where power densities are increasing.

Processor Power States Performance states – P-states – Operate at different voltage/frequency points. Recall the delay-voltage relationship – Lower voltage: lower leakage, but slower operation – Lower frequency: lower power (same or more energy per operation) – Lower frequency: longer execution time. Idle states – C-states – Sleep states – The difference between them is how much state is saved. Transitions between states are SW- or HW-managed. Example: 4× the cores at 0.75× voltage and 0.5× frequency gives roughly 1× power and 2× performance – Concurrency plus lower frequency yields greater energy efficiency.
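The 4-core example can be checked with the standard dynamic-power relation P ∝ n·C·V²·f (capacitance held fixed). This is a back-of-the-envelope sketch; it assumes perfectly parallel workloads, so throughput scales as cores × clock rate.

```python
def relative_dynamic_power(n_cores, v_scale, f_scale):
    """Dynamic CMOS power scales as n * C * V^2 * f (C held fixed)."""
    return n_cores * v_scale**2 * f_scale

def relative_throughput(n_cores, f_scale):
    """Ideal throughput: cores x clock rate (assumes perfect parallelism)."""
    return n_cores * f_scale

p = relative_dynamic_power(4, 0.75, 0.5)   # 4 * 0.5625 * 0.5 = 1.125
t = relative_throughput(4, 0.5)            # 4 * 0.5 = 2.0
print(p, t)   # 1.125 2.0
```

So the slide's "1× power, 2× performance" holds to within about 12%: the voltage term is squared, which is why trading frequency for cores pays off.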

Thermal Design Power (TDP) This is the maximum power at which the part is designed to operate – Dictates the design of the cooling system. Max temperature: T_jmax – Typically fixed by the worst-case workload. Parts typically operate below the TDP, creating opportunities for turbo mode (higher clock for a short time). [Figure: AMD Trinity APU.]

Power and Architecture Activity Example: at the nth clock cycle, the collected counters are – Data cache: reads = 20, writes = 12; per-read energy = 0.5 nJ; per-write energy = 0.6 nJ. Read energy = reads × per-read energy = 10 nJ. Write energy = writes × per-write energy = 7.2 nJ. Total activity energy = read + write energies = 17.2 nJ. If n = 50 clock cycles and the clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW. (Note: n / clock_freq = n clock periods in seconds; power is the time average of energy.)
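The arithmetic above can be reproduced directly; this sketch just encodes the slide's counter-based model with the same numbers.

```python
def activity_power_w(reads, writes, e_read_j, e_write_j, n_cycles, clock_hz):
    """Average power = total activity energy / elapsed time of n cycles."""
    energy = reads * e_read_j + writes * e_write_j   # joules
    return energy * clock_hz / n_cycles              # watts

p = activity_power_w(reads=20, writes=12,
                     e_read_j=0.5e-9, e_write_j=0.6e-9,
                     n_cycles=50, clock_hz=2e9)
print(f"{p * 1000:.1f} mW")   # 688.0 mW
```

This matches the slide's 688 mW: 17.2 nJ spread over 50 cycles at 2 GHz, i.e., over 25 ns.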

Instruction Level Parallelism (ILP) [Pipeline: IF, ID, EX, MEM, WB.] A single (program) thread of execution. Issue multiple instructions from the same instruction stream – Average CPI < 1. Often called out-of-order (OOO) cores – Multiple instructions in EX at the same time.

Thread Level Parallelism (TLP) Multiple threads of execution. Exploit ILP in each thread. Exploit concurrent execution across threads.

Programming Model: Message Passing Each processor has a private physical address space. Hardware sends/receives messages between processors.
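The model can be sketched as a toy program. This is a simulation, not real multiprocessor hardware: each "processor" is an ordinary function with private data, the workers run sequentially here (in a real message-passing system, such as an MPI program, they would be separate processes), and the only communication is an explicit send over a queue.

```python
from queue import Queue

def worker(rank, data, outbox):
    """One 'processor': computes on its private data, then sends a message."""
    partial = sum(data)
    outbox.put((rank, partial))    # explicit send to the root processor

chunks = [[1, 2], [3, 4], [5, 6]]  # each processor's private address space
outbox = Queue()
for rank, chunk in enumerate(chunks):
    worker(rank, chunk, outbox)

# Root processor receives one message per worker and reduces them.
total = sum(msg for _, msg in (outbox.get() for _ in chunks))
print(total)   # 21
```

No worker ever reads another worker's data; all sharing happens through the messages, which is the defining contrast with the shared-memory model.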

Graphics Processing Unit – GPU Early video cards – Frame buffer memory with address generation for video output. 3D graphics processing – Originally on high-end computers (e.g., SGI) – Moore's Law: lower cost, higher density – 3D graphics cards now for PCs and game consoles. Graphics Processing Units – Processors oriented to 3D graphics tasks – Vertex/pixel processing, shading, texture mapping, rasterization. Processing is highly data-parallel – GPUs are highly multithreaded – Use thread switching to hide memory latency – Less reliance on multi-level caches – Graphics memory is wide and high-bandwidth. Trend toward general-purpose GPUs – Heterogeneous CPU/GPU systems – CPU for sequential code, GPU for parallel code.