ECE-3056-B Exam Topic Areas
John Copeland
Friday, May 2, 2014, 11:30-2:20
09b Virtual Memory System
- Every page is kept on the disk(s); part of main memory (RAM) is dedicated to acting as a cache for the active pages (a fraction of all pages).
- Programs access instructions and data using "Virtual Addresses".
- If the page size is 4096 bytes, the rightmost 12 bits of an address are the "Byte Offset" (2^12 = 4096).
- The Physical Address is the Physical Page number || Byte Offset.
- The Virtual Address is the Virtual Page number || Byte Offset.
(A small address-splitting sketch follows.)
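To make the Page number || Byte Offset split concrete, here is a minimal sketch in C. It assumes a 32-bit virtual address and 4 KB pages; the physical page number is a made-up value for illustration, since the real mapping would come from the page table:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12                    /* 4096-byte pages -> 12 offset bits */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define OFFSET_MASK (PAGE_SIZE - 1)

int main(void) {
    uint32_t vaddr  = 0x00403A7Cu;            /* example virtual address */
    uint32_t vpn    = vaddr >> PAGE_SHIFT;    /* virtual page number */
    uint32_t offset = vaddr & OFFSET_MASK;    /* byte offset within the page */

    /* Suppose translation maps this virtual page to physical page 0x1C2
       (a hypothetical mapping, chosen only for this example). */
    uint32_t ppn   = 0x1C2u;
    uint32_t paddr = (ppn << PAGE_SHIFT) | offset;  /* PPN || offset */

    printf("VPN=0x%X offset=0x%X -> PA=0x%X\n", vpn, offset, paddr);
    return 0;
}
```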
Virtual Memory
Use main memory as a "cache" for secondary (disk) storage
- Managed jointly by CPU hardware and the operating system (OS)
Programs share main memory
- Each gets a private virtual address space, mapped onto physical memory, holding its frequently used code and data
- Protected from other programs (page-table and TLB entries can carry process-ID bits)
CPU and OS translate virtual addresses to physical addresses
- A VM "block" is called a page
- A VM page "miss" (page not in DRAM) is called a "page fault"
Cache/TLB Exercise (slides 09a-20 and 09a-21)
Previous state of the direct-mapped cache:

Index | V | Tag (Physical MSBs) | Data (32 bytes)
000   | N |                     |
001   | N |                     |
010   | Y | 11010010 011 010    | Mem[11010]
011   | N |                     |
100   | N |                     |
101   | N |                     |
110   | Y | 00011001 110 110    | Mem[10110]
111   | N |                     |

Then this happens:

Binary Virtual addr | Hit/miss | Cache block
10000 11 010 xxxx   | ?        | 000
11101 10 110 xxxx   | ?        | 011
10000 11 010 xxxx   | ?        | 000

What is the new state of the cache? Answer on slide 09a-21.

[Figure: the CPU issues a Virtual Page Address; the TLB (Translation Look-aside Buffer) supplies the Physical Page Address (example translation: virtual 100011010010011010 to physical 111000011001110110; page-offset bits 9:4 index the cache). On a TLB hit the cache is accessed directly; on a TLB miss the Page Table must be accessed.]
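The hit/miss decision in the figure can be sketched in C. This is an illustrative software model, not the hardware: the TLB here is a small fully associative array with a trivial refill policy, and page_table_walk is a hypothetical stand-in for the "Need to access Page Table" box (a real walk reads the page table in DRAM and may raise a page fault):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16
#define PAGE_SHIFT  12   /* 4 KB pages */

/* One entry of a small fully associative TLB (illustrative). */
typedef struct {
    bool     valid;
    uint32_t vpn;   /* virtual page number  */
    uint32_t ppn;   /* physical page number */
} TlbEntry;

static TlbEntry tlb[TLB_ENTRIES];

/* Hypothetical page-table walk, standing in for the real lookup. */
static uint32_t page_table_walk(uint32_t vpn) {
    return vpn ^ 0x100u;   /* made-up mapping for illustration */
}

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

    for (int i = 0; i < TLB_ENTRIES; i++)        /* TLB hit: no table walk */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << PAGE_SHIFT) | offset;

    uint32_t ppn = page_table_walk(vpn);         /* TLB miss path */
    tlb[vpn % TLB_ENTRIES] = (TlbEntry){true, vpn, ppn};  /* simple refill */
    return (ppn << PAGE_SHIFT) | offset;
}
```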
Address Translation
- Fixed-size pages (e.g., 4K)
[Figure: the "Page Table", held in DRAM, maps virtual page numbers to physical page numbers; some pages are resident in DRAM, while all pages are kept on disk.]
TLB Operation
- TLB size is typically a function of the target domain
  - High-end machines will have large, fully associative TLBs
- PTE entries are replaced on a demand-driven basis
- The TLB is in the critical path of every memory access
[Figure: datapath with registers, ALU, TLB, cache, and memory; the virtual address is translated by the TLB into a physical address before the cache access, and on a TLB miss the translation is performed and the TLB updated ("Translate & Update").]
Memory Protection
Different tasks can share parts of their virtual address spaces
- But need to protect against errant access
- Requires OS assistance
Hardware support for OS protection
- Privileged supervisor mode (aka kernel mode)
- Privileged instructions
- Page tables and other state information accessible only in supervisor mode
- System call exception (e.g., syscall in MIPS)
Distinguish between a TLB miss*, a data cache miss, and a page fault.
* The TLB may also contain recently used pages that are not present in the cache.
09b Glossary
- Page table
- Page table entry (PTE)
- Page fault
- Physical address
- Physical page
- Translation lookaside buffer (TLB)
- Virtual address
- Virtual page
Input/Output ("I/O")
I/O devices can be characterized by
- Behavior: input, output, storage
- Partner: human or machine
- Data rate: bytes/sec, transfers/sec
I/O bus connections
An interrupt (signal) is sent to the OS when requested input data is ready for retrieval by a process (or thread) that is "blocked" (halted). The OS then puts the process on the list of "Ready to Run" processes. (The polling alternative is sketched below.)
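For contrast with the interrupt-driven flow above, here is a minimal polling sketch over memory-mapped I/O. The register addresses and the READY bit layout are hypothetical; on real hardware they come from the platform's memory map and the device's datasheet:

```c
#include <stdint.h>

/* Hypothetical memory-mapped device registers (addresses are made up). */
#define DEV_STATUS ((volatile uint32_t *)0x10000000u)
#define DEV_DATA   ((volatile uint32_t *)0x10000004u)
#define READY_BIT  0x1u

/* Polling: the CPU spins on the status register until the device reports
   data ready, then reads the data register. An interrupt-driven design
   would instead block the process and let the OS wake it when data
   arrives, freeing the CPU for other work. */
uint32_t poll_read(void) {
    while ((*DEV_STATUS & READY_BIT) == 0)
        ;                               /* busy-wait: wastes CPU cycles */
    return *DEV_DATA;                   /* reading often clears READY   */
}
```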
Typical x86 PC I/O System
[Figure: block diagram of an x86 PC I/O system showing the CPU, GPU, network interface, disks, and the interconnect joining them; the front-side interconnect was replaced with the QuickPath Interconnect (QPI).]
- Note the flow of data (and control) in this system!
- Modern disk drives contain internal SRAM buffers to reduce latency
Disk Performance
- The actuator moves the correct read/write head over the correct sector (seek time is at its maximum when the head must move from the innermost to the outermost cylinder)
  - Under the control of the disk controller
- Disk latency = controller overhead + seek time + rotational delay + transfer delay
  - Seek time and rotational delay are limited by mechanical parts
- Transfer rate = (bytes per cylinder) × RPM / (60 sec per min)
- Transfer delay = bytes per sector / transfer rate
[Figure: disk internals showing the actuator, arm, heads, and platters.]
RAID: Redundant Array of Inexpensive (Independent) Disks
- Use multiple smaller disks (cf. one large disk)
- Parallelism improves performance
- Extra disk(s) hold redundant data
- Provides a fault-tolerant storage system, especially if failed disks can be "hot swapped"
(A worked latency and transfer-rate example follows.)
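A worked example of the latency and transfer-rate formulas above, following the slide's formula for transfer rate. All drive parameters (RPM, seek time, controller overhead, bytes per cylinder) are assumed values chosen only for illustration:

```c
#include <stdio.h>

/* Disk latency = controller overhead + seek + rotational delay + transfer.
   All drive parameters below are assumptions, not from the slides. */
int main(void) {
    double rpm           = 7200.0;
    double seek_ms       = 4.0;             /* average seek time     */
    double overhead_ms   = 0.2;             /* controller overhead   */
    double bytes_per_cyl = 1024.0 * 1024.0; /* assumed cylinder size */
    double sector_bytes  = 512.0;

    /* Average rotational delay: half a revolution. */
    double rot_ms = 0.5 * (60000.0 / rpm);               /* ~4.17 ms */

    /* Transfer rate = bytes per cylinder * RPM / 60 (slide's formula). */
    double rate_Bps = bytes_per_cyl * rpm / 60.0;
    double xfer_ms  = sector_bytes / rate_Bps * 1000.0;  /* one sector */

    double latency_ms = overhead_ms + seek_ms + rot_ms + xfer_ms;
    printf("transfer rate = %.1f MB/s, sector latency = %.2f ms\n",
           rate_Bps / 1e6, latency_ms);
    return 0;
}
```

Note how the mechanical terms (seek and rotation, roughly 8 ms here) dwarf the electronic transfer time for a single sector.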
Disk Dependability Measures
- Reliability: mean time to failure (MTTF)
- Service interruption: mean time to repair (MTTR)
- Mean time between failures: MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
Improving availability
- Increase MTTF: fault avoidance, fault tolerance, fault forecasting
- Reduce MTTR: improved tools and processes for diagnosis and repair
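The availability formula is easy to sanity-check numerically; the MTTF and MTTR figures here are arbitrary illustrative values, not vendor data:

```c
#include <stdio.h>

/* Availability = MTTF / (MTTF + MTTR); MTBF = MTTF + MTTR. */
int main(void) {
    double mttf_h = 1000000.0;   /* mean time to failure, hours (assumed) */
    double mttr_h = 24.0;        /* mean time to repair, hours (assumed)  */

    double mtbf_h = mttf_h + mttr_h;
    double avail  = mttf_h / (mttf_h + mttr_h);

    printf("MTBF = %.0f h, availability = %.6f (%.4f%% uptime)\n",
           mtbf_h, avail, avail * 100.0);
    return 0;
}
```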
Bus Types, Signals, and Synchronization
Data lines
- Carry address and data
- Multiplexed or separate
Control lines
- Indicate data type, synchronize transactions
Synchronous
- Uses a bus clock
Asynchronous
- Uses request/acknowledge control lines for handshaking
Processor-memory buses
- Short, high speed
- Design is matched to the memory organization
I/O buses
- Longer, allowing multiple connections
- Specified by standards for interoperability
- Connect to the processor-memory bus through a bridge
10 Study Guide
- Provide a step-by-step example of how each of the following works: polling, DMA, interrupts, read/write accesses in a RAID configuration, memory-mapped I/O
- Compute the bandwidth for data transfers to/from a disk
- How is the I/O system of a desktop or laptop different from that of a server?
Energy and Delay
[Figure: energy and delay plotted against supply voltage VDD, with the Energy Delay Product (EDP) curve and the point of "Lowest Energy per Operation" marked.]
- Delay decreases as the supply voltage rises, but energy and power increase; the Energy Delay Product (EDP) captures this tradeoff
- Historically, performance scaling was accompanied by scaling down feature sizes. This is no longer true; we have reached a point where power densities are increasing.
(A numeric sketch of the tradeoff follows.)
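To make the voltage tradeoff concrete, here is a sketch using common first-order models: per-operation energy proportional to V^2 and gate delay proportional to V/(V - Vt)^2. The threshold voltage and the normalization are assumptions, not values from the slides:

```c
#include <stdio.h>
#include <math.h>

/* Illustrative energy/delay scaling with supply voltage (normalized). */
int main(void) {
    double vt = 0.3;                         /* threshold voltage (assumed) */
    for (double v = 0.6; v <= 1.21; v += 0.2) {
        double energy = v * v;               /* E ~ C * V^2, normalized     */
        double delay  = v / pow(v - vt, 2);  /* alpha-power model, alpha=2  */
        double edp    = energy * delay;      /* Energy Delay Product        */
        printf("V=%.1f  E=%.2f  D=%.2f  EDP=%.2f\n", v, energy, delay, edp);
    }
    return 0;
}
```

Running this shows delay falling and energy rising as V grows, with the EDP picking out an intermediate operating point.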
Processor Power States
Performance states (P-states)
- Operate at different voltage/frequency points; recall the delay-voltage relationship
- Lower voltage → lower leakage, but slower operation
- Lower frequency → lower power (same or more energy per operation)
- Lower frequency → longer execution time
Idle states (C-states)
- Sleep states; which is better depends on how much state is saved
- SW- or HW-managed transitions between states!
Example: 4× the cores at 0.75× voltage and 0.5× frequency gives about 1× power and 2× performance (checked below)
- Concurrency + lower frequency → greater energy efficiency
[Figure: one core with its cache versus four cores, each with its own cache.]
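A quick check of the slide's example under the standard dynamic-power model P ∝ N · C · V² · f, assuming perfectly parallel work and ignoring leakage:

```c
#include <stdio.h>

/* Relative power and performance versus a 1-core baseline. */
int main(void) {
    double cores = 4.0;    /* 4x number of cores   */
    double v     = 0.75;   /* 0.75x supply voltage */
    double f     = 0.5;    /* 0.5x clock frequency */

    double power = cores * v * v * f;  /* 4 * 0.5625 * 0.5 = 1.125 ~ 1x */
    double perf  = cores * f;          /* ideal scaling: 4 * 0.5 = 2x   */

    printf("relative power = %.3f, relative performance = %.1f\n",
           power, perf);
    return 0;
}
```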
Thermal Design Power (TDP)
- This is the maximum power at which the part is designed to operate
  - Dictates the design of the cooling system
- The maximum temperature Tjmax is typically fixed by the worst-case workload
- Parts typically operate below the TDP
  - Opportunities for turbo mode (higher clock for a short time)?
[Figure: thermal image of an AMD Trinity APU; source: http://ecs.vancouver.wsu.edu/thermofluids-research]
Power and Architecture Activity
For example, at the n-th clock cycle the collected counters are:
- Data cache: reads = 20, writes = 12; per-read energy = 0.5 nJ; per-write energy = 0.6 nJ
- Read energy = reads × per-read energy = 10 nJ
- Write energy = writes × per-write energy = 7.2 nJ
- Total activity energy = read + write energies = 17.2 nJ
If n = 50 clock cycles and the clock frequency = 2 GHz:
- Total activity power = energy × clock_freq / n = 17.2 nJ × 2 GHz / 50 = 688 mW
Note: n / clock_freq is the elapsed time of n clock periods in seconds; power is the time average of energy. (The arithmetic is reproduced in code below.)
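The same arithmetic in code, so the unit conversions are explicit:

```c
#include <stdio.h>

/* Counter values times per-event energies, averaged over the elapsed
   cycles to get power, as in the slide's example. */
int main(void) {
    int    reads = 20, writes = 12;
    double e_read_nJ = 0.5, e_write_nJ = 0.6;
    int    n_cycles  = 50;
    double f_hz      = 2e9;                 /* 2 GHz clock */

    double energy_nJ = reads * e_read_nJ + writes * e_write_nJ;  /* 17.2 nJ */
    double time_s    = n_cycles / f_hz;                          /* 25 ns   */
    double power_mW  = energy_nJ * 1e-9 / time_s * 1e3;          /* 688 mW  */

    printf("energy = %.1f nJ, power = %.0f mW\n", energy_nJ, power_mW);
    return 0;
}
```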
Instruction Level Parallelism (ILP)
[Figure: pipeline stages IF, ID, EX, MEM, WB with multiple instructions in flight.]
- Single (program) thread of execution
- Issue multiple instructions from the same instruction stream
- Average CPI < 1
- Often called out-of-order (OOO) cores
- Multiple instructions in EX at the same time
Thread Level Parallelism (TLP)
- Multiple threads of execution
- Exploit ILP within each thread
- Exploit concurrent execution across threads (a minimal sketch follows)
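A minimal TLP sketch using POSIX threads (not from the slides): four threads sum disjoint quarters of an array concurrently, while each thread's inner loop still leaves room for ILP within its core:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define T 4

static long data[N];
static long partial[T];

/* Each worker sums its own quarter of the array. */
static void *worker(void *arg) {
    long t  = (long)arg;
    long lo = t * (N / T), hi = lo + N / T, sum = 0;
    for (long i = lo; i < hi; i++)
        sum += data[i];
    partial[t] = sum;
    return NULL;
}

int main(void) {
    pthread_t tid[T];
    for (long i = 0; i < N; i++) data[i] = 1;
    for (long t = 0; t < T; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    long total = 0;
    for (long t = 0; t < T; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("total = %ld\n", total);   /* expect 1000000 */
    return 0;
}
```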
Programming Model: Message Passing
- Each processor has a private physical address space
- Hardware sends/receives messages between processors (see the MPI sketch below)
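As an illustration, a minimal program in this style using MPI (one common realization of message passing; the slides do not name a specific library): rank 0 sends an integer to rank 1, and each rank touches only its own private memory.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;                 /* lives in rank 0's memory   */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload = 0;                  /* separate copy in rank 1    */
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., mpirun -np 2 ./a.out; the explicit send/receive pair is what distinguishes this model from shared-memory threading.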
Graphics Processing Unit (GPU)
Early video cards
- Frame buffer memory with address generation for video output
3D graphics processing
- Originally on high-end computers (e.g., SGI)
- Moore's Law → lower cost, higher density
- 3D graphics cards now for PCs and game consoles
Graphics Processing Units
- Processors oriented to 3D graphics tasks
- Vertex/pixel processing, shading, texture mapping, rasterization
Processing is highly data-parallel
- GPUs are highly multithreaded
- Use thread switching to hide memory latency
- Less reliance on multi-level caches
- Graphics memory is wide and high-bandwidth
Trend toward general-purpose GPUs
- Heterogeneous CPU/GPU systems
- CPU for sequential code, GPU for parallel code