

1 COMP 2003: Assembly Language and Digital Logic Chapter 7: Computer Architecture Notes by Neil Dickson

2 About This Chapter This chapter delves deeper into the computer to give an understanding of the issues regarding the CPU, RAM, and I/O. Having an understanding of the underlying architecture helps with writing efficient software.

3 Part 1 of 3: CPU Execution Pipelining and Beyond

4 Execution Pipelining The five stages of executing an instruction: Fetch Instruction → Decode Instruction → Load Operand Values → Execute Operation → Store Results. Old systems ran 1 instruction at a time, finishing all five stages before starting the next. Less-old systems run multiple independent instructions at a time, like an assembly line: while one instruction executes, the next is loading operands, the one after is being decoded, and so on.

5 A Hardware View of Pipelining Instructions flow in sequence through five hardware stages: Instruction-Fetching Circuitry → Instruction Decoder(s) → Operand-Loading Circuitry → Execution Unit(s) → Results-Storing Circuitry. Several instructions (e.g. Instructions 1 through 7) are in flight at once, one per stage. Problem: What if Instruction 1 stores its result in eax (e.g. “mov eax,1”) and Instruction 2 needs to load eax (e.g. “add ebx,eax”)?

6 Pipeline Dependencies Same five stages as before. Suppose Instruction 1 stores its result in eax and Instruction 2 needs to load eax. Instruction 2 has to wait at the Operand-Loading stage until the result is stored, stalling every instruction behind it. Problem: What about conditional jumps?
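A minimal assembly sketch of this dependency, plus one way to soften it (the reordered version and the use of ecx are illustrative assumptions, not from the slide):

    ; dependent pair: the pipeline must stall
    mov  eax, 1       ; Instruction 1: stores result in eax
    add  ebx, eax     ; Instruction 2: must wait until eax is stored

    ; same work with independent instructions in between:
    mov  eax, 1       ; Instruction 1
    inc  ecx          ; independent work proceeds while eax is in flight
    add  ebx, eax     ; by now eax is more likely to be ready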

7 Branch Prediction Suppose that Instruction 3 is a conditional jump (e.g. jc MyLabel). The “operand” it loads is the flags. Its execution determines whether or not to jump (i.e. where to go next). Its result is stored in the instruction pointer, eip. What comes next is unknown until the execution stage, so the CPU makes a prediction first and checks it during execution.
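A sketch of one common way code can sidestep a hard-to-predict jump entirely: replace the conditional jump with a conditional move (this technique is an illustrative addition; the slide does not prescribe it):

    ; branching version: the CPU must predict jc
        cmp   eax, ebx
        jc    MyLabel      ; a misprediction flushes the pipeline
        mov   ecx, 0
        jmp   Done
    MyLabel:
        mov   ecx, 1
    Done:

    ; branchless version: nothing to predict
        xor   ecx, ecx     ; ecx = 0
        mov   edx, 1
        cmp   eax, ebx
        cmovb ecx, edx     ; ecx = 1 only if carry set (same condition as jc)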

8 Branch Prediction and the Pipeline Suppose Instruction 3 is a conditional jump. Instruction 2 changed the flags, so Instruction 3 waits at the Operand-Loading stage while the CPU keeps fetching Instructions 4, 5, and 6 along the predicted path. Oh no! It turned out that the CPU guessed wrong, so it clears the pipeline and starts fetching from the new eip (Instruction 4’).

9 Pipelining Pros/Cons Pro: Only one set of each hardware component is needed (plus some hardware to manage the pipeline). Pro: The initial concept was simple. Con: The programmer/compiler must try to eliminate dependencies, which can be tough, or else face big performance penalties. Con: The actual hardware can get complicated. Note: CPUs are no longer short on die space, so the first Pro doesn’t matter much anymore.
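A sketch of what “eliminating dependencies” can look like in practice: summing an array with one accumulator chains every add through eax, while two accumulators let adds overlap in the pipeline. The loop structure and register choices are illustrative assumptions; assume ecx holds an even element count and eax/ebx start at 0:

    SumLoop:                 ; serial version: each add depends on the last
        add  eax, [esi]      ; eax chains every iteration together
        add  esi, 4
        dec  ecx
        jnz  SumLoop

    SumLoop2:                ; two independent dependency chains
        add  eax, [esi]      ; chain 1
        add  ebx, [esi+4]    ; chain 2: does not wait for eax
        add  esi, 8
        sub  ecx, 2
        jnz  SumLoop2
        add  eax, ebx        ; combine the partial sums at the end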

10 Beyond Pipelining For jumps that are hard to predict, guess BOTH directions and keep two copies of results based on the guess (e.g. 2 of each register). Allow many instructions in at once (e.g. multiple decoders, multiple execution units, etc.) so that there’s a higher probability of finding operations that can run concurrently. Vector instructions operate on multiple data together (see the sketch below).
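A sketch of the vector-instruction idea, assuming SSE2 (the specific instructions are an illustrative choice; the slide only names the concept). One paddd does four 32-bit additions at once, matching the 128-bit execution units on the next slide:

    movdqu xmm0, [esi]    ; load four 32-bit integers
    movdqu xmm1, [edi]    ; load four more
    paddd  xmm0, xmm1     ; four additions in a single instruction
    movdqu [esi], xmm0    ; store the four results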

11 Intel Core i7 Execution Architecture Instructions come from RAM through the L3 Cache, L2 Cache, and 32KB Instruction Cache into a 16-byte Prefetch Buffer. An Initial (Length) Decoder, steered by Branch Prediction, fills a Queue of ≤18 Instructions. 4 Decoders then split instructions into parts called “MicroOps”, which wait in a Buffer of ≤128 MicroOps (alongside 2 Copies of Registers) until several 128-bit Execution Units run them. Loads from Memory and Stores to Memory go through the 32KB Data Cache, backed by the L2 and L3 Caches.

12 What About Multiple Cores? What we’ve looked at so far is a single CPU core’s execution. A CPU core is a copy of this functionality on the CPU die, so a quad-core CPU has 4 copies of everything shown (except larger caches). Instead of trying to run multiple instructions from the same stream of code concurrently, as before, each core runs independently of any others (one thread on each)

13 Confusion About Cores “Cores” in GPUs and custom processors like the Cell are not independent, whereas cores in standard CPUs are, so this has led to great confusion and misunderstanding. The operating system decides what instruction stream (thread) to run on each CPU core, and can periodically change this (thread scheduling) These issues are not part of this course, but may be covered in a parallel computing or operating systems course.

14 Part 2 of 3: Memory Caches and Virtual Memory

15 Memory Caches Caches are on-CPU copies of parts of RAM, kept to save access time. A cache miss is when one checks a cache for a piece of memory that is not there. Larger caches have fewer misses but are slower, so modern CPUs have multiple levels of cache: – Memory Buffers (ignored here), L1 Cache, L2 Cache, L3 Cache, RAM Under normal circumstances, the CPU only accesses memory through the cache.

16 Reading From Cache To read the value of memory at location A:
if A is not in L1:
  if A is not in L2:
    if A is not in L3:
      L3 reads A from RAM
    L2 reads A from L3
  L1 reads A from L2
read A from L1
Note: A is now in all levels of cache

17 Writing to Cache To store a value into memory at location A:
write A into L1
after a time delay, L1 writes A into L2
after a time delay, L2 writes A into L3
after a time delay, L3 writes A into RAM
Note: the time delays could result in concurrency issues in multi-core CPUs, so write caching can get more complicated

18 Caching Concerns Randomly accessing memory causes many more cache misses than sequentially accessing memory or accessing relatively few locations – This is part of why quicksort is often not so quick compared to mergesort Writing to a huge block of memory that won’t be read soon can cause cache misses later, since it fills up the caches with the written data – There are special instructions to indicate that certain writes should not be cached, which avoids this in assembly (see the sketch below)
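A sketch of those special instructions, assuming SSE2’s non-temporal store (the slide doesn’t say which instructions it means, so movntdq is an illustrative choice; it requires a 16-byte-aligned destination in edi, a 16-byte count in ecx, and the fill pattern in xmm0):

    FillLoop:
        movntdq [edi], xmm0   ; store 16 bytes, bypassing the caches
        add     edi, 16
        dec     ecx
        jnz     FillLoop
        sfence                ; make the non-temporal stores globally visible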

19 Paging Paging, a.k.a. virtual memory mapping, is a feature of CPUs that allows the apparent rearrangement of physical memory blocks into one or more virtual memory spaces. 3 main reasons for this: – Programs can be in separate memory spaces, so they don’t interfere with each other – The OS can give the illusion of more memory using the hard drive – The OS can prevent programs from messing up the system (accidentally or intentionally)

20 Virtual Memory With a modern OS, no memory access by a program touches physical memory directly. Virtual addresses are mapped to physical addresses in 4KB or 2MB pages using page tables, set up by the OS.
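A sketch of how a 32-bit virtual address decomposes with 4KB pages (the register choices are illustrative; the split itself follows from 4KB = 2^12 bytes):

    mov  eax, esi        ; esi holds some virtual address
    mov  edx, eax
    shr  eax, 12         ; eax = virtual page number (address / 4096)
    and  edx, 0FFFh      ; edx = offset within the page (address mod 4096)
    ; the paging hardware looks up the page number in the page tables to get
    ; a physical page number, then appends the same 12-bit offset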

21 Page Tables [Figure: two page tables over one physical memory (physical pages 0–F). The page table for Dude.exe maps its virtual page #s 0, 1, 2, … to one set of physical page #s, and the page table for Sweet.exe maps its virtual page #s 0, 1, 2, … to a different set, so both programs can use the same virtual page numbers without sharing physical pages.]

22 Part 3 of 3: I/O and Interrupts Just an Overview

23 Common I/O Devices Human Interface (what most people think of) – Keyboard, Mouse, Microphone, Speaker, Display, Webcam, etc. Storage – Hard Drive, Optical Drive, USB Key, SD Card Adapters – Network Card, Graphics Card Timers (very important for software) – PITs, LAPIC Timers, CMOS Timer

24 If There’s One Thing to Remember I/O IS SLOW! Bad Throughput: – Mechanical drives can transfer up to 127MB/s – The memory bus can transfer up to 30,517MB/s (or more for modern ones) Very Bad Latency: – 10,000 RPM drive average latency: 3,000,000ns – 1333MHz uncached memory average latency: 16ns That latency gap is a factor of nearly 190,000.

25 I/O Terminology I/O Ports or Memory-Mapped I/O? – Some devices are controlled through special “I/O ports” accessible with the “in” and “out” instructions. – Some devices make themselves controllable by occupying blocks of memory and intercepting any reads or writes to that memory instead of using “in” and “out”. This is called memory-mapped I/O (MMIO); it is distinct from Direct Memory Access (DMA), in which a device itself reads or writes RAM without involving the CPU.
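A sketch of both styles (ring-0 only; port 64h is the legacy PC keyboard controller’s status/command port, and the memory-mapped address below is purely hypothetical):

    ; port-mapped I/O: special instructions, separate address space
    in   al, 64h            ; read the keyboard controller's status byte
    out  64h, al            ; write a command byte to the same device

    ; memory-mapped I/O: ordinary loads/stores to addresses the device occupies
    mov  eax, [0D0000000h]  ; read a device register (hypothetical address)
    mov  [0D0000000h], ebx  ; write a device register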

26 I/O Terminology Programmed I/O or Interrupt-Driven I/O? – Programmed I/O is controlling a device’s “operation” step-by-step with the CPU – Interrupt-Driven I/O involves the CPU setting up some “operation” to be done by a device and getting “notified” by the device when the “operation” is done – Most I/O in a modern system is interrupt-driven
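A sketch of programmed I/O, again assuming the legacy keyboard controller (status port 64h, data port 60h): the CPU busy-waits, stepping the “operation” along itself instead of being notified:

    WaitForKey:
        in   al, 64h       ; read the status port
        test al, 1         ; bit 0 set means a byte is waiting
        jz   WaitForKey    ; not ready: keep polling (CPU does nothing useful)
        in   al, 60h       ; read the scancode from the data port

With interrupt-driven I/O, the device would instead raise an interrupt when the byte arrives, and the CPU could do other work in the meantime.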

27 Interrupts Instead of continuously checking for keyboard or mouse input, the CPU can be notified of it when it happens. Instead of waiting idly for the hard drive to finish writing data, it can do other work and be notified when the write is done. Such a notification is called an I/O interrupt. (There are also exception interrupts, e.g. for an integer division by zero.)

28 Interrupts When an interrupt occurs, the CPU stops what it was doing and calls a function specified by the OS to handle the interrupt. – This function is an interrupt handler The interrupt handler deals with the I/O operation (e.g. saves a typed key) and returns, resuming whatever was interrupted Because interrupts can occur at any time, values on the stack below esp may change at any time
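A sketch of that last hazard (an illustrative assumption following the slide’s model, in which interrupt state is pushed onto the current stack):

    mov  [esp-4], eax    ; stash eax below esp "temporarily"
    ; <-- if an interrupt fires here, data is pushed starting below esp,
    ;     overwriting our stashed value
    mov  ebx, [esp-4]    ; may now read garbage

    ; safe version: claim the space first, so interrupt pushes land below it
    sub  esp, 4
    mov  [esp], eax
    mov  ebx, [esp]
    add  esp, 4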

