COMP 2003: Assembly Language and Digital Logic Chapter 7: Computer Architecture Notes by Neil Dickson.

About This Chapter This chapter delves deeper into the computer to give an understanding of the issues regarding the CPU, RAM, and I/O. Having an understanding of the underlying architecture helps with writing efficient software.

Part 1 of 3: CPU Execution Pipelining and Beyond

Execution Pipelining. The five pipeline stages: Fetch Instruction, Decode Instruction, Load Operand Values, Execute Operation, Store Results. Old systems ran 1 instruction at a time, start to finish. Less-old systems run multiple independent instructions at a time, each in a different stage of the pipeline.

A Hardware View of Pipelining. Instructions flow through the stages in order: Instruction-Fetching Circuitry → Instruction Decoder(s) → Operand-Loading Circuitry → Execution Unit(s) → Results-Storing Circuitry, with several instructions (e.g. Instructions 1 through 7) in flight at once, each occupying a different stage. Problem: What if Instruction 1 stores its result in eax (e.g. "mov eax,1") and Instruction 2 needs to load eax (e.g. "add ebx,eax")?

Pipeline Dependencies. Suppose Instruction 1 stores its result in eax and Instruction 2 needs to load eax. Instruction 2 has to wait at the operand-loading stage until Instruction 1's result is stored, stalling every instruction behind it. Problem: What about conditional jumps?

Branch Prediction. Suppose that Instruction 3 is a conditional jump (e.g. jc MyLabel). The "operand" it loads is the flags. Its execution determines whether or not to jump (i.e. where to go next). Its result is stored in the instruction pointer, eip. What comes next is unknown until execution, so the CPU makes a prediction first and checks it in the execution stage.

Branch Prediction and the Pipeline. Suppose Instruction 3 is a conditional jump, and Instruction 2 changed the flags, so Instruction 3 waits at the operand-loading stage. Meanwhile the CPU keeps fetching Instructions 4, 5, 6 along the predicted path. Oh no! It turned out that the CPU guessed wrong, so it clears the pipeline and starts fetching from the new eip (Instruction 4').

Pipelining Pros/Cons. Pro: Only one set of each hardware component is needed (plus some hardware to manage the pipeline). Pro: The initial concept was simple. Con: The programmer/compiler must try to eliminate dependencies, which can be tough; otherwise the code faces big performance penalties. Con: The actual hardware can get complicated. Note: CPU die space is no longer scarce, so the first Pro doesn't matter much anymore.

Beyond Pipelining For jumps that are hard to predict, guess BOTH directions, and keep two copies of results based on the guess (e.g. 2 of each register) Allow many instructions in at once (e.g. multiple decoders, multiple execution units, etc.) so that there’s a higher probability of more operations that can run concurrently Vector instructions (multiple data together)

Intel Core i7 Execution Architecture (one core). Instruction flow: 32KB Instruction Cache → 16-byte Prefetch Buffer → Initial (Length) Decoder → Queue of ≤18 Instructions → 4 Decoders, which split instructions into parts called "MicroOps" → Buffer of ≤128 MicroOps → several 128-bit Execution Units, assisted by Branch Prediction and 2 Copies of Registers. Loads from and stores to memory go through a 32KB Data Cache, backed by the L2 Cache, the L3 Cache, and RAM.

What About Multiple Cores? What we’ve looked at so far is a single CPU core’s execution. A CPU core is a copy of this functionality on the CPU die, so a quad-core CPU has 4 copies of everything shown (except larger caches). Instead of trying to run multiple instructions from the same stream of code concurrently, as before, each core runs independently of any others (one thread on each)

Confusion About Cores “Cores” in GPUs and custom processors like the Cell are not independent, whereas cores in standard CPUs are, so this has led to great confusion and misunderstanding. The operating system decides what instruction stream (thread) to run on each CPU core, and can periodically change this (thread scheduling) These issues are not part of this course, but may be covered in a parallel computing or operating systems course.

Part 2 of 3: Memory Caches and Virtual Memory

Memory Caches Caches are copies of RAM on the CPU to save time A cache miss is when one checks a cache for a piece of memory that is not there Larger caches have fewer misses, but are slower, so modern CPUs have multiple levels of cache: – Memory Buffers (ignored here), L1 Cache, L2 Cache, L3 Cache, RAM CPU only accesses memory through cache under normal circumstances

Reading From Cache. Want the value of memory at location A:
if A is not in L1:
    if A is not in L2:
        if A is not in L3:
            L3 reads A from RAM
        L2 reads A from L3
    L1 reads A from L2
read A from L1
Note: A is now in all levels of cache

Writing to Cache want to store value into memory at location A write A into L1 after time delay, L1 writes A into L2 after time delay, L2 writes A into L3 after time delay, L3 writes A into RAM Note: the time delays could result in concurrency issues in multi-core CPUs, so write caching can get more complicated

Caching Concerns. Randomly accessing memory causes many more cache misses than sequentially accessing memory or accessing relatively few locations. – This is why quicksort is sometimes not so quick compared to mergesort. Writing to a huge block of memory that won't be read soon can cause cache misses later, since it fills up the caches with the written data. – In assembly, there are special (non-temporal) store instructions to indicate that certain writes should not be cached, avoiding this.

Paging Paging, a.k.a. virtual memory mapping, is a feature of CPUs that allows the apparent rearrangement of physical memory blocks into one or more virtual memory spaces. 3 main reasons for this: – Programs can be in separate memory spaces, so they don’t interfere with each other – The OS can give the illusion of more memory using the hard drive – The OS can prevent programs from messing up the system (accidentally or intentionally)

Virtual Memory With a modern OS, no memory accesses by a program directly access physical memory Virtual addresses are mapped to physical addresses in 4KB or 2MB pages using page tables, set up by the OS.

Page Tables. [Diagram: two page tables, one for Dude.exe and one for Sweet.exe. Each table maps that program's virtual page numbers to physical page numbers in the shared physical memory (physical pages labelled A through F), so the two programs' pages can interleave in physical memory without their virtual address spaces overlapping.]

Part 3 of 3: I/O and Interrupts Just an Overview

Common I/O Devices Human Interface (what most people think of) – Keyboard, Mouse, Microphone, Speaker, Display, Webcam, etc. Storage – Hard Drive, Optical Drive, USB Key, SD Card Adapters – Network Card, Graphics Card Timers (very important for software) – PITs, LAPIC Timers, CMOS Timer

If There’s One Thing to Remember I/O IS SLOW! Bad Throughput: – Mechanical drives can transfer up to 127MB/s – Memory bus can transfer up to 30,517 MB/s (or more for modern ones) Very Bad Latency: – 10,000 RPM drive average latency: 3,000,000ns – 1333MHz uncached memory average latency: 16ns

I/O Terminology. I/O Ports or Memory-Mapped I/O? – Some devices are controlled through special "I/O ports" accessible with the "in" and "out" instructions. – Other devices make themselves controllable by occupying blocks of the memory address space and intercepting any reads or writes to those addresses instead of using "in" and "out". This is called Memory-Mapped I/O (MMIO). (A related but distinct concept, Direct Memory Access (DMA), lets a device read or write RAM itself, without the CPU copying each byte.)

I/O Terminology Programmed I/O or Interrupt-Driven I/O? – Programmed I/O is controlling a device’s “operation” step-by-step with the CPU – Interrupt-Driven I/O involves the CPU setting up some “operation” to be done by a device and getting “notified” by the device when the “operation” is done – Most I/O in a modern system is interrupt-driven

Interrupts. Instead of continuously checking for keyboard or mouse input, the CPU can be notified of it when it happens. Instead of waiting idly for the hard drive to finish writing data, it can do other work and be notified when the write is done. Such a notification is called an I/O interrupt. (There are also exception interrupts, e.g. for integer division by zero.)

Interrupts When an interrupt occurs, the CPU stops what it was doing and calls a function specified by the OS to handle the interrupt. – This function is an interrupt handler The interrupt handler deals with the I/O operation (e.g. saves a typed key) and returns, resuming whatever was interrupted Because interrupts can occur at any time, values on the stack below esp may change at any time