PhD/Master course, Uppsala 2011.  Understanding the interaction between your program and computer  Structuring the code  Optimizing the code  Debugging.

Slides:



Advertisements
Similar presentations
Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.
Advertisements

Part IV: Memory Management
CPU Review and Programming Models CT101 – Computing Systems.
Lecture 6: Multicore Systems
Computer Organization and Architecture
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
Computer Organization and Architecture
Architectural Support for OS March 29, 2000 Instructor: Gary Kimura Slides courtesy of Hank Levy.
Computer System Overview
Chapter 7 Interupts DMA Channels Context Switching.
Chapter 4 Processor Technology and Architecture. Chapter goals Describe CPU instruction and execution cycles Explain how primitive CPU instructions are.
GCSE Computing - The CPU
Computer Systems CS208. Major Components of a Computer System Processor (CPU) Runs program instructions Main Memory Storage for running programs and current.
CPU Describe the purpose of the CPU
Inside The CPU. Buses There are 3 Types of Buses There are 3 Types of Buses Address bus Address bus –between CPU and Main Memory –Carries address of where.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
1 Computer System Overview Chapter 1 Review of basic hardware concepts.
I/O Tanenbaum, ch. 5 p. 329 – 427 Silberschatz, ch. 13 p
CH12 CPU Structure and Function
1 Instant replay  The semester was split into roughly four parts. —The 1st quarter covered instruction set architectures—the connection between software.
Input/Output. Input/Output Problems Wide variety of peripherals —Delivering different amounts of data —At different speeds —In different formats All slower.
System Calls 1.
Chapter 10: Input / Output Devices Dr Mohamed Menacer Taibah University
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
The Computer Systems By : Prabir Nandi Computer Instructor KV Lumding.
 Design model for a computer  Named after John von Neuman  Instructions that tell the computer what to do are stored in memory  Stored program Memory.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Principles of I/0 hardware.
1 Chapter 2: Computer-System Structures  Computer System Operation  I/O Structure  Storage Structure  Storage Hierarchy  Hardware Protection  General.
Fall 2012 Chapter 2: x86 Processor Architecture. Irvine, Kip R. Assembly Language for x86 Processors 6/e, Chapter Overview General Concepts IA-32.
Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,
CE Operating Systems Lecture 14 Memory management.
Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.
PART 6: (1/2) Enhancing CPU Performance CHAPTER 16: MICROPROGRAMMED CONTROL 1.
The fetch-execute cycle. 2 VCN – ICT Department 2013 A2 Computing RegisterMeaningPurpose PCProgram Counter keeps track of where to find the next instruction.
Computer Hardware A computer is made of internal components Central Processor Unit Internal External and external components.
Processor Architecture
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Lecture on Central Process Unit (CPU)
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation I/O Structure Storage.
بسم الله الرحمن الرحيم MEMORY AND I/O.
Chapter 2: Computer-System Structures(Hardware) or Architecture or Organization Computer System Operation I/O Structure Storage Structure Storage Hierarchy.
1 Device Controller I/O units typically consist of A mechanical component: the device itself An electronic component: the device controller or adapter.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
CS 1410 Intro to Computer Tecnology Computer Hardware1.
Hello world !!! ASCII representation of hello.c.
Embedded Real-Time Systems Processing interrupts Lecturer Department University.
Operating Systems A Biswas, Dept. of Information Technology.
Introduction to Operating Systems Concepts
CPU Lesson 2.
Introduction to Operating Systems
Chapter 2: Computer-System Structures
Computer Organization & Assembly Language Chapter 3
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation I/O Structure Storage.
Introduction to Operating Systems
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Module 2: Computer-System Structures
Architectural Support for OS
Module 2: Computer-System Structures
Architectural Support for OS
Chapter 2: Computer-System Structures
Chapter-1 Computer is an advanced electronic device that takes raw data as an input from the user and processes it under the control of a set of instructions.
Chapter 2: Computer-System Structures
Module 2: Computer-System Structures
Module 2: Computer-System Structures
COMP755 Advanced Operating Systems
Presentation transcript:

PhD/Master course, Uppsala 2011

 Understanding the interaction between your program and computer  Structuring the code  Optimizing the code  Debugging strategies  Parallelization  A complementary course to numerical methods and programming languages

CPU GPU Peripheral devices Memory

 Computers perform any data transformation on registers or in pipelines!  Registers are typically 64-bit long. Intel Core I series has four 14 stage pipelines. Each stage is 32-bit long. One important conclusion is that double precision register operations are not much slower than single precision!  Bus operates at rates of a few hundred MHz. How does this work with a 3 GHz CPU?  Bus width×speed determines memory access performance.

Simple example: addition We want to add two numbers: A (located in memory M A ) + B (located in memory M B ) || The result is stored to memory M C

1. Load content of memory M A to register R 1 2. Load content of memory M B to register R 2 3. Add R 1 and R 2 and store result in R 3 4. Save the content of register R 3 to memory M C Registers are part of CPU. Modern CPUs are so fast that addition is takes negligible time compared to memory access

Solution is cache memory CACHE (French) hidden Very fast and very close Loading and saving between main memory and cache memory is done by a “separate” CPU which takes an advance look at the program Multi-core CPUs will have three levels of cache memory with only L3 shared by all cores

 Cache memory can be loaded/saved in blocks, very fast  Multiple registers can be loaded simultaneously  Multiple operations performed in the same time – pipeline(s). Intel Core i has 4 pipelines per core each capable of doing up to 14 instructions simultaneously. Load AddDivideSave Add Divide LoadAddDivide Save time Load SqrtLogSave Sqrt Log LoadSqrtLog Save Pipeline #1 Pipeline #2

Main memory is typically read within nanoseconds. Cache memory must operate close to the CPU clock rate: 3 GHz CPU requires 3 nanoseconds cache (10 CPU cycles)!!!  To be that fast cache memory must be small and static  Cache memory is very expensive  Branching in the code can be deadly!!!

 Two cases: explicit and conditional  Explicit is easy: go to  Conditional:if true do else skip it and is not so bad either  Combination of the two is bad: if true go to else …  Why is this bad? Examples of such situation

 What happens with conditional branching is that the content of your cache may become inconsistent with the next part of the code  In this situation (known as cache miss) you need to save cache entries that are not saved yet and load new memory parts  A special part of a modern CPU (branch predictor) tries to guess the branching direction  Loops are converted to count-down scheme helping with prediction

 Computer codes need variables, arrays etc. How this is done in hardware? There are two concepts: stack and heap.  Stack is characterized by its start, length and current position. Current position is stored in a special register. Many stacks can co-exist but cannot be accessed simultaneously (dynamic variables)  Heap is a stack with the current position moving from the largest memory address backwards (static variables)  Hardware provides memory protection: two processes (programs) cannot read/modify each others memory  No protection is offered within a single program space!!!

 Memory can be addressed directly or by segments (segment:offset). 64-bit CPUs can address up to memory mode while 32-bit memory model is restricted to 4 Gb of address space.  Modern computer memory bus is 64-bit wide so it can carry the address or the content of 8 memory bytes in one go.  Memory is accessed in the following sequence:  CPU sends memory address to the memory controller through the bus (segment and offset are sent together).  Memory controller fetches the content and sends it back to the CPU

Programs often use subroutines. How does this work?  Special stack is organized for this purpose  Before a subroutine is called, all the registers and a pointer to the next command to be executed are stored in this stack  New stack is created or an existing one is pulled up for the subroutine and the execution starts  On exit the content of the registers is restored from the stack and the execution continues in the calling code  This process is called context switching  Context switching is the base of multi-processing

 What happen when a code tries to perform a division but the content of the divider register is zero? Actually, nothing. The result is marked by a special combination of bits called NaN. No other action is normally taken  Modern processors have a special mode called Exception Tracking. When activated, the processor (hardware) initiates context switching calling exception handling program. This allows locating the place in the code where the problem occurred.

 CPU is also connected to the peripheral devices (keyboard, mouse, hard drives, network, printers) and to the graphics card  Expensive graphics cards are powerful computers all by themselves with GPU, fast bus and huge memory. In small laptops the main CPU acts (at least partially) as a GPU. A special dedicated bus connects the CPU and the graphics

 External memory (HD, SSD) is much slower even than the main memory (10 microseconds versus 50 nanoseconds)  Bus controller allows direct memory access  the CPU specifies that certain information from a disk must be copied to a certain memory area  The copying does not involve the CPU  It works in both directions HD→RAM or RAM→HD  Transferring an optimal amount of data in one such operation results in large gain of speed

 To make the I/O operations more efficient and take maximum advantage of the direct access the operating system often uses RAM to simulate HD.  The synchronization between the memory buffer and the HD is done asynchronously, once in a while.  It speeds up disk access but creates a potential threat to the integrity of the file system.

Literature: The C programming Language Kernighan and Ritchie FORTRAN 90/95 explained Metcalf Python

Find out what are you going to be using for the exercises:  What computer (Intel, PowerPC)  What operating system (Windows, Linux, Mac OS X)  What programming languages do you have available (C, C++, FORTRAN, Python …)