Processor Architecture


Processor Architecture Chapter 4

Better Utilization of Memory
There are several concepts relating to memory organization. We have already discussed cache memory and virtual memory, but for better utilization of the hardware there are further concepts, covered below.

Parts of the Execution
A process is an isolated entity managed by the operating system. A task is a part of a job, sometimes the only part; a job is a script that is executed to perform a specific set of tasks. A task represents the execution of one or more processes on a compute node. A thread is the smallest unit of execution; it lies within a process and shares that process's resources.
So: job -> one or more tasks -> one or more processes -> one or more threads.
Note: a thread can itself split into two or more simultaneously running threads. TRU-COMP2130 Introduction
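
The hierarchy above can be sketched in Python (the slides themselves show no code, so this is an illustrative example): one process spawns several threads, and all of them update the same shared variable, demonstrating that threads share the process's memory.

```python
import threading

# One process, several threads sharing the same memory.
# 'counter' and 'lock' are illustrative names, not from the slides.
counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # threads share 'counter', so updates must be synchronized
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4000: every thread updated the same shared variable
```

Separate processes, by contrast, would each get their own copy of `counter` and would have to communicate explicitly.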

More Concepts
Threads share resources, whereas processes do not. Context switching is the act of switching a processor from executing one process to executing another; it is called a context switch because the saved state of each process is called its context. A context switch is performed in the kernel by a routine called the dispatcher or scheduler: the OS code (running preemptively) swaps the processor state (registers, mode, and stack) from one process's or thread's context to another's. At any moment the processor may be at a certain line of code in one thread, with temporary data in registers, a stack pointer into a certain region of memory, and other state information. A preemptive OS can store this state (either in static memory or on the process's stack) and load the saved state of another process.

Pipelining vs. Parallel Processing
In both, multiple "things" are processed by multiple functional units. In pipelining, a process is broken into a sequence of stages, and each stage is handled by a different functional unit. In parallel processing, each process is handled entirely by a single functional unit. For example, graphics, sound, etc. can all be processed separately and run at the same time.

Parallel Processing
Multiple outputs are computed in parallel within a clock period, so the effective sampling speed is increased by the level of parallelism. Parallelism can also be used to reduce power consumption. There are two types:
Instruction level: implemented through the program code; known as parallel programming or concurrent programming (in OS terms, fork-join operations).
Thread level: two or more complete processors fabricated on the same silicon chip (multi-core processors), which can execute instructions from two or more programs or threads at the same time.
Examples: the Intel Core Duo has 2 x86 processors on the same chip; the Xbox 360 has 3 PowerPC cores; the PS3 has 9 cores, 1 general purpose and 8 special-purpose SIMD (Single Instruction, Multiple Data) processors.
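
The fork-join pattern mentioned above can be sketched with Python's standard thread pool (an illustrative example, not from the slides): work is "forked" out to a pool of workers and then "joined" by collecting the results.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# Fork: distribute the inputs across 4 worker threads.
# Join: map() gathers the results back in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

A `ProcessPoolExecutor` has the same interface and would use separate processes (and thus separate cores) rather than threads.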

Data Parallelism
Single Instruction, Single Data (SISD): one instruction operates on one data item at a time. Single Instruction, Multiple Data (SIMD): one instruction operates on many data items at once. TRU-COMP2130 Introduction
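
The SISD/SIMD distinction can be modeled in plain Python (a conceptual sketch only; real SIMD uses vector registers in hardware): the SISD version issues one add per data pair, while the SIMD-style version applies a single add operation across a whole vector of data.

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD: one instruction handles one pair of data items per step.
sisd = []
for i in range(len(a)):
    sisd.append(a[i] + b[i])

# SIMD (conceptually): one 'add' applied to the whole vector at once.
# map/zip only model the idea; hardware SIMD does this in one instruction.
simd = list(map(lambda pair: pair[0] + pair[1], zip(a, b)))

print(sisd, simd)  # both [11, 22, 33, 44]
```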

Pipelining
It is the concept of improving the performance of the system by letting the different stages of the system operate concurrently. A key feature of pipelining is that it increases the throughput of the system, that is, the number of customers served per unit time, but it may also slightly increase the latency, that is, the time required to service an individual customer. For example, a customer in a cafeteria who only wants a salad could pass through a non-pipelined system very quickly, stopping only at the salad stage. A customer in a pipelined system who attempts to go directly to the salad stage risks incurring the wrath of other customers.

Pipelining
In a traditional (non-pipelined) system, a circuit is designed, each task uses the whole circuit, and only then does the next task start from the beginning. In contemporary logic design, circuit delays are measured in units of picoseconds ("ps", 10^-12 seconds). Here, we assume the combinational logic requires 300 ps, while loading the register requires 20 ps. In the pipeline diagram, time flows from left to right, and the instructions (here named I1, I2, and I3) are written from top to bottom.

Throughput
Throughput is the number of instructions completed per unit time. With one instruction finishing every 300 + 20 = 320 ps, the throughput is 1 instruction / 320 ps, or about 3.12 GIPS (giga-instructions per second). The total time required to perform a single instruction from beginning to end is known as the latency. Here, the latency is 320 ps, the reciprocal of the throughput.

Pipeline
Let us divide the computation performed by our system into three stages, A, B, and C, where each requires 100 ps, and put pipeline registers between the stages so that each instruction moves through the system in three steps, requiring three complete clock cycles from beginning to end. I2 is allowed to enter stage A as soon as I1 moves from A to B, and so on. In steady state, all three stages are active, with one instruction leaving and a new one entering the system every clock cycle. We can see this during the third clock cycle in the pipeline diagram, where I1 is in stage C, I2 is in stage B, and I3 is in stage A.

The increased latency is due to the time overhead of the added pipeline registers. In this system, we can cycle the clock every 100 + 20 = 120 ps, giving a throughput of around 8.33 GIPS. Since processing a single instruction requires 3 clock cycles, the latency of this pipeline is 3 × 120 = 360 ps. We have increased the throughput of the system by a factor of 8.33 / 3.12 ≈ 2.67 at the expense of some added hardware and a slight increase in the latency (360 / 320 ≈ 1.12).
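
The arithmetic in the last few slides can be checked with a few lines of Python (a worked recomputation of the slide's numbers, nothing new):

```python
# Timing assumptions from the slides: 300 ps of combinational logic,
# 20 ps of register load, split into 3 pipeline stages of 100 ps each.
logic_ps, reg_ps, stages = 300, 20, 3

# Non-pipelined: one instruction every 300 + 20 = 320 ps.
unpiped_latency = logic_ps + reg_ps          # 320 ps
unpiped_gips = 1000 / unpiped_latency        # ~3.12 GIPS (1 instr / 320 ps)

# Pipelined: clock period 100 + 20 = 120 ps, 3 cycles per instruction.
cycle_ps = logic_ps // stages + reg_ps       # 120 ps
piped_gips = 1000 / cycle_ps                 # ~8.33 GIPS
piped_latency = stages * cycle_ps            # 360 ps

print(round(unpiped_gips, 2), round(piped_gips, 2), piped_latency)
print(round(piped_gips / unpiped_gips, 2))   # throughput gain ~2.67
```

The gain works out to exactly 320/120 = 8/3, which is where the 2.67 comes from.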

Steps of Pipelining
To introduce pipelining into a processor P, the following steps are followed:
1. Subdivide the input process into a sequence of subtasks. These subtasks form the stages of the pipeline, which are also known as segments.
2. Each stage Si of the pipeline performs its subtask's operation on a distinct set of operands.
3. When stage Si has completed its operation, the results are passed to the next stage Si+1 for the next operation.
4. Stage Si then receives a new set of inputs from the previous stage Si-1.
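
The steps above can be sketched as a small simulation of the three-stage pipeline from the earlier slides (stage and instruction names follow the slides; the function name is illustrative): each clock cycle, every instruction in flight advances one stage.

```python
# A hazard-free pipeline: instruction i enters stage A at cycle i+1
# and moves one stage to the right every clock cycle.
STAGES = ["A", "B", "C"]

def pipeline_diagram(instructions):
    """Return {cycle: {stage: instruction}} for a hazard-free pipeline."""
    diagram = {}
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(1, total_cycles + 1):
        occupancy = {}
        for i, instr in enumerate(instructions):
            stage_index = cycle - 1 - i
            if 0 <= stage_index < len(STAGES):
                occupancy[STAGES[stage_index]] = instr
        diagram[cycle] = occupancy
    return diagram

d = pipeline_diagram(["I1", "I2", "I3"])
# In cycle 3 the pipeline is full: I1 in C, I2 in B, I3 in A.
print(d[3])
```

This reproduces the steady-state picture from the pipeline diagram slide: from cycle 3 onward, one instruction completes every cycle.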

Pipelined Processor
A pipelined processor can be defined as a processor consisting of a sequence of processing circuits, called segments, through which a stream of operands (data) is passed. Each segment performs partial processing of the data stream, and the final output is produced when the stream has passed through the whole pipeline. Any operation that can be decomposed into a sequence of well-defined subtasks can be realized through pipelining.

Problems in Pipelining
Data dependency between successive tasks: pipelining becomes difficult when there are dependencies between the instructions of two tasks in the pipeline, i.e. one instruction cannot start until a previous instruction returns its result, because the two are interdependent. Another problem arises when both instructions try to modify the same data object. These are called data hazards.
Resource constraints: when resources are not available at the time of execution, delays are caused in the pipeline. For example, if one common memory is used for both data and instructions and there is a need to read/write data and fetch an instruction at the same time, only one access can be carried out and the other has to wait.
Branch instructions and interrupts: branch instructions alter the normal flow of the program, which delays pipelined execution and hurts performance. Similarly, interrupts postpone the execution of the next instruction until the interrupt has been serviced. Branches and interrupts both have damaging effects on pipelining. TRU-COMP2130 Introduction
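
The first kind of data hazard described above, a read-after-write (RAW) dependency, can be detected mechanically. Below is a minimal sketch (the instruction encoding as (destination, sources) tuples and the register names are illustrative) that flags adjacent instruction pairs where the second reads a register the first has not yet written back:

```python
# Each instruction is modeled as (destination_register, source_registers).
def raw_hazards(program):
    """Return (i-1, i) index pairs with a read-after-write dependency."""
    hazards = []
    for i in range(1, len(program)):
        dest_prev, _ = program[i - 1]
        _, srcs = program[i]
        if dest_prev in srcs:   # next instruction reads a result not yet written back
            hazards.append((i - 1, i))
    return hazards

# r1 = r2 + r3 ; r4 = r1 + r5 ; r6 = r2 + r7
# The second instruction depends on the first's result in r1.
program = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r2", "r7"))]
print(raw_hazards(program))  # [(0, 1)]
```

A real pipeline resolves such a hazard by stalling the dependent instruction or by forwarding the result between stages.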

Interrupts
An interrupt is a signal to the processor, emitted by hardware or software, indicating an event that needs immediate attention. It can be of two types:
Hardware interrupt: raised by an I/O device, e.g. a keyboard or mouse asking the processor to read its input.
Software interrupt: raised by events occurring during execution of program instructions; different interrupts may be triggered from assembly language.
Exceptions are exceptional conditions that may occur in code while it executes and hinder the normal functioning of the program.
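
A user-level analogue of interrupt handling can be shown with POSIX signals in Python (an illustrative sketch, assuming a Unix-like system with SIGUSR1 available): the handler is registered, the signal is delivered asynchronously, the handler "services" it, and normal execution then resumes, mirroring how a processor services an interrupt.

```python
import signal

handled = []

def handler(signum, frame):
    # The "interrupt service routine": record which signal arrived.
    handled.append(signum)

signal.signal(signal.SIGUSR1, handler)   # register the handler
signal.raise_signal(signal.SIGUSR1)      # deliver the signal to this process
print(handled == [signal.SIGUSR1])       # execution resumed after servicing
```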