Recitation 7, Introduction to Computer Technologies: CPE

Presentation transcript:

– 1 – Recitation 7, Introduction to Computer Technologies: CPE

– 2 – Modern processor architecture
[Block diagram. Instruction Control: Fetch Control, Instruction Cache, Instruction Decode, Retirement Unit, Register File. Execution: functional units Integer/Branch, General Integer, FP Add, FP Mult/Div, Load, Store, plus the Data Cache. Arrows are labeled Address, Instrs., Operations, Prediction OK?, Register Updates, Operation Results, Addr., and Data. The diagram is annotated with Control Unit (CU) and Arithmetic Logic Unit (ALU).]

– 3 – Pentium III capabilities
Can execute in parallel:
    1 load
    1 store
    2 integer (one may be branch)
    1 FP addition
    1 FP multiplication or division
Some instructions take > 1 cycle, but can be pipelined:
    Instruction         Latency   Cycles/Issue
    Load / Store        3         1
    Integer add         1         1
    Integer multiply    4         1
    Integer divide      36        36
Different parts of a single instruction are carried out by different units, so parts of several different instructions can be in progress at the same time.
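To connect the latency table to the CPE figures quoted on the following slides, here is a minimal sketch of a lower-bound calculation. The type `op_timing`, the function `cpe_bound`, and the constants are hypothetical names introduced for illustration. The model is deliberately simplified: it assumes one operation per element, a single functional unit for that operation, and no cost for loads or loop overhead.

```c
#include <assert.h>

/* Timing of one operation type, taken from the slide's table. */
typedef struct {
    int latency; /* cycles from issue until the result is available */
    int issue;   /* cycles between two successive issues on one unit */
} op_timing;

static const op_timing LOAD_STORE = {3, 1};
static const op_timing INT_ADD    = {1, 1};
static const op_timing INT_MUL    = {4, 1};
static const op_timing INT_DIV    = {36, 36};

/*
 * Simplified lower bound on cycles per element (an assumption-laden model,
 * not the slides' own method):
 *  - each of the `chains` independent accumulators must wait `latency`
 *    cycles between its own operations, so latency is shared by the chains;
 *  - the single functional unit cannot start operations faster than once
 *    every `issue` cycles.
 */
double cpe_bound(op_timing op, int chains) {
    assert(chains > 0);
    double latency_bound = (double)op.latency / chains;
    double issue_bound = (double)op.issue;
    return latency_bound > issue_bound ? latency_bound : issue_bound;
}
```

Under these assumptions `cpe_bound(INT_MUL, 1)` evaluates to 4.0 and `cpe_bound(INT_ADD, 1)` to 1.0, in line with the CPE = 4 and CPE = 1 figures the slides give for the product and the sum without resource constraints.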

– 4 – Examples of computing CPE (cycles per element): product of elements

– 5 – Drawing the CPE graph: CPE = 4 without resource constraints

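– 6 – Loop unrolling
Which of these functions gives the lower CPE when computing the product?

    /* Both functions assume that size is even. */

    /* One accumulator: every multiply waits for the previous one. */
    int funca(int* a, int size) {
        int result = 1;
        for (int i = 0; i < size; i += 2) {
            result = result * a[i];
            result = result * a[i+1];
        }
        return result;
    }

    /* Two accumulators: the two multiply chains are independent. */
    int funcb(int* a, int size) {
        int result1 = 1;
        int result2 = 1;
        for (int i = 0; i < size; i += 2) {
            result1 = result1 * a[i];
            result2 = result2 * a[i+1];
        }
        return result1 * result2;
    }

This form of loop unrolling brings the CPE down from 4.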
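Both functions on this slide advance by two elements and read a[i+1], so they are only safe for an even size. As a minimal sketch, the two-accumulator idea of funcb can be extended to any length with a short cleanup loop. The name `product_2acc` and the cleanup loop are ours, not from the slides.

```c
#include <assert.h>

/* Product of a[0..size-1] using two independent accumulators.
 * Hypothetical variant of funcb that also handles an odd size. */
int product_2acc(const int *a, int size) {
    int result1 = 1;
    int result2 = 1;
    int i;

    assert(size >= 0);

    /* Main loop: two elements per pass, one per multiply chain. */
    for (i = 0; i + 1 < size; i += 2) {
        result1 = result1 * a[i];
        result2 = result2 * a[i + 1];
    }

    /* Cleanup: at most one element remains when size is odd. */
    for (; i < size; i++) {
        result1 = result1 * a[i];
    }

    /* Combine the two partial products once, after the loop. */
    return result1 * result2;
}
```

The two partial products are combined only once, after the loop, so the extra multiply does not add to the per-element cost.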

– 7 – Summing elements: CPE = 1

– 8 – Summing elements under resource constraints: CPE = 2

– 9 – Unrolling and CPE
We unroll the addition loop by a factor of 3:
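The unrolled code from this slide did not survive in the transcript. As a rough sketch of what unrolling a summation loop by a factor of 3 might look like, consider the following. The function name, signature, and cleanup loop are assumptions, not the slide's code.

```c
#include <assert.h>

/* Sum of a[0..size-1] with the loop body unrolled three times.
 * Hypothetical example; a single accumulator is kept, so only the
 * loop overhead (increment, compare, branch) is shared by three elements. */
int sum_unroll3(const int *a, int size) {
    int sum = 0;
    int i;

    assert(size >= 0);

    /* Main loop: three elements per pass. */
    for (i = 0; i + 2 < size; i += 3) {
        sum = sum + a[i] + a[i + 1] + a[i + 2];
    }

    /* Cleanup: zero, one, or two leftover elements. */
    for (; i < size; i++) {
        sum = sum + a[i];
    }

    return sum;
}
```

In this version the index update, comparison, and branch run once per three elements instead of once per element, which is where the saving comes from when those instructions compete with the additions for functional units.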

– 10 – Unrolled addition: CPE = 1

– 11 – Unrolling results

– 12 – Exercise
Given the following code:

    int inner_product (int * u, int * v, int n) {
        int i;
        int res=0;
        for ( i=1 ; i<n ; i++ ) {
            res = res + u[i]*v[i];
        }
        return res;
    }

– 13 – Exercise
Step one: translate the code to machine language.

    int inner_product (int * u, int * v, int n) {
        int i;
        int res=0;
        for ( i=1 ; i<n ; i++ ) {
            res = res + u[i]*v[i];
        }
        return res;
    }

Assembly:

        move 8(R8), R3
        move 12(R8), R2
        move 16(R8), R6
    .loop:
        move (R2,R4,4),R5
        multiply (R3,R4,4),R5
        add R5,R1
        increment R4
        compare R6,R4
        jl .loop

– 14 – Exercise
Step two: what does the machine actually execute?

Assembly:

    .loop:
        move (R2,R4,4),R5
        multiply (R3,R4,4),R5
        add R5,R1
        increment R4
        compare R6,R4
        jl .loop

Operations executed for the first iteration:

    load (R2,R4.0,4) → R5.1
    load (R3,R4.0,4) → t.1
    multiply t.1,R5.1 → R5.1
    add R5.1,R1.0 → R1.1
    increment R4.0 → R4.1
    compare R6,R4.1 → cc.1
    jl-taken cc.1

– 15 – Exercise
Step three: draw a single iteration.

    load (R2,R4.0,4) → R5.1
    load (R3,R4.0,4) → t.1
    multiply t.1,R5.1 → R5.1
    add R5.1,R1.0 → R1.1
    increment R4.0 → R4.1
    compare R6,R4.1 → cc.1
    jl-taken cc.1

[Data-flow diagram of one iteration. Nodes: load, mul, add, inc, cmp, jmp. Edges are labeled with the values that flow between them: R4.0, R5.1, t.1, R1.0, R1.1, R4.1, cc.1.]

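– 16 – [Data-flow diagram of two consecutive iterations. First iteration: load, mul, add, inc, cmp, jmp, with edges R4.0, R5.1, t.1, R1.0, R1.1, R4.1, cc.1. Second iteration: load, mul, add, inc, cmp, jmp, with edges R5.2, t.2, R1.2, R4.2, cc.2.]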
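The chained diagram can also be traced numerically. The sketch below is our own illustration, not part of the slides. It walks the operations of each iteration and records the cycle at which every value becomes ready. It assumes the latencies from the Pentium III table (load 3, multiply 4, add and increment 1), unlimited functional units, and a perfectly predicted branch. The function name is hypothetical.

```c
#include <assert.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Cycle at which R1.n, the accumulator after n iterations, is ready.
 * Assumptions (ours): unlimited functional units, perfect branch
 * prediction, R4.0 and R1.0 both ready at cycle 0. */
int inner_product_ready_cycle(int n) {
    const int LOAD_LAT = 3, MUL_LAT = 4, ADD_LAT = 1, INC_LAT = 1;
    int r4_ready = 0; /* loop index R4.k */
    int r1_ready = 0; /* accumulator R1.k */

    assert(n >= 0);

    for (int k = 1; k <= n; k++) {
        /* Both loads need only the current index. */
        int load_a = r4_ready + LOAD_LAT;
        int load_b = r4_ready + LOAD_LAT;
        /* The multiply waits for both loaded values. */
        int mul = max_int(load_a, load_b) + MUL_LAT;
        /* The add waits for the product and the previous accumulator. */
        r1_ready = max_int(mul, r1_ready) + ADD_LAT;
        /* The increment depends only on the previous index. */
        r4_ready = r4_ready + INC_LAT;
    }
    return r1_ready;
}
```

In this model only the increment and the add lie on chains that carry from one iteration to the next, so each extra iteration delays the final result by a single cycle, while the loads and the multiply of later iterations overlap with earlier ones. The model ignores resource limits such as the one load per cycle listed for the Pentium III, so it gives a lower bound rather than a measured CPE.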