Architecture & the Patterson & Hennessy book

The MIPS processor
Architecture & the Patterson & Hennessy book
Amirkabir University of Technology, Computer Engineering & Information Technology Department

Steps in designing a processor
1. Analyze the instruction set to determine the DataPath requirements.
2. Select the DataPath components and the clocking methodology.
3. Assemble the DataPath components.
4. Analyze each instruction to identify the control points that affect the data path.
5. Implement the Control logic.

Overall block diagram [figure with 32-bit buses]

Main components of the processor [figure with 32-bit buses]

Data Path & Control path
The Data Path consists of combinational elements and state (sequential) elements.
The Control path determines how the control and timing signals reach the Data Path elements.

Steps required to execute an instruction
Fetch the instruction from the location the PC points to.
Read the contents of 0, 1, or 2 registers, as specified by the fields of the instruction.
Perform the ALU computation. Every instruction needs the ALU in some way:
  data transfer instructions: to compute the address
  arithmetic instructions: to perform the computation
  branch instructions: to compute the effective address

Differences in executing instructions
Data transfer instructions
  load: access memory to read data {ld R1, 0(R2)}
  store: access memory to write data {sd 0(R2), R1}
ALU instructions
  no memory access for operands
  access a register to write the result {add R1, R2, R3}
Branch instructions
  change the PC contents based on a comparison {bnez R1, Loop}

Steps required by different instructions

Multi-staged architecture
Instruction execution in MIPS proceeds through the following stages:
IF (Instruction Fetch)
  IR <-- Mem[PC]
  PC <-- PC + 4
  The instruction is fetched from the location given by the PC and placed in IR.
ID (Instruction Decode)
  decode IR31..26
  A <-- Reg[IR25..21]   (ALU operand A)
  B <-- Reg[IR20..16]   (ALU operand B)
  ALUOut <-- PC + (sgnxtnd(IR15..0) << 2)
  The instruction in IR is decoded, the candidate next PC value (the branch target) is computed, and the required operands are read from the register file.
EX (Execute)
  ALUOut <-- A op (B or sgnxtnd(IR15..0))
  if ((op == branch) && (A == B)) PC <-- ALUOut
  if (op == jump) PC <-- PC31..28 || (IR25..0 << 2)
  The ALU operations are performed in this stage.
MA (Memory Access)
  MDR <-- Mem[ALUOut]   // load
  or Mem[ALUOut] <-- B   // store
  if (op == 0) Reg[IR15..11] <-- ALUOut
  If the current instruction is a load, data is read from memory. If it is a store, data is written to memory. Other instructions perform no memory access.
WB (WriteBack)
  Reg[IR20..16] <-- MDR
  For instructions that produce a result, the result is written to the register file. Almost all instructions need this stage.

Data Path components
At a minimum, the Data Path must contain combinational and sequential elements that can perform the following operations:
Fetch instructions and data from memory
Decode instructions and dispatch them to the execution unit
Execute arithmetic & logic operations
Update state elements (registers and memory)

Register file
The processor's 32 registers are held in a structure called the register file. Each register can be read or written by specifying its number.
Register file I/O structure:
3 inputs derived from the current instruction to specify the register operands (2 for read and 1 for write), each a 5-bit register number
1 input to write data into a register (32 bits)
2 outputs carrying the contents of the specified registers (32 bits each)
The register file's outputs are always available on the output lines.
Register write is controlled by the RegWrite line.
[Figure: register file with inputs Read reg 1, Read reg 2, Write reg, Write data, control input RegWrite, and outputs Read data 1, Read data 2]
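
The register file interface described above can be sketched in a few lines of Python. This is a minimal model, not a hardware description: the class name `RegisterFile` and the method names `read` and `clock_edge` are hypothetical, and the hardwired zero register of real MIPS is deliberately left out because the slide does not mention it.

```python
class RegisterFile:
    """32 registers of 32 bits; two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Outputs are always available: reading needs no control signal.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # The write takes effect only when RegWrite is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=5, write_data=42, reg_write=True)
rf.clock_edge(write_reg=6, write_data=7, reg_write=False)  # ignored
print(rf.read(5, 6))  # (42, 0)
```

Keeping the write in a separate `clock_edge` method mirrors the slide's point that reads are combinational while writes are gated by RegWrite.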

Combinational circuits [figure: input -> combinational logic -> output]
Output is determined entirely by the input.
Contains no storage element.

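Other circuits
Multiplexor: selects one out of 2^n inputs using n select lines and drives 1 output.
ALU: performs arithmetic & logic operations on 32-bit operands, selected by a 3-bit control input; it outputs a 32-bit result and a zero flag.
  AND: 000
  OR: 001
  add: 010
  subtract: 110
  set on less than: 111
  the other 3 combinations are unused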
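
As an illustration of the control encoding listed above, here is a small behavioral sketch of the ALU. The function names `alu` and `to_signed` are hypothetical; treating set on less than as a signed comparison follows the usual MIPS `slt` and is an assumption, since the slide does not say.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 3-bit control codes on the slide."""
    if control == 0b000:      # AND
        result = a & b
    elif control == 0b001:    # OR
        result = a | b
    elif control == 0b010:    # add
        result = a + b
    elif control == 0b110:    # subtract
        result = a - b
    elif control == 0b111:    # set on less than (signed, by assumption)
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unused ALU control code")
    result &= MASK32          # keep the result to 32 bits
    return result, result == 0


print(alu(0b010, 7, 5))  # (12, False)
print(alu(0b110, 5, 5))  # (0, True)
```

The zero output is what the branch logic later in the deck relies on: subtracting two equal registers yields zero.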

Sequential elements
[Figure: state element with input, output, write, and clock lines]
A state element has storage (i.e., memory).
State is defined by the storage content.
Output depends on the input and the state.
The write line controls storage update.
The clock line determines the time of the update.
Examples: main memory, registers, PC

Clocking methodology
Needed to prevent simultaneous read/write to state elements.
Edge-triggered methodology: state elements are updated at the rising clock edge.
[Figure: State element 1 -> Combinational logic -> State element 2, driven by a common clock input]

Inputs and outputs of the elements
Combinational elements take input from one state element at a clock edge and output to another state element at the next clock edge.
Within a clock cycle, state elements are not updated and their stable state is available as input to combinational elements.
Output can be derived from a state element at the edge of one cycle and input into the same state element at the next.
[Figure: State element 1 -> Combinational logic]
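
The edge-triggered behavior can be mimicked in software by separating a register's stable output from its pending input. The sketch below is only an analogy under that assumption; `Register`, `q`, `d`, `tick`, and `clock_edge` are hypothetical names.

```python
class Register:
    """Edge-triggered state element: output q changes only on tick()."""

    def __init__(self, value=0):
        self.q = value  # stable output, visible during the whole cycle
        self.d = value  # input, sampled at the next rising clock edge

    def tick(self):
        self.q = self.d


def clock_edge(*registers):
    # All state elements update together at the rising edge.
    for reg in registers:
        reg.tick()


# A state element feeding itself through combinational logic: PC <- PC + 4.
pc = Register(0)
for _ in range(3):
    pc.d = pc.q + 4
    clock_edge(pc)

# Two registers exchange values in one cycle without a temporary.
a, b = Register(1), Register(2)
a.d, b.d = b.q, a.q
clock_edge(a, b)
print(pc.q, a.q, b.q)  # 12 2 1
```

Because every `d` is computed from the old `q` values before any `tick`, reading and writing the same element in one cycle causes no conflict.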

Datapath Schematic
[Figure: PC, Instruction Memory (address in, instruction out), Registers (register numbers in, data out), ALU, and Data Memory (address and data)]

Datapath Building Blocks: instruction fetch
An adder adds 4 to the contents of the PC to compute the address of the next instruction.
The PC value is sent to memory so that the instruction can be fetched and passed on to the other Data Path components.
Instructions and data are assumed to be held in two separate memories (the reasons are discussed later).
[Figure: PC -> Read address of the Instruction Memory; an adder computes PC + 4; paths are 32 bits wide]
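
A minimal sketch of the fetch step, assuming a word-addressed Python list stands in for the instruction memory. The function name `fetch` and the two sample instruction words are hypothetical illustrations.

```python
def fetch(pc, instruction_memory):
    """Return (instruction, next_pc) for a byte address pc."""
    instruction = instruction_memory[pc // 4]  # one 32-bit word per entry
    next_pc = (pc + 4) & 0xFFFFFFFF            # the dedicated adder: PC + 4
    return instruction, next_pc


# Hypothetical contents: add $8, $9, $10 followed by lw $2, 0($1).
imem = [0x012A4020, 0x8C220000]

pc = 0
instr, pc = fetch(pc, imem)
print(hex(instr), pc)  # 0x12a4020 4
```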

Datapath Building Blocks: instruction decode
The opcode and the other required fields of the instruction must be sent to the control unit. The required registers are also read from the register file.
[Figure: Instruction -> Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; Read Data 1, Read Data 2) and Control Unit]

Datapath Building Blocks: R-Type Instruction
R-Type format: opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
For the arithmetic and logic instructions encoded in this format, two registers are read from the register file and their data is passed to the ALU. The ALU operation is determined by the instruction type and applied to the register contents. The result is written to the destination register.
Control signals must be generated so that the result is written to the destination register at the clock edge. The ALUop signal must also be generated to select the ALU operation.
[Figure: Instruction -> Register File (Read reg 1, Read reg 2, Write reg, Write data, RegWrite) -> Read data 1, Read data 2 -> ALU (ALUop) -> result, zero]
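
The field layout of the R-Type format can be checked with shifts and masks. This is a sketch; `decode_r_type` is a hypothetical helper, and the sample word is the standard MIPS encoding of `add $8, $9, $10`.

```python
def decode_r_type(word):
    """Split a 32-bit instruction word into the six R-Type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11
        "shamt": (word >> 6) & 0x1F,    # bits 10-6
        "func": word & 0x3F,            # bits 5-0
    }


fields = decode_r_type(0x012A4020)  # add $8, $9, $10
print(fields)
```

Here `rs` and `rt` select the two registers that are read, and `rd` selects the register that receives the ALU result.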

I-Type Instruction: load/store
I-Type format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)
To compute the address, the 16-bit offset in the instruction is sign-extended to a 32-bit signed number and added to the base value in rs.
LW R2, 232(R1)
SW R5, -88(R4)
[Figure: sign extend, 16 bits in, 32 bits out]
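
The two example instructions can be worked through numerically. In this sketch `sign_extend16` and `effective_address` are hypothetical helper names, and the base register value 1000 is an arbitrary assumption chosen for the arithmetic.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """Base register value plus sign-extended offset, kept to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


# LW R2, 232(R1), assuming R1 holds 1000
print(effective_address(1000, 232))     # 1232
# SW R5, -88(R4), assuming R4 holds 1000; -88 is encoded as 0xFFA8
print(effective_address(1000, 0xFFA8))  # 912
```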

I-Type Instruction: load/store
Executing these instructions therefore involves the following components:
Register file: to access the base register and the destination register
Sign extender: to extend the offset
ALU: to add the base address and the extended offset
Data memory: to load/store data
The address and the input or output data must be sent to the memory.
The control signals MemRead, MemWrite, and Clock must be sent to the memory.

Datapath Building Blocks: load/store
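I-Type format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)
[Figure: Instruction -> Registers (Read reg 1, Read reg 2, Write reg, Write data, RegWrite) -> ALU (ALUop, zero) -> Address of the Data Memory (Write data, MemWrite, MemRead); the 16-bit immediate passes through sign extend to 32 bits before entering the ALU]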

I-Type Instruction: bne
The branch target is obtained by adding the offset to the PC. Because the target must be a multiple of 4, the offset is shifted left by 2 bits.
if Reg[rs] != Reg[rt]: PCcurrent = (PCprevious + 4) + (Imm << 2)
else if Reg[rs] == Reg[rt]: PCcurrent = PCprevious + 4
I-Type format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)
bne R1, R2, Imm
[Figure: 16-bit immediate -> sign extend -> 32 bits -> shift left 2 -> adder with PC+4]

I-Type Instruction: bne
The ALU is used to compare the registers of the bne instruction, so the outcome of the comparison is indicated by the Zero signal at the ALU output. Because the ALU is busy with the comparison, a separate adder is used to compute the target address.
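
Putting the bne pieces together: the ALU subtracts to produce Zero while a second adder forms the target. This is a behavioral sketch; `bne_next_pc` and `sign_extend16` are hypothetical names and the sample values are arbitrary.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def bne_next_pc(pc, rs_value, rt_value, imm16):
    """Next PC for bne: branch is taken when the registers differ."""
    pc_plus_4 = (pc + 4) & MASK32
    # ALU: subtract the operands; Zero is set when they are equal.
    zero = ((rs_value - rt_value) & MASK32) == 0
    # Separate adder: PC+4 plus the sign-extended offset shifted left by 2.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    return pc_plus_4 if zero else target


print(hex(bne_next_pc(0x100, 1, 2, 3)))  # 0x110 (taken)
print(hex(bne_next_pc(0x100, 5, 5, 3)))  # 0x104 (not taken)
```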

Datapath Building Blocks: bne
I-Type format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)
ALUop = subtract
[Figure: Instruction -> Registers (Read reg 1, Read reg 2) -> Read data 1, Read data 2 -> ALU; the zero output goes to the branch control logic. The 16-bit immediate passes through sign extend and shift left 2 into an adder with PC+4 (from the instruction datapath) to form the branch target]

Datapath Building Blocks: bne
[Figure: PC and an Add 4 unit; Instruction -> Register File -> ALU (ALU control, zero); Sign Extend (16 -> 32) -> Shift left 2 -> Add, producing the branch target address (to branch control logic)]

Datapath Building Blocks: jump instruction
The 26-bit value in the instruction is shifted left by 2 bits and replaces the 28 low-order bits of the PC.
[Figure: PC -> Read Address of the Instruction Memory; Add 4; the 26-bit field passes through Shift left 2 to form the 28-bit part of the jump address]
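
The jump address formation can be sketched with bit masks. `jump_address` is a hypothetical helper; taking the upper four bits from PC+4 rather than from PC follows the usual MIPS definition and is an assumption relative to the wording of this slide.

```python
def jump_address(pc, target26):
    """Upper 4 bits of PC+4 concatenated with the 26-bit field shifted left by 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    low28 = (target26 & 0x03FFFFFF) << 2  # 26 bits -> 28 bits
    return (pc_plus_4 & 0xF0000000) | low28


print(hex(jump_address(0x40000000, 0x100)))  # 0x40000400
```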

Building a single Data Path from the components above
The components needed for the different parts are put together, and the required control signals and multiplexers are added. In a Single Cycle design, fetch, decode, and execute all happen in one clock cycle. The clock period equals the time needed to traverse the longest path, which can be long. In addition, hardware cannot be shared between identical operations.

Fetch, R, and Memory Access Portions
[Figure: PC -> Instruction Memory with an Add 4 unit; Register File -> ALU (ALU control, ovf, zero) -> Data Memory; Sign Extend (16 -> 32); control signals RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg]

Adding the control unit. This unit must:
select the ALU operation,
generate the register file and memory signals,
control the data flow through the multiplexers.
Observations:
The opcode is always in bits 31-26.
The registers to be read are given by the rs field (bits 25-21) and the rt field (bits 20-16).
The register to be written is in one of two places: the rt field for lw and the rd field for R-Type instructions.
The offset is in bits 15-0.
I-Type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
J-type: op [31-26] | target address [25-0]
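
One way to picture the control unit is as a lookup table from instruction class to signal values. The signal names below appear in the datapath figures of this deck, but the values are not stated in the slides: they follow the usual single cycle design in the Patterson & Hennessy textbook and should be read as an assumption. `CONTROL` and `write_register` are hypothetical names, and `None` marks a don't-care.

```python
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "branch": dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def write_register(instr_class, rt, rd):
    """Pick the destination register number the way the RegDst mux would."""
    reg_dst = CONTROL[instr_class]["RegDst"]
    if reg_dst is None:  # instruction does not write a register
        return None
    return rd if reg_dst else rt


print(write_register("lw", rt=2, rd=0))       # 2
print(write_register("R-type", rt=10, rd=8))  # 8
```

The `write_register` helper reflects the observation above that the destination is rt for lw and rd for R-Type instructions.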

Single Cycle Datapath with Control Unit
[Figure: single cycle datapath with the Control Unit. Instr[31-26] drives the Control Unit, which produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instr[25-21] and Instr[20-16] select the registers to read; Instr[15-0] feeds Sign Extend (16 -> 32); Instr[5-0] feeds ALU control; PCSrc selects between PC+4 and the branch target]
Note: the mux control inputs have been swapped (for three of the muxes) from the last picture to be consistent with the book.

R-type Instruction Data/Control Flow
[Figure: the single cycle datapath with Control Unit, used in lecture to trace the data and control flow of an R-type instruction]

Load Word Instruction Data/Control Flow
[Figure: the single cycle datapath with Control Unit]
For class handout: have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

Load Word Instruction Data/Control Flow

Branch Instruction Data/Control Flow
[Figure: the single cycle datapath with Control Unit]
For class handout: have a student come forward and mark the connections in the datapath that are active, and show the state of the control lines.

Branch Instruction Data/Control Flow

Adding the Jump Operation
[Figure: the single cycle datapath with Control Unit, extended with a Jump control signal and a mux that selects the jump address formed from PC+4[31-28] and Instr[25-0] shifted left by 2 (26 -> 28 -> 32 bits)]
Good exam questions:
Add jalr rs, rd (fields 0 | rs | 0 | rd | 0 | 9): jump to the instruction whose address is in rs and save the address of the next instruction (PC+4) in rd.
Add the PowerPC addressing modes of update addressing and indexed addressing (the RegFile will have to be expanded to three read ports and two write ports).
Add andi, ori, addi: this needs both a sign extend and a zero extend with a choice between the two, and an augmented ALUop encoding (since the op information cannot be taken from the funct bits as with R-type).
Add mult rs, rt with the result left in hi|lo, so also include the mfhi and mflo instructions (this needs a multiplier, the hi and lo registers, and a couple of muxes with their control).
Add a barrel shifter.

Advantages and disadvantages of the Single Cycle architecture
The clock period is not used efficiently because it is set by the longest instruction. This can become very costly when there are complex instructions such as floating-point instructions.
It needs more chip area because it requires more hardware elements.
It is simple and easy to understand.
In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the load instruction. Since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the cycle is wasted.
[Figure: timing diagram; lw fills Cycle 1, sw finishes early in Cycle 2 and the rest of that cycle is wasted]

Computing the longest instruction
Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total (ps)
ALU type            200                  50              100             0             50               400
lw                  200                  50              100             200           50               600
sw                  200                  50              100             200           0                550
Branch              200                  50              100             0             0                350
Jump                200                  0               0               0             0                200
The clock period must be designed for the time required by the longest instruction: 600 ps.
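
The totals in the table can be recomputed by summing the latency of each unit an instruction class uses. In this sketch the 200 ps data memory latency and the Jump total are inferred from the other totals rather than read directly from the slide, and the names `LATENCY_PS` and `USES` are hypothetical.

```python
# Unit latencies in picoseconds (data_memory is inferred: 600 - 400 = 200).
LATENCY_PS = {
    "instruction_memory": 200,
    "register_read": 50,
    "alu": 100,
    "data_memory": 200,
    "register_write": 50,
}

# Functional units on the path of each instruction class.
USES = {
    "ALU type": ["instruction_memory", "register_read", "alu", "register_write"],
    "lw": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "sw": ["instruction_memory", "register_read", "alu", "data_memory"],
    "Branch": ["instruction_memory", "register_read", "alu"],
    "Jump": ["instruction_memory"],
}

totals = {cls: sum(LATENCY_PS[u] for u in units) for cls, units in USES.items()}
clock_period = max(totals.values())

print(totals)
print(clock_period)  # 600
```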

The Multicycle Datapath approach
Each instruction is broken into a number of smaller steps, and each step executes in one clock cycle. Executing an instruction therefore takes several shorter clock cycles.
The steps are chosen so that the work done in them is balanced.
Each step uses only one of the major hardware blocks.
Different instructions need different numbers of clock cycles.
Only one memory is needed; however, the memory can be accessed only once per cycle.
Only one ALU/adder is needed; however, the ALU can be used at most once per cycle.

The Multicycle Datapath approach
[Figure: multicycle datapath with a single Memory (instructions or data), PC, Register File, ALU, and the added registers IR, MDR, A, B, ALUout]
In this architecture, values that are needed in later cycles of an instruction are saved in registers. The following components must therefore be added:
IR: Instruction Register
MDR: Memory Data Register
A, B: regfile read data registers
ALUout: ALU output register
Multiplexors have to be added in front of several of the functional unit inputs because the functional units are shared by different instruction cycles.
Reading/writing any of the internal registers or the PC occurs (quickly) at the end of a clock cycle.
Reading/writing the register file takes ~50% of a clock cycle since it has additional control and access overhead (reading can be done in parallel with decode).

The Multicycle Datapath with Control Signals
[Figure: multicycle datapath with a Control block driven by Instr[31-26] and producing PCWriteCond, PCWrite, PCSource, IorD, ALUOp, MemRead, MemWrite, ALUSrcA, ALUSrcB, MemtoReg, RegWrite, IRWrite, and RegDst; the datapath contains the PC, Memory, IR, MDR, Register File, A, B, ALU, ALUout, Sign Extend, Shift left 2, and ALU control (Instr[5-0]); the jump address is formed from PC[31-28] and Instr[25-0] shifted left by 2]

Multicycle control unit
[Figure: combinational control logic with inputs Inst Opcode and State Reg, producing the datapath control points and the Next State]
In the Multicycle architecture the control signals cannot be derived from the instruction bits alone, because the opcode carries no information about which cycle of the instruction is in progress. A finite state machine (FSM) is therefore used to design the control unit. The processor is given a finite number of states, which are held in the state register. The next state is determined from the current state and the input values.
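
The next-state idea can be sketched as a small function of the current state and the instruction class. The state names and the functions `next_state` and `cycles` are hypothetical labels chosen for readability, and the transitions follow the usual textbook multicycle design rather than anything spelled out on this slide; only the state sequencing is modeled, not the control outputs.

```python
def next_state(state, instr_class):
    """Next state from the current state and the decoded instruction class."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return {
            "lw": "mem_address",
            "sw": "mem_address",
            "R-type": "execute",
            "branch": "branch_completion",
            "jump": "jump_completion",
        }[instr_class]
    if state == "mem_address":
        return "mem_read" if instr_class == "lw" else "mem_write"
    if state == "mem_read":
        return "write_back"
    if state == "execute":
        return "rtype_completion"
    # Every completion state returns to fetch for the next instruction.
    return "fetch"


def cycles(instr_class):
    """Count the states visited before control returns to fetch."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "fetch":
            return count


for cls in ["lw", "sw", "R-type", "branch", "jump"]:
    print(cls, cycles(cls))
```

Under these assumptions each instruction class takes between 3 and 5 cycles, with the load being the longest.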

The five steps of the load instruction
       Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
lw     IFetch    Dec       Exec      Mem       WB
IFetch: Instruction Fetch and Update PC
Dec: Instruction Decode, Register Read, Sign Extend Offset
Exec: Execute R-type; Calculate Memory Address; Branch Comparison; Branch and Jump Completion
Mem: Memory Read; Memory Write Completion; R-type Completion (RegFile write)
WB: Memory Read Completion (RegFile write)
As shown here, each of these five steps takes one clock cycle to complete.

Execution steps of different instructions
Different instructions take different amounts of time to execute. MIPS instructions need 3 to 5 cycles.

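Main components of the control unit
A common part for instruction fetch, decode, and register read, followed by the parts specific to each group of instructions.
[Figure: high-level view of the finite state machine: instruction fetch/decode and register fetch, then separate parts for memory access, R-type, branch, and jump instructions]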

Common part: instruction fetch, decode, and register read
[Figure: state diagram of the fetch and decode states with their control signal settings]

Memory access
Address computation
Memory read sequence: read from memory, then store in the register
Memory write sequence: write to memory
[Figure: state diagram of the memory access states with their control signal settings]

R-type instructions
Execute the instruction
Write the result to the register
[Figure: state diagram of the two R-type states with their control signal settings]

Branch instruction
It executes in one step: if the condition is true, the PC is loaded with the branch target address.
[Figure: state diagram of the branch completion state with its control signal settings]

Jump instruction
The PC is loaded with the jump target address.
[Figure: state diagram of the jump completion state with its control signal settings]

Complete state machine
[Figure: complete finite state machine for the multicycle control unit]

Finite State Machine for Control
[Figure: finite state machine controller with its control signal outputs]

Exercise: implement the MultiCycle control unit as a microprogram.

Advantages and disadvantages of the Multicycle architecture
The clock period is set by the longest step (not by the longest instruction), so the clock is used more efficiently.
A hardware block can be reused in different clock cycles of the same instruction.
It needs a number of internal registers, more multiplexers, and a more complex control scheme.
[Figure: timing diagram over Cycle 1 to Cycle 10: lw occupies IFetch, Dec, Exec, Mem, WB in Cycles 1-5; sw occupies IFetch, Dec, Exec, Mem in Cycles 6-9; the R-type instruction starts with IFetch in Cycle 10]

Timing comparison of single cycle and multicycle
Single Cycle Implementation: [Figure: lw fills Cycle 1; sw finishes early in Cycle 2 and the rest of that cycle is wasted]
Multiple Cycle Implementation: [Figure: lw occupies IFetch, Dec, Exec, Mem, WB in Cycles 1-5; sw occupies Cycles 6-9; the R-type instruction starts in Cycle 10]
The multicycle clock period is longer than 1/5 of the single cycle clock period because of state register overhead.
Here are the timing diagrams showing the differences between the single cycle and multiple cycle implementations. In the multiple clock cycle implementation, we cannot start executing the store until Cycle 6 because we must wait for the load instruction to complete. Similarly, we cannot start the execution of the R-type instruction until the store instruction has completed its execution in Cycle 9.
In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the load instruction. Consequently, the cycle time for the Single Cycle implementation can be five times longer than that of the multiple cycle implementation.
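
The start cycles mentioned in the notes can be reproduced by accumulating per-instruction cycle counts. This is a sketch; `CYCLES` and `start_cycles` are hypothetical names, and the counts (5 for lw, 4 for sw and R-type) are read off the step descriptions and the timing diagram.

```python
# Clock cycles per instruction in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4}


def start_cycles(program):
    """Cycle number in which each instruction starts (no overlap)."""
    starts, cycle = [], 1
    for instr in program:
        starts.append(cycle)
        cycle += CYCLES[instr]  # the next instruction waits for this one
    return starts


print(start_cycles(["lw", "sw", "R-type"]))  # [1, 6, 10]
```

The store starts in Cycle 6 and the R-type instruction in Cycle 10, matching the diagram; in the single cycle design the same three instructions would start in cycles 1, 2, and 3 of a much longer clock.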

Next chapter: MIPS pipelined datapath review
Reading assignment – PH, Chapter
