Macro instruction synthesis for embedded processors
Pinhong Chen, Yunjian Jiang (William)
CS252 project presentation

Motivation
- Start from a simple processor core
- Find new macro instructions to enhance performance and reduce code size
- Application-specific
- Use dedicated hardware for speed-up
[Figure: processor block diagram. Labels: I/D Mem., ALU, Reg, Bus unit, control, Application, Macro Instr. Ext., Control, Reg/Mem Access]

RISC8 Architecture
Why RISC8?
- Simple 8-bit ISA with 43 instructions
- Addressable space of 64K bytes
- Complete ISA, including load/store, arithmetic, logical, branch, multiplication, division, stack operation, subroutine call, interrupt operations, etc.
- Small Verilog core: 3.5K gates in 0.25 um
- A clock speed of 300 MHz is reported (our result is about 200 MHz)
- Synthesizable RTL core
- Free assembler

Methodology
[Figure: tool-flow diagram. Labels: Application (*.c), Front end, IR (exp. tree), Code Gen., RTL exp. tree, Asm. code, Assembler, mach. code, simulation, performance, Instr. Profiling, Instr. Syn.]

Different Levels of Expression Trees
Example statement: sum += c & 5
[Figure: three expression trees for the example statement]
- SUIF IR: node labels ASSIGN, ADD, AND, VAR, CON
- RTL IR after code gen: node labels ASSIGN, ADD, AND, byte, con08, reg, acc
- Reconstructed from mach. code: node labels ASSIGN, ADD, AND, MOV, VAR, con08, addr16

Expression trees
SUIF IR
- Data type carried
- Inaccurate cost
- No profiling
- Simple: fewer tree nodes
- Machine independent
Register level
- Data type carried
- One-to-one between macro instructions
- Profiling data can be back-annotated
- Machine dependent
Machine code
- Data type lost
- One-to-one between machine instructions
- Profiling data accurate
- Large expression trees
- Machine dependent

Instruction Enumeration
- Traverse tree structure in post-order
- Normalize sub-tree orders
- Combine patterns from sub-trees
- Hash new instruction patterns
- Collect register usage and memory access for evaluation
- Annotate profiling information
[Figure: example expression tree. Node labels: ADD, AND, byte, con08, acc, reg]
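The enumeration steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' C++ implementation: the `Node` class, the `COMMUTATIVE` set, the example trees, and the execution counts are all assumptions made for the sketch, and the register-usage and memory-access collection step is omitted.

```python
from collections import Counter
from itertools import product

# Assumption: these operators are commutative, so operand order can be normalized.
COMMUTATIVE = {"nADD", "nAND", "nIOR"}

class Node:
    """Expression tree node. rtype is the operand class the node yields when cut off."""
    def __init__(self, op, kids=(), rtype=None):
        self.op = op
        self.kids = tuple(kids)
        self.rtype = rtype or op

def rooted_patterns(node, found):
    """Post-order walk: return patterns rooted at node, appending every pattern to found."""
    if not node.kids:
        return [node.op]
    options = []
    for kid in node.kids:                      # children are visited before the parent
        kid_patterns = rooted_patterns(kid, found)
        if kid.kids:
            # Either cut the pattern at this child or extend it into the child's sub-tree.
            kid_patterns = [kid.rtype] + kid_patterns
        options.append(kid_patterns)
    rooted = []
    for combo in product(*options):            # combine patterns from the sub-trees
        args = sorted(combo) if node.op in COMMUTATIVE else list(combo)
        rooted.append(f"{node.op}({','.join(args)})")
    rooted = list(dict.fromkeys(rooted))       # drop duplicates, keep order
    found.extend(rooted)
    return rooted

def annotate(tree, exec_count, table):
    """Add a basic block's execution count to every pattern found in its tree."""
    found = []
    rooted_patterns(tree, found)
    for pattern in found:
        table[pattern] += exec_count           # Counter acts as the pattern hash table
    return table

# Two hypothetical trees that differ only in operand order.
tree_a = Node("nADD", [Node("nAND", [Node("acc"), Node("con08")], rtype="acc"),
                       Node("reg")], rtype="acc")
tree_b = Node("nADD", [Node("reg"),
                       Node("nAND", [Node("con08"), Node("acc")], rtype="acc")], rtype="acc")

table = Counter()
annotate(tree_a, 10, table)   # hypothetical profile counts
annotate(tree_b, 5, table)
for pattern, count in table.most_common():
    print(count, pattern)
```

Because commutative operands are sorted before the pattern string is built, both trees land on the same three table entries, which is the point of normalizing sub-tree order before hashing.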

Machine Code Level Tree Reconstruction
- Build IR tree from machine codes
- Recover data dependencies from assembly code
  - Clear definition by ISA, e.g. AND r2 ==> acc = acc & r2
- Limited to a basic block
- Eliminate intermediate storage nodes
[Figure: reconstructed expression tree. Node labels: ADD, AND, byte, con08, acc, reg]

Machine Code Level Tree Reconstruction (continued)
[Figure: the expression tree with intermediate storage nodes eliminated. Node labels: ADD, AND, byte, con08]
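A minimal sketch of this reconstruction for an accumulator machine is given below. The mnemonics `LD`, `ST`, `AND`, `ADD`, `IOR`, the operand names, and the `temps` parameter are hypothetical and chosen for illustration; they are not the actual RISC8 instruction set or the authors' Perl tool.

```python
def rebuild_trees(block, temps=()):
    """Symbolically execute one basic block and return an expression tree per stored location.

    block: list of (mnemonic, operand) pairs for a hypothetical accumulator ISA.
    temps: locations used only as intermediate storage; their store nodes are eliminated.
    """
    acc = "acc"     # unknown accumulator value on entry to the block
    value = {}      # location -> expression tree currently held there
    for op, operand in block:
        # Reading a location that was written earlier in the block recovers the dependency.
        src = value.get(operand, operand)
        if op == "LD":
            acc = src
        elif op == "ST":
            value[operand] = acc
        elif op in ("AND", "ADD", "IOR"):
            acc = (op, acc, src)        # ISA semantics: acc = acc <op> operand
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return {loc: ("ASSIGN", loc, expr)
            for loc, expr in value.items() if loc not in temps}

# Hypothetical code for: sum += c & 5
block = [
    ("LD", "c"),
    ("AND", "#5"),
    ("ST", "tmp"),
    ("LD", "sum"),
    ("ADD", "tmp"),
    ("ST", "sum"),
]
trees = rebuild_trees(block, temps={"tmp"})
print(trees["sum"])
```

The store to `tmp` and the later read of it collapse into a single edge of the tree, so only the assignment to `sum` remains.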

Table-Driven Assembly Development Tools
[Figure: tool-flow diagram. Labels: Asm. code, Assembler, mach. code, Simulator, performance, Instr. Profile, Disassembler, Instr. Syn., New Instruction Candidates, Select, New Instr., Special Instr., Instr. Table]

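Table-driven back-end tool automation
Instruction table entry (Perl) for a new 'mac' instruction:

    @new_ins = (
      'mac' => {
        # operation tree: r0 = r0 + Rn * memory[addr16]
        otree   => ['r0', 'nADD', 'r0', ['nMUL', 'Rn', 'addr16']],
        # assembly operand pattern
        pattern => 'Rn addr16',
        # encoding templates: opcode byte, register byte, address low byte, address high byte
        code    => ['00000011', '00000$Rn', '$addr16[0]', '$addr16[1]'],
        # simulator action
        sim     => '$R[0]+=$R[$Rn]*$memory[$addr16]',
        cycles  => 13,
        # operand decoding for the simulator and disassembler
        decode  => '$Rn=$memory[$pc++] & 0x7; $addr16[0]=$memory[$pc++]; $addr16[1]=$memory[$pc++]; $addr16=$addr16[0]|($addr16[1]<<8);'
      }
    );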
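To illustrate how a single table entry can drive the assembler and the simulator, here is a rough Python analogue of the 'mac' entry. The function names, the `TABLE` layout, and the 8-bit wrap of the result are assumptions for the sketch; the source only gives the Perl entry, not the tools that consume it.

```python
def encode_mac(rn, addr16):
    """Assemble 'mac Rn addr16' into four bytes: opcode, register, address low, address high."""
    return [0b00000011, rn & 0x7, addr16 & 0xFF, (addr16 >> 8) & 0xFF]

def decode_mac(memory, pc):
    """Decode the operands; pc points at the byte after the opcode."""
    rn = memory[pc] & 0x7
    addr16 = memory[pc + 1] | (memory[pc + 2] << 8)
    return rn, addr16, pc + 3

def sim_mac(regs, memory, rn, addr16):
    """Simulate R0 += Rn * memory[addr16]."""
    # Assumption: registers are 8 bits wide, so the result is wrapped here.
    regs[0] = (regs[0] + regs[rn] * memory[addr16]) & 0xFF

# Instruction table keyed by opcode; a new instruction only needs a new entry.
TABLE = {
    0b00000011: {"name": "mac", "decode": decode_mac, "sim": sim_mac, "cycles": 13},
}

def step(memory, regs, pc):
    """Execute one instruction through the table; return the next pc and the cycle cost."""
    entry = TABLE[memory[pc]]
    rn, addr16, next_pc = entry["decode"](memory, pc + 1)
    entry["sim"](regs, memory, rn, addr16)
    return next_pc, entry["cycles"]

memory = [0] * 0x200
memory[0:4] = encode_mac(2, 0x0100)   # mac R2, 0x0100
memory[0x0100] = 7
regs = [1, 0, 3, 0, 0, 0, 0, 0]
pc, cycles = step(memory, regs, 0)
print(regs[0], pc, cycles)
```

Keeping encoding, decoding, semantics, and cycle count in one entry means the assembler, disassembler, and simulator stay consistent when an instruction is added or removed.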

Op-Code Reuse
- Op-codes may not be fully used in a specific application
- Remove unused instruction op-codes
- Typical applications use far fewer than 256 op-codes
- Cost of op-code reuse
  - Decoding logic
  - Less flexibility

Application   FIR   ADPCM   GSM   max7219   LCD4x20   PRN-IO
Opcodes       28    49      32    39        40        30
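One way to picture op-code reuse is to collect the op-codes an application actually uses and hand the unused ones to new macro instructions. The sketch below assumes an 8-bit op-code field and a list of op-code bytes taken from a disassembly; the function names and the sample values are hypothetical.

```python
def free_opcodes(used_opcodes, width_bits=8):
    """Return the op-codes of the given width that the application never uses."""
    used = set(used_opcodes)
    return [op for op in range(1 << width_bits) if op not in used]

def assign_opcodes(new_instructions, used_opcodes):
    """Map each new instruction name to an unused op-code, lowest values first."""
    free = free_opcodes(used_opcodes)
    if len(new_instructions) > len(free):
        raise ValueError("not enough unused op-codes")
    return dict(zip(new_instructions, free))

# Hypothetical op-code bytes seen in an application's machine code.
used = [0x00, 0x01, 0x03, 0x01]
assignment = assign_opcodes(["macro_a", "macro_b"], used)
print(assignment)
```

The decoding logic then has to be regenerated for each application, which is the flexibility cost listed above.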

Implementation
- Compiler front-end: SUIF
- Code generator: SPAM-olive
  - Retargeted to RISC8
- RTL pattern enumeration: C++
- RISC8 assembler: Perl
- RISC8 simulator: Perl
- Machine level pattern enumeration: Perl
- Macro driven instruction implementation automation: Perl

Benchmarks

Benchmark     Instructions                                     #
adpcm         null:nASSIGN(word,nAND(areg,const16))            40
              null:nASSIGN(word,nADD(areg,word))               40
              bool:nBOOL(areg,const16)                         86
              bool:nBOOL(nAND(areg,const16),areg)              36
              areg:nIOR(nAND(areg,const16),word)               24
GSM-encoder   acc:nAND(acc,const08)                            796
              acc:nAND(nASR(acc,const08),reg)                  492
              acc:nIOR(nAND(acc,const08),reg)                  414
              acc:nASR(acc,const08)                            330
              acc:nIOR(nAND(nASR(acc,const08),const08),reg)    621
PRN-IO        acc:nIOR(acc,const08)                            240
              null:nASSIGN(byte,nIOR(acc,reg))                 96
              null:nASSIGN(byte,nIOR(acc,const08))             96
              bool:nBOOL(nAND(areg,const16),const16)           60
LCD_4X20      bool:nBOOL(acc,const08)                          99
              null:nASSIGN(byte,nADD(reg,one))                 30
max7219       acc:nIOR(acc,const08)                            140
              bool:nBOOL(nAND(acc,reg),zero)                   48

GSM encoder
Hardware/software tradeoff
- Software gain: execution speed, code size
- Hardware cost: functional unit, decoding logic, data path configuration
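The tradeoff can be made concrete by ranking candidate instructions by estimated cycles saved per unit of hardware cost. The sketch below is only an illustration: the `Candidate` fields, the ranking metric, and every number in the example are hypothetical and are not measurements from the GSM encoder.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pattern: str
    exec_count: int      # dynamic executions from profiling (hypothetical values below)
    cycles_before: int   # cycles for the original instruction sequence
    cycles_after: int    # cycles for the single macro instruction
    hw_cost: float       # relative hardware cost (functional unit, decode logic, data path)

    @property
    def cycles_saved(self):
        """Estimated software gain in execution cycles."""
        return self.exec_count * (self.cycles_before - self.cycles_after)

def rank(candidates):
    """Order candidates by cycles saved per unit of hardware cost, best first."""
    return sorted(candidates, key=lambda c: c.cycles_saved / c.hw_cost, reverse=True)

# All figures are made up for illustration.
candidates = [
    Candidate("acc:nAND(acc,const08)", 100, 4, 3, 50.0),
    Candidate("acc:nIOR(nAND(acc,const08),reg)", 100, 9, 5, 100.0),
]
ranked = rank(candidates)
for c in ranked:
    print(c.pattern, c.cycles_saved, c.cycles_saved / c.hw_cost)
```

A code-size term could be added in the same way by weighting the bytes saved per static occurrence.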

Conclusions
- RTL level pattern enumeration
  - Key to automating instruction identification, code generation, assembly and simulation
  - No need to change algorithm source code
- Hardware/software trade-off
  - Good estimation of performance gain and hardware cost at register-transfer level
- Op-code reuse
