NISC: No-Instruction-Set Computer
By: Saleh Shakhsi Khazeni
Contents:
  What is NISC?
  Why NISC?
  Implementing methods
    Software implementation
    Hardware implementation
    ASIP implementation
    NISC implementation
  Case study: designing custom hardware for DCT
  NISC benefits
  References
What is NISC?
Why NISC?
  Compared to a general-purpose CPU:
    7 times speedup
    1.64 times power reduction
    12.5 times energy savings
    more than 3 times area reduction
Implementing methods
  Software implementation
  Hardware implementation
  ASIP implementation
  NISC implementation
Software implementation
  General-purpose CPUs
  Flexible (+)
  Low cost (+)
  Short time-to-market (+)
  Low performance (-)
  High energy consumption (-)
Hardware implementation
  Application Specific Integrated Circuits (ASICs)
  Not flexible (-)
  High cost (-)
  Long time-to-market (-)
  High performance (+)
  Low energy consumption (+)
ASIP implementation
  Application Specific Instruction set Processors (ASIPs)
  One ALU and some custom functional units on a CPU
  Needs a compiler to generate the custom instructions
  Needs a decoder to decode the custom instructions
General overview of ASIP architecture
  [Figure: block diagram with the labels Register File, ID/EXE Reg, CFU, ALU, MUX, and EXE/MEM Reg, divided into a GPP region and an Augmented HW region, shown next to this instruction trace:]
    400680 subiu $25,$25,1
    400688 lbu $13,0($7)
    400690 lbu $2,0($4)
    400698 sll $2,$2,0x18
    4006a0 sra $14,$2,0x18
    4006a8 addiu $4,$4,1
    4006b0 srl $8,$2,0x1c
    4006b8 sll $2,$8,0x2
    4006c0 addu $2,$2,$25
    4006c8 lw $2,0($2)
    4006d0 xori $13,$13,1
    4006d8 addu $10,$10,$2
    4006e0 bgez $10,4006f0
  GPP: General Purpose Processor
  CFU: Custom Functional Unit
NISC implementation
  The NISC compiler generates the controller and its control words from C code and a given datapath
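The idea of driving a datapath with compiler-generated control words, without an instruction decoder, can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the `ControlWord` fields, the three-operation ALU, the four-entry register list, and the 16-bit width are assumptions, not the actual NISC control-word format or compiler output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlWord:
    """One cycle of raw control signals for a hypothetical toy datapath."""
    src_a: int      # register-file read address, port A
    src_b: int      # register-file read address, port B
    alu_op: str     # ALU select signal: "add", "and" or "mul"
    dst: int        # register-file write address
    write_en: bool  # register-file write enable


# Hypothetical ALU: each select value maps straight to one operation.
ALU = {
    "add": lambda a, b: a + b,
    "and": lambda a, b: a & b,
    "mul": lambda a, b: a * b,
}


def run(control_words, regs):
    """Apply each control word directly to the datapath, with no decode step."""
    regs = list(regs)
    for cw in control_words:
        result = ALU[cw.alu_op](regs[cw.src_a], regs[cw.src_b])
        if cw.write_en:
            regs[cw.dst] = result & 0xFFFF  # assumed 16-bit datapath
    return regs


# Control words a compiler might emit for: r3 = (r0 * r1) + r2
program = [
    ControlWord(src_a=0, src_b=1, alu_op="mul", dst=3, write_en=True),
    ControlWord(src_a=3, src_b=2, alu_op="add", dst=3, write_en=True),
]

final_regs = run(program, [3, 4, 5, 0])
```

In this sketch each field of a control word corresponds to one group of control signals, so changing the datapath changes the fields of the word rather than an instruction set.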
NISC scalability
Case study: designing custom hardware for DCT
  The Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT) are important parts of the JPEG and MPEG standards
  The algorithm contains two for loops and uses Add, And, Multiply, and Not-equal (!=) operations
  Starting point: a simple General Purpose Datapath (GPD)
Transformations for an optimized NISC implementation
  An optimized NISC implementation needs some transformations
  Software transformation: the two for loops can be merged into one by combining the loops’ counters
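The loop merge can be sketched as follows. The slides do not show the DCT source code, so this is only an illustration under two assumptions: both loops run over an 8x8 block, and the combined counter `ij` (a hypothetical name) packs the row index in its upper three bits and the column index in its lower three bits.

```python
N = 8  # assumed 8x8 block


def visit_nested():
    """Original form: two for loops, each with its own counter."""
    order = []
    for i in range(N):
        for j in range(N):
            order.append((i, j))
    return order


def visit_merged():
    """Merged form: one counter; i and j are recovered with a shift and an AND."""
    order = []
    ij = 0
    while ij != N * N:  # a single not-equal test ends the loop
        i = ij >> 3     # upper three bits: row index
        j = ij & 0x7    # lower three bits: column index
        order.append((i, j))
        ij += 1
    return order
```

Both functions visit the same 64 index pairs in the same order. The merged version needs one counter update and one comparison per iteration, and it recovers the indices with cheap bit operations.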
CDCT1: initial custom datapath
  Operation chaining reduces register file (RF) accesses
  This improves both energy consumption and performance
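A toy counting model shows why chaining reduces RF traffic. It assumes a multiply-accumulate step (`sum = sum + a * b`) whose operands all live in the register file. The function names and the access counts are illustrative assumptions, not measurements from the case study.

```python
def rf_accesses_unchained(n_terms):
    """Multiply and add run as separate operations through the register file."""
    reads = writes = 0
    for _ in range(n_terms):
        reads += 2   # multiplier reads a and b
        writes += 1  # product is written back to the RF
        reads += 2   # adder reads the product and the running sum
        writes += 1  # new sum is written back to the RF
    return reads + writes


def rf_accesses_chained(n_terms):
    """Multiplier output is wired straight into the adder (chained)."""
    reads = writes = 0
    for _ in range(n_terms):
        reads += 3   # a, b and the running sum
        writes += 1  # only the new sum is written back
    return reads + writes
```

Under these assumptions, eight multiply-accumulate steps need 48 RF accesses unchained and 32 when chained, because the intermediate product never passes through the register file.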
CDCT2: bus customization
  Replace all the global buses with point-to-point connections
  Add a pipeline register to the datapath
CDCT3: simplify the ALU and comparator
  Eliminate the unused parts of the ALU, comparator, and RF
CDCT4 and CDCT5: controller pipelining
  Add control word (CW) and status registers
CDCT6: bit-width reduction
  Because the address-calculation pipeline stage does not need 16-bit operations, the bit width of the RF, OR, ALU, and Comp can be reduced to 8 bits
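A quick check makes the 8-bit claim plausible. The slides do not give the memory layout, so this sketch assumes three 8x8 matrices (input, coefficients, output) stored back to back from word address 0. The layout and the helper names are hypothetical.

```python
N = 8          # assumed 8x8 block
ADDR_BITS = 8  # reduced width named on the slide


def max_address(num_matrices):
    """Largest word address when the matrices are stored back to back from 0."""
    return num_matrices * N * N - 1


def fits(value, bits):
    """True if value is representable as an unsigned integer of the given width."""
    return 0 <= value < (1 << bits)


# Assumed layout: input, coefficient and output matrices.
highest = max_address(3)
address_fits = fits(highest, ADDR_BITS)
```

With this layout the highest address is 191, which fits in 8 bits, so the address-calculation stage would never produce a value that needs the upper byte of a 16-bit unit.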
Comparison of the performance, power, energy, and area of the NISC designs
Total power consumption
  In CDCT4, the power consumption increases because of:
    (1) the higher clock frequency and the larger number of pipeline registers;
    (2) the higher logic power due to the CW register gates.
Execution time, power, energy, and area of the designs
NISC benefits:
  Hardware is easy to describe using C code
  Eliminates the complexity of controller design
  Better performance
  Lower power
  Less area
  High speedup through more pipelining
References:
  1. NISC Technology home page, www.ics.uci.edu/~nisc
  2. Daniel D. Gajski, "NISC: The Ultimate Reconfigurable Component", CECS Technical Report TR 03-28
  3. M. Reshadi, B. Gorjiara, D. Gajski, "NISC Technology and Preliminary Results", CECS Technical Report 05-11, August 2005
  4. Mehrdad Reshadi and Daniel Gajski, "NISC Modeling and Compilation", CECS Technical Report 04-33, December 2004