DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
Figure 4.3(a) Block diagram of a barrel shifter
Figure 4.3(b) Implementation of a 4-bit, shift-right barrel shifter
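As a rough illustration of what a 4-bit shift-right barrel shifter computes, the sketch below models it in Python as two multiplexer stages (shift by 1, then shift by 2), each selected by one bit of the shift amount. The function name and the two-stage organization are assumptions made for illustration; the circuit in Figure 4.3(b) may be organized differently, for example as a full switch matrix.

```python
def barrel_shift_right_4bit(value, shift):
    """Model a 4-bit logical shift-right barrel shifter.

    Hypothetical helper: value is a 4-bit input (0..15), shift is 0..3.
    Each stage stands for a row of 2-to-1 multiplexers controlled by
    one bit of the shift amount.
    """
    if not 0 <= value <= 0xF:
        raise ValueError("value must fit in 4 bits")
    if not 0 <= shift <= 3:
        raise ValueError("shift must be in 0..3")
    # Stage 0: shift by 1 if bit 0 of the shift amount is set.
    stage0 = (value >> 1) if (shift & 0b01) else value
    # Stage 1: shift by 2 if bit 1 of the shift amount is set.
    stage1 = (stage0 >> 2) if (shift & 0b10) else stage0
    return stage1 & 0xF


if __name__ == "__main__":
    for s in range(4):
        print(s, format(barrel_shift_right_4bit(0b1011, s), "04b"))
```

The point of the hardware is that the result is produced in a single pass through combinational logic, whatever the shift amount; the Python model only mirrors the data path, not the timing.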
Figure 4.5 A MAC unit with accumulator guard bits
Figure 4.6 A schematic diagram of the saturation logic
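The role of guard bits and saturation can be sketched in a few lines of Python. The widths below (16-bit operands, a 40-bit accumulator made of 32 product bits plus 8 guard bits, and a 32-bit saturated result) are assumptions chosen for illustration and are not taken from Figures 4.5 and 4.6; all function names are hypothetical.

```python
ACC_BITS = 40  # assumed: 32-bit product width plus 8 guard bits
OUT_BITS = 32  # assumed width of the saturated result


def wrap_signed(x, bits):
    """Wrap x into a two's-complement integer of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def saturate(x, bits):
    """Clamp x to the largest or smallest value representable in bits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, x))


def mac(samples, coeffs):
    """Multiply-accumulate with guard bits, saturating on read-out."""
    acc = 0
    for x, c in zip(samples, coeffs):
        # The guard bits let intermediate sums exceed 32 bits safely.
        acc = wrap_signed(acc + x * c, ACC_BITS)
    return saturate(acc, OUT_BITS)


if __name__ == "__main__":
    big = [32767] * 4
    print("with guard bits and saturation:", mac(big, big))
    print("plain 32-bit wrap-around:", wrap_signed(sum(a * b for a, b in zip(big, big)), 32))
```

Without guard bits the same sum wraps around and changes sign; with them, the overflow is held in the accumulator and the output is clamped to the largest representable value instead.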
Figure 4.7 Block diagram of an arithmetic logic unit
Figure 4.9 Register pointer updating algorithm for circular buffer addressing mode: SAR = start address register contents, EAR = end address register contents, PNTR = pointer
Figure 4.10 Different cases that arise in updating the pointer in circular buffer addressing mode
Figure 4.10 Continued
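A minimal sketch of circular-buffer pointer updating, using the names SAR, EAR and PNTR from the caption of Figure 4.9. It covers only the case where SAR is below EAR and the step is no larger than the buffer length; both restrictions, and the function name, are assumptions of this sketch, and the figures may treat further cases.

```python
def update_pointer(pntr, incr, sar, ear):
    """Return the new pointer after stepping by incr inside [sar, ear].

    Assumes sar <= pntr <= ear and abs(incr) <= buffer length, so a
    single wrap-around correction is always enough.
    """
    length = ear - sar + 1
    if not sar <= pntr <= ear:
        raise ValueError("pointer lies outside the buffer")
    if abs(incr) > length:
        raise ValueError("step is larger than the buffer")
    new = pntr + incr
    if new > ear:
        # Ran past the end address: wrap back towards the start.
        new -= length
    elif new < sar:
        # Ran below the start address: wrap back towards the end.
        new += length
    return new


if __name__ == "__main__":
    SAR, EAR = 0x100, 0x107  # hypothetical 8-word buffer
    p = 0x105
    for _ in range(4):
        p = update_pointer(p, 1, SAR, EAR)
        print(hex(p))
```

In a DSP this comparison and correction is done by the address generation unit, so the program pays no extra cycles for the wrap-around test.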
Figure 4.11 Block diagram of an address generation unit
Bit-reversal Hardware
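Bit-reversed addressing reorders indices by reversing their binary representation, which is the ordering needed for the input or output of a radix-2 FFT. The Python sketch below shows the mapping itself; the function names are hypothetical, and the hardware performs the equivalent operation on the address lines rather than in a loop.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        # Move the current least significant bit of index into result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """List the bit-reversed indices for a buffer of n = 2**k entries."""
    num_bits = n.bit_length() - 1
    if n < 1 or (1 << num_bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, num_bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))
```

For an 8-point buffer the index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on.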
Figure 4.12 A conceptual diagram of a program sequencer
Instruction Level Parallelism
VLIW architecture
- Each instruction specifies several operations to be done in parallel
- Advantages: simple hardware; compilers can spot ILP easily
- Disadvantages: little compatibility between generations; explicit NOPs bloat code size
Superscalar architecture
- Hardware is responsible for finding ILP in a sequential program
- Advantage: compatibility between generations
- Disadvantage: very complex hardware
Explicitly Parallel Instruction Computing (EPIC)
- Combines VLIW and superscalar architectures
- Instructions are grouped into three operation blocks and a template block
- The template block tells the hardware whether the instructions can be executed in parallel
- It also indicates whether the block as a whole can be executed in parallel with other blocks
ILP versus Power
- Increasing instructions per cycle means fewer cycles are required to execute a task
- This allows a slower clock (longer clock period) for the same performance
- A slower clock allows a lower supply voltage, and hence less power
- However, too many functional units and too many transitions per clock cycle increase power consumption
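The trade-off can be made concrete with the usual first-order model of dynamic CMOS power, P = a * C * Vdd^2 * f, where a is the switching activity, C the switched capacitance and f the clock frequency. The sketch below compares a baseline design with a wider design that runs at half the clock and a reduced supply voltage. All numeric values are made-up assumptions for illustration, not measurements, and the function name is hypothetical.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order dynamic power in watts: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


# Assumed baseline: 1 nF switched capacitance, 2.0 V supply, 100 MHz clock.
baseline = dynamic_power(1e-9, 2.0, 100e6)

# Assumed wider design: twice the functional units (twice the switched
# capacitance), half the clock for the same throughput, and a supply
# lowered to 1.4 V because the clock period is longer.
wide = dynamic_power(2e-9, 1.4, 50e6)

if __name__ == "__main__":
    print("baseline power (W):", baseline)
    print("wide design power (W):", wide)
    print("ratio wide/baseline:", wide / baseline)
```

Halving the clock and doubling the capacitance cancel out on their own; in this model the saving comes entirely from the quadratic dependence on the supply voltage. If the extra functional units add more capacitance or switching activity than the voltage reduction recovers, the wider design consumes more power, not less.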
Low Power architecture
- Trade-off: power consumed by additional circuits vs. the ability to lower the clock rate while maintaining performance
- Added circuits must be highly utilized
- Move complexity into software
- Voltage scaling: reduce Vdd
- Clock gating: turn off the clock when the chip is not in use (also applies to sub-modules of the chip)
VLIW is more suitable than superscalar for low power
- VLIW is smaller for the same number of functional units
- Compiler is better at finding parallelism than hardware
Put multiple processors on a chip rather than lots of functional units in one processor
- Helps in running independent tasks
General Purpose Microprocessor in 2000
- GHz clock speed
- 32-bit address or more
- 32-bit bus, 128-bit instructions
- Complex MMU
- Superscalar CPU
- MMX instructions
- On-chip cache
- Single-cycle execution
- 32-bit floating-point ALU on board
- Very expensive
- 10s of watts of power
DSP in 2000
- Clock 100 ~ 200 MHz
- 16-bit fixed point or 32-bit floating point
- […]-bit address space
- Large on-chip and off-chip memories
- Single-cycle execution of most instructions
- Harvard architecture
- Lots of special DSP instructions
- 50 mW to 2 W power
- Cheap
Future of DSP Microprocessor
- Sufficiently unique for an independent class of applications (HDD, cell phone)
- Low power consumption, low cost
- High performance within power and cost constraints (MIPS/mW, MIPS/$)
- Fixed point and floating point
- Better compilers, but users must be informed
- Hybrid DSP/GP systems