ARM implementation. The design is divided into a data path section, which is described in register transfer level (RTL) notation, and a control section, which is viewed as a finite state machine (FSM).
Clocking scheme. Data movement is controlled by passing the data alternately through latches which are open during phase 1 and latches which are open during phase 2. Because the two phases do not overlap, there are no race conditions.
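The two-phase scheme can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the `Latch` class and the `two_phase_counter` function are hypothetical names introduced here, and the feedback loop with an incrementer is an invented example, not a circuit from the ARM. It shows that when a phase 1 latch and a phase 2 latch are never open together, a value travelling round a loop advances exactly one step per clock cycle.

```python
class Latch:
    """Level-sensitive (transparent) latch: follows its input while open."""

    def __init__(self, value=0):
        self.value = value

    def update(self, data_in, is_open):
        # While open the latch is transparent; while closed it holds its value.
        if is_open:
            self.value = data_in
        return self.value


def two_phase_counter(cycles):
    """Hypothetical loop: l2 -> (+1) -> l1 -> l2, clocked by two phases."""
    l1, l2 = Latch(0), Latch(0)
    trace = []
    for _ in range(cycles):
        # Phase 1 high: l1 is open, l2 is closed, so the loop is broken at l2.
        l1.update(l2.value + 1, is_open=True)
        l2.update(l1.value, is_open=False)
        # Phase 2 high: l2 is open, l1 is closed, so the loop is broken at l1.
        l1.update(l2.value + 1, is_open=False)
        l2.update(l1.value, is_open=True)
        trace.append(l2.value)
    return trace


print(two_phase_counter(4))
```

If both latches were open at the same time, the incremented value could run round the loop more than once in a cycle, which is the race that the non-overlapping phases rule out.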
ARM datapath timing (3-stage pipeline). Note how, though the data passes through the ALU input latches, these do not affect the datapath timing since they are open when valid data arrives. This property of transparent latches is exploited in many places in the design of the ARM to ensure that clocks do not slow critical signals.
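Clocking scheme. The minimum data path cycle time is therefore the sum of: the register read time; the shifter delay; the ALU delay; the register write set-up time; and the phase 2 to phase 1 non-overlap time. The ALU delay depends on the operation: logical operations involve no carry propagation and are relatively fast, whereas arithmetic operations are slower because the carry can propagate across the whole word.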
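The sum can be written as a single expression. The symbol names below are chosen here for illustration and do not appear in the source.

```latex
t_{\text{cycle,min}} = t_{\text{reg read}} + t_{\text{shift}} + t_{\text{ALU}}
                     + t_{\text{reg write set-up}} + t_{\phi_2 \to \phi_1\ \text{non-overlap}}
```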
Adder design 1. The original design is a ripple-carry adder. It is built from CMOS AND-OR-INVERT gates with alternating AND/OR logic, and its worst-case carry path is 32 gates long.
Adder design 2. In order to allow a higher clock rate, ARM2 used a 4-bit carry look-ahead scheme to reduce the worst-case carry path length. The logic produces carry generate (G) and propagate (P) signals which control the 4-bit carry-out. The carry propagate path length is reduced to eight gate delays, again using merged AND-OR-INVERT gates and alternating AND/OR logic.
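The difference between the two adders can be sketched at bit level. This is a behavioural model under stated assumptions, not the ARM2 circuit: the function names are hypothetical, the AND-OR-INVERT gate structure is not modelled, and the sum bits inside each 4-bit group are simply computed by a local ripple. What it does show is that the carry passes through 32 steps in the ripple-carry adder but through only one G/P step per 4-bit group (eight steps for 32 bits) in the look-ahead version.

```python
MASK32 = 0xFFFFFFFF


def ripple_carry_add(a, b, carry_in=0, width=32):
    """Add bit by bit; the carry ripples through every bit position."""
    result, carry = 0, carry_in
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x | y))  # carry-out of bit i
    return result, carry


def group_generate_propagate(a4, b4):
    """Return (G, P) for one 4-bit group.

    G: the group produces a carry-out whatever its carry-in is.
    P: the group passes a carry-in straight through to its carry-out.
    """
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lookahead_add(a, b, carry_in=0, width=32):
    """4-bit carry look-ahead: the carry advances one group at a time."""
    result, carry = 0, carry_in
    for lo in range(0, width, 4):
        a4, b4 = (a >> lo) & 0xF, (b >> lo) & 0xF
        G, P = group_generate_propagate(a4, b4)
        s4, _ = ripple_carry_add(a4, b4, carry, width=4)  # local sum bits
        result |= s4 << lo
        carry = G | (P & carry)  # 4-bit carry-out from G and P only
    return result, carry


print(lookahead_add(0x0FFFFFFF, 1) == ripple_carry_add(0x0FFFFFFF, 1))
```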
The ARM2 ALU logic for one result bit.
The ARM6 carry-select adder
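The general carry-select technique can be sketched as follows. This is a minimal behavioural model, not the ARM6 circuit: the function name is hypothetical, and the block widths (4, 4, 8, 16) are an assumption made for the example. Each block computes its sum for a carry-in of both zero and one, and the true carry-in then only has to select between the two results, as a multiplexer would in hardware.

```python
def carry_select_add(a, b, carry_in=0, block_widths=(4, 4, 8, 16)):
    """Carry-select addition over blocks whose widths are assumed here."""
    result, carry, lo = 0, carry_in, 0
    for w in block_widths:
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0 = a_blk + b_blk       # block result if carry-in is 0
        sum1 = a_blk + b_blk + 1   # block result if carry-in is 1
        chosen = sum1 if carry else sum0  # multiplexer controlled by the carry
        result |= (chosen & mask) << lo
        carry = chosen >> w        # carry-out selects the next block's result
        lo += w
    return result, carry


print(carry_select_add(0xFFFFFFFF, 1))
```

Both candidate sums of every block can be formed in parallel, so the delay on the carry path is mainly a chain of selections rather than a chain of additions.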
ARM6 ALU structure
ARM high-speed multiplier organization. Older ARM cores include low-cost multiplication hardware that supports only the 32-bit result multiply and multiply-accumulate instructions. Recent ARM cores have high-performance multiplication hardware and support the 64-bit result multiply and multiply-accumulate instructions.
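The difference between the 32-bit result and 64-bit result forms can be shown with a small behavioural model. This is a sketch only: the Python functions are hypothetical helpers named after the ARM mnemonics MUL, MLA, UMULL and SMULL, they model the arithmetic result and nothing about the multiplier hardware or its timing, and operands are assumed to be given as unsigned 32-bit register values.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit register value as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rm, rs):
    """32-bit result multiply: only the low 32 bits of the product are kept."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """32-bit result multiply-accumulate: low 32 bits of rm * rs + rn."""
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """Unsigned 64-bit result multiply, returned as (high word, low word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def smull(rm, rs):
    """Signed 64-bit result multiply, returned as (high word, low word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


print(hex(mul(0xFFFFFFFF, 2)))
print([hex(w) for w in umull(0xFFFFFFFF, 2)])
print([hex(w) for w in smull(0xFFFFFFFF, 2)])
```

The low word is the same in all three cases; the signed and unsigned forms differ only in the high word, which the 32-bit result instructions discard.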
The register bank
ARM DATA PATH
ARM CONTROL STRUCTURES