Sridhar Rajagopal COMP 625 April 17, 2000

Presentation on theme: "Sridhar Rajagopal COMP 625 April 17, 2000"— Presentation transcript:

1 Wireless Communication Extensions for DSPs and General Purpose Processors
Sridhar Rajagopal, COMP 625, April 17, 2000

2 Motivation
Wireless, the next wave after Multimedia
Highly Compute-Intensive Algorithms
Real-Time Requirements
Design based on Time-to-Market

3 Outline
Processor Core with Reconfigurable Support
Permutation Based Interleaved Memory
Processor Architecture - EPIC
Instruction Set Extensions
Truncated Multipliers
Software Support Needed

4 Characteristics of Wireless Algorithms
Massive Parallelism
Bit-level Computations
Matrix Based Operations
Memory Intensive
Complex-valued Data
Approximate Computations

5 What’s wrong with Current Architectures for these applications?

6 Problems with Current Architectures
UltraSPARC, C6x, MMX, IA-64
Not enough MIPS/FLOPS
Unable to fully exploit parallelism
Bit Level Computations
Memory Bottlenecks
Specialized Instructions for Wireless Communications

7 Why Reconfigurable
Adapt algorithms to environment
Seamless and Continuous Data Processing during Handoffs
Home Area Wireless LAN
High Speed Office Wireless LAN
Outdoor CDMA Cellular Network

8 Reconfigurable Support
OSI Layers 3-7: User Interface, Translation, Synchronization, Transport, Network
Layer 2: Data Link Layer (Converts Frames to Bits)
Layer 1: Physical Layer (hardware; raw bit stream)

9 Different Protocols
MPEG-4, H.723 - Voice, Multimedia
Convolutional, Turbo - Channel Coding
[Diagram blocks: Source Coding; Channel Coding; Source Decoding; Channel Decoding; Multiuser Detection; Channel Estimation]

10 A New Architecture
[Diagram blocks: Main Memory; Processor Core (GPP/DSP); Cache; Q; Q; Crossbar; Real-Time I/O Bit Stream; Reconfigurable Logic; RF Unit; Add-on PCMCIA Network Interface Card; Processor]

11 Why Reconfigurable
Process initial bit level computations
Optimize for fast I/O transfer
[Diagram blocks: Real-Time I/O Bit Stream; Reconfigurable Logic; RF Unit]

12 Reconfigurable Support
2 64-bit data buses
1 64-bit address bus
Control Blocks
Boolean values
Fast I/O
Configuration Caches
64-bit Datapath
Sequencer
GARP Architecture at UC Berkeley

13 Reconfigurable Support
Wide Path to Memory
Data Transfer
Minimize Load Times
Configuration Caches
Recently Displaced Configurations (5 cycles)
Can hold 4 full size Configurations
Independent Execution

14 Reconfigurable Support
Access to same Memory System as Processor
Minimize overhead
When idle
Load Configurations
Transfer Data

15 Operation
Load Configuration
If in configuration cache, minimal time
Copy initial data with coprocessor move instructions
Start execution
Issue wait that interlocks while active
Copy registers back at kernel completion
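
The sequence on this slide can be sketched in software. What follows is a toy model and not GARP's actual interface: the names `ConfigCache` and `run_kernel`, the least-recently-used replacement policy, and the 100-cycle miss cost are all assumptions made for illustration. Only the capacity of 4 configurations and the 5-cycle figure come from the slides.

```python
from collections import OrderedDict


class ConfigCache:
    """Toy configuration cache (hypothetical names and policy)."""

    def __init__(self, capacity=4, hit_cycles=5, miss_cycles=100):
        self.capacity = capacity        # slides: holds 4 full-size configurations
        self.hit_cycles = hit_cycles    # slides: 5 cycles for a cached configuration
        self.miss_cycles = miss_cycles  # assumed cost of a load from main memory
        self._slots = OrderedDict()

    def load(self, name):
        """Make `name` the active configuration; return the cycles spent."""
        if name in self._slots:
            self._slots.move_to_end(name)    # mark as most recently used
            return self.hit_cycles
        if len(self._slots) >= self.capacity:
            self._slots.popitem(last=False)  # evict least recently used (assumed)
        self._slots[name] = None
        return self.miss_cycles


def run_kernel(cache, name, kernel, inputs):
    """Follow the steps on the slide for one kernel invocation."""
    load_cycles = cache.load(name)   # load configuration
    array_regs = list(inputs)        # copy initial data (stands in for coprocessor moves)
    results = kernel(array_regs)     # start execution; the processor waits until it finishes
    return list(results), load_cycles  # copy registers back at kernel completion
```

In this model a second call with the same configuration name costs only the hit time, which is the case the slide describes as "minimal time".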

16 Memory Interface
Access to Main Memory and L1 Data Cache
Large, fast Memory Store
Memory Prefetch Queues for Sequential Accesses
Read aheads and Write Behinds
[Diagram blocks: Processor Core (GPP/DSP); L1 Data Cache; Q; Crossbar; Main Memory; FPGA; Instruction Cache]

17 Permutation Based Interleaved Memory (PBI)
High Memory Bandwidth Needed
Stride-Insensitive Memory System for Matrices
Multiple Banks
Sustained Peak Throughput (95%)
[Diagram blocks: L1 Data Cache; Main Memory]

18 PBI Scheme
N = address length
M = 2^n Banks
2^(N-n) words in each bank
To access a word: n-bit bank number, (N-n)-bit address (high-order)
Calculation of the n-bit Bank Number

19 Calculate Bank Number
Use all N bits to get n-bit vector
Y = A X, where A is an n x N matrix of 0's and 1's
Y = A_h X_h + A_l X_l, with X split into N-n high-order and n low-order bits [A_l has rank n]
N-bit parity circuit with log_k(N) levels of XOR gates (k = fan-in)
[Diagram: the N-bit address feeds n parity circuits, one per row of A (Row 0, Row 1, ..., Row n-1); the n parity bit signals feed a decoder that produces 2^n bank select signals]
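
The product Y = A X is taken over GF(2), so each bit of the bank number is the parity of the address bits selected by one row of A. A minimal sketch follows; the function names and the example matrix `ROWS` (N = 6, n = 2) are made up for illustration and are not from the slides.

```python
def parity(x: int) -> int:
    """XOR of all bits of x (0 or 1)."""
    return bin(x).count("1") & 1


def bank_number(address: int, rows: list) -> int:
    """Compute Y = A X over GF(2).

    rows[i] is row i of A packed as an N-bit mask, so bit i of the
    result is the parity of (rows[i] AND address).
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & address) << i
    return y


def word_in_bank(address: int, n: int) -> int:
    """High-order N-n bits select the word inside the chosen bank."""
    return address >> n


# Hypothetical A for N = 6 address bits and n = 2 (M = 4 banks).
# Row 0 XORs address bits 0 and 2; row 1 XORs address bits 1 and 3.
# The low-order part A_l is the 2 x 2 identity, so it has rank n.
ROWS = [0b000101, 0b001010]
```

For example, `bank_number(0b000100, ROWS)` selects bank 1 because only row 0 sees a set bit. With A_l of full rank, the n low-order bits map one-to-one onto banks for any fixed high-order part, so the pair (bank number, high-order bits) still identifies each word uniquely.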

20 Interleaved Memory Model
[Diagram blocks: Input Buffers; Address Source; Memory Banks M(0), M(1), ..., M(M-1); Data Sink; Data Sequencer; Output Buffers]
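
To see why the choice of bank mapping matters in a model like this, the sketch below counts how many distinct banks a strided access stream touches. It compares plain low-order interleaving with a simple XOR mapping that folds the next n address bits into the low n bits. This XOR mapping is only a stand-in chosen for brevity; it is not the matrix used in the presentation, and all function names are hypothetical.

```python
def low_order_bank(address: int, n: int) -> int:
    """Conventional interleaving: bank = low n address bits."""
    return address & ((1 << n) - 1)


def xor_bank(address: int, n: int) -> int:
    """Toy XOR mapping: low n bits XOR the next n bits."""
    mask = (1 << n) - 1
    return (address ^ (address >> n)) & mask


def banks_touched(bank_fn, stride: int, n: int, count: int) -> int:
    """Number of distinct banks hit by `count` accesses at a fixed stride."""
    return len({bank_fn(k * stride, n) for k in range(count)})
```

With n = 3 (8 banks) and a stride of 8, low-order interleaving sends all eight accesses to one bank, while the XOR mapping spreads them over all eight. The toy mapping has bad strides of its own (a stride of 9 collapses onto one bank), which suggests why a scheme that aims to be stride-insensitive would draw on all N address bits.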

21 Processor Core
64-bit EPIC Architecture with Extensions (IA-64/C6x)
Statically determined Parallelism; exploit ILP
Execution Time Predictability
[Diagram blocks: Processor Core (GPP/DSP); Cache; Q; Crossbar; FPGA]

22 EPIC Principle
Explicitly Parallel Instruction Computing
Evolution of VLIW Computing
Compiler - Key role
Architecture to assist Compiler
Better cope with dynamic factors which limited VLIW Parallelism

23 Aspects of EPIC
Designing Plan of Execution (POE) at Compile Time
Permitting Compiler to play Statistics
Conditional Branches, Memory references
Communicating POE to the hardware
Static Scheduling
Branch information

24 Architecture Features in EPIC
Static Scheduling
MultiOP
Non-Unit Assumed Latency (NUAL)
The Branch Problem
Predicated Execution
Control Speculation
Predicated Code Motion
The Memory Problem
Cache Specifiers
Data Speculation

25 Instruction Set Extensions
To accelerate Bit level computations in Wireless
Real/Complex Integer - Bit Multiplications
Used in Multiuser Detection, Decoding
Bit - Bit Multiplications
Used in Outer Product Updates
Correlation, Channel Estimation
Complex Integer-Integer Multiplications
Useful in other Signal Processing applications
Speech, Video, ...

26 Architecture Support
Support via Instruction Set Extensions
Minimal ALU Modifications necessary
Transparent to Register Files/Memory
Additional 8-bit Special Purpose Registers

27 Integer - Bit Multiplications
D[i] = D[i] + b[j]*C[j]
E.g.: Cross-Correlation
[Diagram blocks: 64-bit Register C; 64-bit Register A; +/- units; 8-bit Register b; 64-bit Register D]
Register Renaming?
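
A minimal sketch of one step of this update, applied lane by lane. It assumes each bit of b encodes a +1/-1 symbol (1 for +1, 0 for -1) and that the 64-bit registers hold eight independent integer lanes. Both assumptions, the name `int_bit_mac`, and the use of Python lists in place of packed registers are illustrative and not taken from the slides; lane overflow is ignored.

```python
def int_bit_mac(d: list, c: list, b: int) -> list:
    """Add or subtract c[k] from d[k] depending on bit k of b.

    Bit value 1 is treated as +1 and bit value 0 as -1, so the
    multiplication b*C reduces to a choice between add and subtract.
    """
    return [
        dk + ck if (b >> k) & 1 else dk - ck
        for k, (dk, ck) in enumerate(zip(d, c))
    ]
```

Under these assumptions, `int_bit_mac([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8], 0b00000101)` adds lanes 0 and 2 and subtracts the rest. No multiplier is involved, which fits the +/- units shown on the slide.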

28 8-bit to 64-bit conversions
D = D + b*b^T
E.g.: Auto-Correlation
b1 = b(1:8), b(1:8), ..., b(1:8)
b2 = b(1)b(1)...b(8)b(8)
[Diagram: 8-bit Register b expanded into 64-bit Register A, once as the pattern b(1)..b(8) repeated and once with each bit b(1), ..., b(8) repeated]
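
The two expansions can be sketched as follows. The bit ordering (b(1) in the least significant position) and the function names are assumptions made for illustration.

```python
def replicate_byte(b: int) -> int:
    """b1: the 8-bit pattern b(1:8) repeated in all eight bytes."""
    return (b & 0xFF) * 0x0101010101010101


def spread_bits(b: int) -> int:
    """b2: bit k of b repeated eight times, filling byte k."""
    out = 0
    for k in range(8):
        if (b >> k) & 1:
            out |= 0xFF << (8 * k)
    return out
```

With this layout, bit position 8*i + j holds b(j+1) in b1 and b(i+1) in b2, so a single bitwise operation on the two 64-bit words can produce all 64 pairwise products needed for b*b^T.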

29 Bit-Bit Multiplications
D = D + b*b^T
E.g.: Auto-Correlation
b1*b2
[Diagram: 64-bit Register A = b1 and 64-bit Register B = b2 feed an Ex-NOR, giving 64-bit Register C = b1*b2]
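
If each bit stands for a +1/-1 symbol (1 for +1, 0 for -1, an assumed encoding), the product of two symbols is +1 exactly when the bits are equal, which is what Ex-NOR computes. A minimal sketch, with hypothetical names:

```python
MASK64 = (1 << 64) - 1


def xnor64(a: int, b: int) -> int:
    """Bitwise Ex-NOR of two 64-bit words: 64 one-bit multiplies at once."""
    return ~(a ^ b) & MASK64


def symbol(bit: int) -> int:
    """Map a bit to its assumed +1/-1 symbol value."""
    return 2 * bit - 1
```

Checking one bit pair by hand: bits 1 and 0 give symbols +1 and -1, whose product is -1, and Ex-NOR of 1 and 0 is 0, which maps back to -1.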

30 Increment/Decrement
D = D + b*b^T
E.g.: Auto-Correlation
[Diagram blocks: 64-bit Register D; constant 1; +/- units; 8-bit Register b1*b2; 64-bit Register (D+b1*b2)]
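
Because every entry of b*b^T is +1 or -1, the update D = D + b*b^T only ever increments or decrements each entry of D. The sketch below performs the whole 8 x 8 update one row at a time. It assumes bit value 1 means +1 and 0 means -1, stores D as nested Python lists instead of packed registers, and uses made-up function names.

```python
def inc_dec(row: list, p: int) -> list:
    """Increment lane k when bit k of p is 1, otherwise decrement it."""
    return [v + 1 if (p >> k) & 1 else v - 1 for k, v in enumerate(row)]


def autocorr_update(d: list, b: int) -> list:
    """Return D + b*b^T for an 8 x 8 matrix D and an 8-bit symbol vector b."""
    out = []
    for i, row in enumerate(d):
        bi = 0xFF if (b >> i) & 1 else 0x00  # bit i of b replicated 8 times
        p = ~(b ^ bi) & 0xFF                 # Ex-NOR: product bits for row i
        out.append(inc_dec(row, p))
    return out
```

Starting from a zero matrix and b = 0b00000001, entry (0, 0) becomes +1, the rest of row 0 and column 0 become -1, and every other entry becomes +1.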

31 Complex-valued Data Processing
Is it easy to add?
Is this worth additional ALU Support?
Typically supported by Software!

32 Truncated Multipliers
Many applications need approximate computations
Adaptive Algorithms: Y = Y + mu*(Y*C)
Truncate lower bits
Truncated Multipliers - half the area/half the delay
Can do 2 truncated multiplies in parallel with regular ALU Multipliers
[Diagram blocks: Truncated Multiplier; Multiplier 1; Multiplier 2]
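
A minimal software sketch of the idea, assuming unsigned operands: each partial product is cut off below a chosen column before the partial products are summed, so the low-order columns are never built. The function name and parameters are hypothetical, and hardware designs typically add a correction term that this sketch omits.

```python
def truncated_multiply(a: int, b: int, width: int, keep: int) -> int:
    """Approximate the top `keep` bits of a width x width unsigned product.

    Bits of each partial product below column (2*width - keep) are dropped
    before accumulation, so carries out of those columns are lost.
    """
    cut = 2 * width - keep
    total = 0
    for i in range(width):
        if (b >> i) & 1:
            total += (a << i) >> cut  # keep only the columns at or above `cut`
    return total
```

For 8-bit operands with `keep=8`, `truncated_multiply(255, 255, 8, 8)` gives 247, while the exact high byte of 255 * 255 is 254. Each partial product loses less than one unit in the last kept place, so the result is never above the exact value and falls short of it by less than `width`.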

33 Software Support
Greater Interaction between Compilers and Architectures
EPIC
Reconfigurable Logic
Compiler needs to find and exploit bit level computations
Reconfigurable Logic Programming

34 Area Estimates
Area increases by 20% over an IA-64 architecture size due to Reconfigurable Support
Instruction Set extensions need minimal hardware support
Parallel Interleaved Memory Banks will need larger area

35 Other Uses
Reconfigurable Logic
For accelerating loops of general purpose processors
Bit Level Support
For other voice, video and multimedia applications

36 Conclusions
Processor Core with Reconfigurable Support developed for Wireless Applications
Instruction Set Extensions added for accelerating performance of the algorithms
Integration of Wireless Appliances with General Purpose Processors
Great Impact on Performance of Wireless Algorithms

37 Future Work
Simulations for finding performance improvements
Other Processor Architectures
Bit Slice Architectures
Out-of-order

38 References
T.C. Callahan, J.R. Hauser, J. Wawrzynek, "The GARP Architecture and C Compiler," IEEE Computer, April 2000, pp. 62-69
M.S. Schlansker, B.R. Rau, "EPIC: Explicitly Parallel Instruction Computing," IEEE Computer, Feb 2000, pp. 37-45
G.S. Sohi, "High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study," IEEE Transactions on Computers, Vol. 42, No. 1, Jan 1993, pp. 34-44

39 Acknowledgements
Vijay Pai
Partha Ranganathan
Joseph Cavallaro

