Sridhar Rajagopal COMP 625 April 17, 2000


Wireless Communication Extensions for DSPs and General Purpose Processors

Motivation
- Wireless, the next wave after Multimedia
- Highly Compute-Intensive Algorithms
- Real-Time Requirements
- Design based on Time-to-Market

Outline
- Processor Core with Reconfigurable Support
- Permutation Based Interleaved Memory
- Processor Architecture - EPIC
- Instruction Set Extensions
- Truncated Multipliers
- Software Support Needed

Characteristics of Wireless Algorithms
- Massive Parallelism
- Bit-level Computations
- Matrix Based Operations
- Memory Intensive
- Complex-valued Data
- Approximate Computations

What’s wrong with Current Architectures for these applications?

Problems with Current Architectures
- UltraSPARC, C6x, MMX, IA-64
- Not enough MIPS/FLOPS
- Unable to fully exploit parallelism
- Bit Level Computations
- Memory Bottlenecks
- Specialized Instructions for Wireless Communications

Why Reconfigurable
- Adapt algorithms to environment
- Seamless and Continuous Data Processing during Handoffs
- [Figure: Home Area Wireless LAN, High Speed Office Wireless LAN, Outdoor CDMA Cellular Network]

Reconfigurable Support
[Figure: OSI layers]
- Layers 3-7: User Interface, Translation, Synchronization, Transport, Network
- Layer 2: Data Link Layer (Converts Frames to Bits)
- Layer 1: Physical Layer (hardware; raw bit stream)

Different Protocols
- MPEG-4, H.723 - Voice, Multimedia
- Convolutional, Turbo - Channel Coding
- [Figure: Source Coding, Channel Coding, Source Decoding, Channel Decoding, Multiuser Detection, Channel Estimation]

A New Architecture
[Figure: block diagram with Main Memory, Processor Core (GPP/DSP), Cache, queues (Q), Crossbar, Real-Time I/O Bit Stream, Reconfigurable Logic, RF Unit; labeled regions: Processor, Add-on PCMCIA Network Interface Card]

Why Reconfigurable
- Process initial bit level computations
- Optimize for fast I/O transfer
- [Figure: Real-Time I/O Bit Stream, Reconfigurable Logic, RF Unit]

Reconfigurable Support
- 2 64-bit data buses
- 1 64-bit address bus
- Control Blocks
- Boolean values
- Fast I/O
- Configuration Caches
- 64-bit Datapath
- Sequencer
- GARP Architecture at UC Berkeley

Reconfigurable Support
- Wide Path to Memory
- Data Transfer
- Minimize Load Times
- Configuration Caches
- Recently Displaced Configurations (5 cycles)
- Can hold 4 full size Configurations
- Independent Execution

Reconfigurable Support
- Access to same Memory System as Processor
- Minimize overhead
- When idle: Load Configurations, Transfer Data

Operation
- Load Configuration
- If in configuration cache, minimal time
- Copy initial data with coprocessor move instructions
- Start execution
- Issue wait that interlocks while active
- Copy registers back at kernel completion
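The operation sequence can be mimicked with a toy Python model. This is only a sketch and not GARP's actual interface: `ConfigCache`, `run_kernel`, the least-recently-used replacement policy, and the `MEMORY_LOAD_CYCLES` value are assumptions made for illustration. Only the 4-entry capacity and the 5-cycle figure are taken from the slides, reading "5 cycles" as the load time on a configuration cache hit.

```python
from collections import OrderedDict

CACHE_SLOTS = 4            # from the slides: holds 4 full-size configurations
CACHE_HIT_CYCLES = 5       # from the slides, read here as the cache-hit load time
MEMORY_LOAD_CYCLES = 1000  # hypothetical placeholder for a load from main memory


class ConfigCache:
    """Toy configuration cache; LRU replacement is an assumption."""

    def __init__(self):
        self.slots = OrderedDict()

    def load(self, name):
        """Return the (modeled) number of cycles needed to load a configuration."""
        if name in self.slots:
            self.slots.move_to_end(name)     # mark as most recently used
            return CACHE_HIT_CYCLES
        if len(self.slots) >= CACHE_SLOTS:
            self.slots.popitem(last=False)   # evict the least recently used entry
        self.slots[name] = True
        return MEMORY_LOAD_CYCLES


def run_kernel(cache, name, inputs, kernel):
    """Walk through the steps listed on the Operation slide."""
    load_cycles = cache.load(name)   # load configuration (fast if cached)
    array_regs = list(inputs)        # copy initial data (coprocessor moves)
    result = kernel(array_regs)      # start execution; the wait interlocks until done
    return result, load_cycles       # copy registers back at kernel completion


cache = ConfigCache()
first = run_kernel(cache, "correlate", [1, 2, 3], sum)
second = run_kernel(cache, "correlate", [4, 5, 6], sum)
print(first, second)
```

The second call reuses the cached configuration, so in this model its load cost drops from the hypothetical memory figure to the cache-hit figure.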

Memory Interface
- Access to Main Memory and L1 Data Cache
- Large, fast Memory Store
- Memory Prefetch Queues for Sequential Accesses
- Read aheads and Write Behinds
- [Figure: Processor Core (GPP/DSP), L1 Data Cache, Q, Crossbar, Main Memory, FPGA, Instruction Cache]

Permutation Based Interleaved Memory (PBI)
- High Memory Bandwidth Needed
- Stride-Insensitive Memory System for Matrices
- Multiple Banks
- Sustained Peak Throughput (95%)
- [Figure: L1 Data Cache, Main Memory]

PBI Scheme
- $N$: address length
- $M = 2^n$ Banks
- $2^{N-n}$ words in each bank
- To access a word: $n$-bit bank number, $(N-n)$-bit address (high-order)
- Calculation of the $n$-bit Bank Number

Calculate Bank Number
- Use all $N$ address bits to get an $n$-bit vector: $Y = AX$, where $A$ is an $n \times N$ matrix of 0's and 1's (arithmetic modulo 2)
- $Y = A_h X_h + A_l X_l$, with $X$ split into $(N-n, n)$ bits and $A_l$ of rank $n$
- $N$-bit parity circuit with $\log_k N$ levels of XOR gates ($k$ = fan-in)
- [Figure: the $N$-bit address feeds $n$ parity circuits, one per row of $A$ (Row 0 to Row $n-1$); the $n$ parity bit signals feed a decoder that produces $2^n$ bank select signals]
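The bank-number calculation can be sketched in Python as one parity (XOR reduction) per row of the matrix. The particular matrix `ROWS`, the sizes `N_BITS = 6` and `n = 2`, and the function names are assumptions chosen for illustration; the slides do not give a concrete matrix. The low-order columns of `ROWS` form an identity matrix, so the rank-n condition on the low-order part holds.

```python
N_BITS = 6               # hypothetical address length N
BANK_BITS = 2            # hypothetical n, so M = 2**n = 4 banks

# Each row of A is stored as an N-bit mask (assumed example matrix).
ROWS = [
    0b010101,            # bank bit 0 = a0 ^ a2 ^ a4
    0b101010,            # bank bit 1 = a1 ^ a3 ^ a5
]


def parity(x):
    """XOR of all bits of x (what an N-bit parity circuit computes)."""
    return bin(x).count("1") & 1


def bank_number(address, rows=ROWS):
    """Y = A X over GF(2): one parity bit per row of A."""
    y = 0
    for i, row in enumerate(rows):
        y |= parity(address & row) << i
    return y


def map_address(address, rows=ROWS, n=BANK_BITS):
    """Return (bank, word-within-bank); the word index is the high-order N-n bits."""
    return bank_number(address, rows), address >> n


# A stride of 4 words would hit a single bank under plain low-order interleaving.
stride_4 = range(0, 16, 4)
print([a % 4 for a in stride_4])             # plain interleaving: low 2 bits
print([bank_number(a) for a in stride_4])    # permutation-based mapping
```

With this assumed matrix, the stride-4 accesses that all collide in bank 0 under plain interleaving are spread across all four banks, and every address still maps to a distinct (bank, word) pair.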

Interleaved Memory Model
[Figure: Input Buffers, Address Source, Memory Banks M(0), M(1), ..., M(M-1), Data Sink, Data Sequencer, Output Buffers]

Processor Core
- 64-bit EPIC Architecture with Extensions (IA-64/C6x)
- Statically determined Parallelism; exploit ILP
- Execution Time Predictability
- [Figure: Processor Core (GPP/DSP), Cache, Q, Crossbar, FPGA]

EPIC Principle
- Explicitly Parallel Instruction Computing
- Evolution of VLIW Computing
- Compiler: Key role
- Architecture to assist Compiler
- Better cope with dynamic factors which limited VLIW Parallelism

Aspects of EPIC
- Designing Plan of Execution (POE) at Compile Time
- Permitting Compiler to play Statistics: Conditional Branches, Memory references
- Communicating POE to the hardware: Static Scheduling, Branch information

Architecture Features in EPIC
- Static Scheduling: MultiOP, Non-Unit Assumed Latency (NUAL)
- The Branch Problem: Predicated Execution, Control Speculation, Predicated Code Motion
- The Memory Problem: Cache Specifiers, Data Speculation

Instruction Set Extensions
- To accelerate Bit level computations in Wireless
- Real/Complex Integer - Bit Multiplications: Used in Multiuser Detection, Decoding
- Bit - Bit Multiplications: Used in Outer Product Updates, Correlation, Channel Estimation
- Complex Integer-Integer Multiplications: Useful in other Signal Processing applications (Speech, Video, ...)

Architecture Support
- Support via Instruction Set Extensions
- Minimal ALU Modifications necessary
- Transparent to Register Files/Memory
- Additional 8-bit Special Purpose Registers

Integer - Bit Multiplications
- $D[i] = D[i] + b[j] \cdot C[j]$, e.g. Cross-Correlation
- [Figure: 64-bit Register C, 64-bit Register A, +/- units, 8-bit Register b, 64-bit Register D]
- Register Renaming?
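Because each bit stands for a sign, an integer-bit multiply-accumulate reduces to an add or a subtract per lane. The Python sketch below assumes eight integer lanes per 64-bit register, and assumes that a set bit means +1 and a clear bit means -1; neither the lane width nor the bit-to-sign mapping is stated in the slides, and `int_bit_mac` is a hypothetical name.

```python
def int_bit_mac(d, c, b):
    """Per-lane multiply-accumulate of integers c by sign bits b.

    d, c : lists of 8 integers (the lanes of the D and C registers)
    b    : 8-bit integer; bit k set means +1, clear means -1 (assumed mapping)
    Each lane needs only an adder/subtractor, no multiplier.
    """
    return [
        dk + ck if (b >> k) & 1 else dk - ck
        for k, (dk, ck) in enumerate(zip(d, c))
    ]


d = [0] * 8
c = [1, 2, 3, 4, 5, 6, 7, 8]
print(int_bit_mac(d, c, 0b00001111))
```

Lanes 0 to 3 are added and lanes 4 to 7 are subtracted, which is what the +/- units in the figure suggest.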

8-bit to 64-bit conversions
- $D = D + b\,b^T$, e.g. Auto-Correlation
- b1 = b(1:8), b(1:8), ..., b(1:8) (the whole of b repeated across the 64-bit register)
- b2 = b(1) b(1) ... b(8) b(8) (each bit of b repeated in turn)
- [Figure: 8-bit Register b expanded into 64-bit Register A in each of the two layouts]
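The two expansions can be sketched in Python with integers standing in for registers. The function names are hypothetical, and the sketch assumes that b(1) is the least significant bit and that each bit is repeated eight times in b2 (which is what filling 64 bits from 8 implies).

```python
def replicate_byte(b):
    """b1: the 8-bit value b repeated in all eight bytes of a 64-bit word."""
    return (b & 0xFF) * 0x0101010101010101


def spread_bits(b):
    """b2: bit k of b repeated across all eight bits of byte k."""
    out = 0
    for k in range(8):
        if (b >> k) & 1:
            out |= 0xFF << (8 * k)
    return out


b = 0b00000101
print(f"{replicate_byte(b):016x}")
print(f"{spread_bits(b):016x}")
```

In this layout, bit 8k+j of b1 holds b(j+1) and bit 8k+j of b2 holds b(k+1), so a bitwise operation between the two registers touches every pair of bits of b at once.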

Bit-Bit Multiplications
- $D = D + b\,b^T$, e.g. Auto-Correlation
- b1*b2 computed with Ex-NOR: 64-bit Register A = b1, 64-bit Register B = b2, 64-bit Register C = b1*b2
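Why Ex-NOR works as a multiplier can be checked directly: if a set bit encodes +1 and a clear bit encodes -1 (an assumed mapping, not stated in the slides), the product of two signs is +1 exactly when the bits agree. The names `xnor64` and `to_sign` in this Python sketch are hypothetical.

```python
MASK64 = (1 << 64) - 1


def xnor64(a, b):
    """Bitwise Ex-NOR of two 64-bit registers: 64 sign products in one operation."""
    return ~(a ^ b) & MASK64


def to_sign(bit):
    """Assumed encoding: bit 1 -> +1, bit 0 -> -1."""
    return 1 if bit else -1


# Exhaustive check of the single-bit case against ordinary multiplication.
for x in (0, 1):
    for y in (0, 1):
        product_bit = xnor64(x, y) & 1
        assert to_sign(product_bit) == to_sign(x) * to_sign(y)

print(f"{xnor64(0b1100, 0b1010):016x}")
```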

Increment/Decrement
- $D = D + b\,b^T$, e.g. Auto-Correlation
- [Figure: 64-bit Register D, incremented or decremented by 1 (+/- units) under control of 8-bit Register b1*b2, giving 64-bit Register (D+b1*b2)]
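Since every product bit is a sign, the accumulation step only needs to count up or down. The Python sketch below updates one row of eight counters from eight product bits and then builds the full outer-product update row by row. The function names, the row-by-row organization, and the bit-to-sign mapping (1 means +1, 0 means -1) are assumptions for illustration.

```python
def inc_dec(counters, product_bits):
    """Increment lane j if bit j of product_bits is set, otherwise decrement it."""
    return [
        c + 1 if (product_bits >> j) & 1 else c - 1
        for j, c in enumerate(counters)
    ]


def accumulate_outer(d, b):
    """D = D + b b^T for an 8-bit sign vector b, one 8-lane row at a time."""
    updated = []
    for k, row in enumerate(d):
        spread = 0xFF if (b >> k) & 1 else 0x00   # bit k of b copied to 8 lanes
        product_bits = ~(b ^ spread) & 0xFF       # Ex-NOR gives b(k) * b(j) per lane
        updated.append(inc_dec(row, product_bits))
    return updated


d = [[0] * 8 for _ in range(8)]
d = accumulate_outer(d, 0b00000001)
print(d[0])
print(d[1])
```

After one update from zero, each entry of `d` equals the corresponding sign product, so the diagonal is always +1.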

Complex-valued Data Processing
- Is it easy to add?
- Is this worth an additional ALU Support?
- Typically supported by Software!

Truncated Multipliers
- Many applications need approximate computations
- Adaptive Algorithms: Y = Y + mu*(Y*C)
- Truncate lower bits
- Truncated Multipliers: half the area/half the delay
- Can do 2 truncated multiplies in parallel with regular ALU Multipliers
- [Figure: Truncated Multiplier, Multiplier 1, Multiplier 2]
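The idea of a truncated multiplier can be illustrated with a bit-level Python model that simply never forms the low-order partial-product columns. This is a simplified sketch under stated assumptions: unsigned n-bit operands, all columns below weight 2^n dropped, and no correction constant. Practical designs often keep a few extra columns or add a correction term, and the slides do not say which variant is meant. All function names are hypothetical.

```python
def full_high(a, b, n=8):
    """Upper half of the exact 2n-bit product of two unsigned n-bit values."""
    return (a * b) >> n


def truncated_high(a, b, n=8):
    """Upper half computed from only the partial-product bits of weight >= 2**n.

    Partial products a_i * b_j with i + j < n are never generated, which is
    where the area and delay savings come from.
    """
    total = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if (b >> j) & 1 and i + j >= n:
                total += 1 << (i + j)
    return total >> n


def worst_case_error(n):
    """Largest difference between exact and truncated results over all inputs."""
    return max(
        full_high(a, b, n) - truncated_high(a, b, n)
        for a in range(1 << n)
        for b in range(1 << n)
    )


print(full_high(255, 255), truncated_high(255, 255))
print(worst_case_error(4))
```

In this model the truncated result never exceeds the exact one, and the error stays confined to the low bits of the result, which an adaptive update such as Y = Y + mu*(Y*C) can typically tolerate.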

Software Support
- Greater Interaction between Compilers and Architectures: EPIC, Reconfigurable Logic
- Compiler needs to find and exploit bit level computations
- Reconfigurable Logic Programming

Area Estimates
- Area increase by 20% over an IA-64 architecture size due to Reconfigurable Support
- Instruction Set extensions need minimal hardware support
- Parallel Interleaved Memory Banks will need larger area

Other Uses
- Reconfigurable Logic, Bit Level Support: For accelerating loops of general purpose processors
- Bit Level Support: For other voice, video and multimedia applications

Conclusions
- Processor Core with Reconfigurable Support developed for Wireless Applications
- Instruction Set Extensions added for accelerating performance of the algorithms
- Integration of Wireless Appliances with General Purpose Processors
- Great Impact on Performance of Wireless Algorithms

Future Work
- Simulations for finding performance improvements
- Other Processor Architectures: Bit Slice Architectures, Out-of-order

References
- T.C. Callahan, J.R. Hauser, J. Wawrzynek, "The GARP Architecture and C Compiler," IEEE Computer, April 2000, pp. 62-69. http://brass.cs.berkeley.edu
- M.S. Schlansker, B.R. Rau, "EPIC: Explicitly Parallel Instruction Computing," IEEE Computer, Feb 2000, pp. 37-45.
- G.S. Sohi, "High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study," IEEE Transactions on Computers, Vol. 42, No. 1, Jan 1993, pp. 34-44.

Acknowledgements
- Vijay Pai
- Partha Ranganathan
- Joseph Cavallaro