Using Custom Accelerators in Wireless Systems
Alex Papakonstantinou, Deming Chen
Illinois Center for Wireless Systems

Wireless SoC Design Trends and Challenges
Shrinking transistor technologies have turned a single die into a host for systems of extraordinary size and complexity
–All the analog and digital components that were implemented in 3-4 different ICs in past technologies can now fit in a single chip
Designer productivity does not rise at the same rate as transistor capacity
–Design reuse and the use of Commercial Off-The-Shelf (COTS) Intellectual Property (IP) help meet Time-To-Market (TTM) constraints but have other downsides
Design space exploration is becoming a daunting task and conflicts with the shrinking TTM requirements
System customization suffers in terms of functionality/performance/power/area from the "one system fits all" tactic
Design focus is shifting from single-thread speed optimization to execution parallelization through multi-processor systems

Typical Design Practice & Design Paradigm Shift
COTS IP modules are integrated to meet the required system functionality
–Usually a generic microprocessor/microcontroller is used for the control part and a separate DSP processor for the signal processing part
–Fixed-functionality IP modules are integrated for the various data processing tasks
IP use speeds up the design phase but:
–imposes coarse granularity on optimization decisions regarding functionality, performance and power dissipation
–does not eliminate design time entirely, as interfacing between different IP modules can take up considerable engineering resources
The design paradigm needs a shift to a higher abstraction level
–Design systems efficiently with higher flexibility and on-demand customization
Instruction-less custom processor / accelerator:
–Microcode memory stores microcode words which control the Functional Units (FUs) and data transfers each cycle
–Program Counter (PC) holds the next microcode memory address
–Microcode words do not require any decoding
–FUs are customized according to the application domain
–Application-custom forwarding paths between FUs can eliminate unnecessary Register File (RF) reads/writes

EPOS (Explicitly Parallel Operations System)
Instruction-Level Parallelism (ILP) extraction:
–The front-end of the IMPACT compiler is used to optimize the HLL description using:
   Traditional compiler techniques
   Superblock and Hyperblock creation
The generated EPOS accelerators can substitute for the generic COTS IP by:
–Offering high customization according to the system requirements
–Providing better performance and power efficiency than a generic DSP core/microprocessor

EPOS-based Wireless SoC Solution
Each module is mapped directly onto a customized EPOS accelerator
The interfaces between the EPOS accelerators, as well as between other IP and EPOS modules, are defined in the HLL program and automatically synthesized along with the EPOS datapaths
Exploration of alternative system implementations becomes efficient and extremely fast
Each EPOS processor can be re-programmed within the system to execute optimized/modified versions of its original functionality

EPOS Performance Results
EPOS configuration used:
–4xALU
–1xMUL
–1xST-Port
–1xLD-Port
FU latencies (cycles):
–ALU: 1
–MUL: 3
–LD: 4
–ST: 1

Application    NISC (cycles)    EPOS (cycles)
startup
dijkstra
bubble
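
Illustrative sketch of the instruction-less accelerator idea described above: a microcode word drives the datapath control signals directly (no opcode decoding), the PC selects the next word, and a forwarding path lets an ALU result feed the next operation without a Register File write. This is a minimal Python model written under assumptions, not the EPOS implementation: the `MicroWord` field names, the single-ALU datapath, the `"fwd"` operand selector and the sequential PC are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Operand selector (hypothetical encoding):
# ("rf", index), ("imm", value) or ("fwd", 0) for the forwarding path.
Operand = Tuple[str, int]


@dataclass(frozen=True)
class MicroWord:
    """Hypothetical microcode word: every field is a control signal, so no decoding is needed."""
    alu_op: str = "nop"              # "add", "sub" or "nop" (ALU idle this cycle)
    a: Operand = ("imm", 0)          # source of ALU input A
    b: Operand = ("imm", 0)          # source of ALU input B
    rf_write: Optional[int] = None   # RF index that receives the ALU result, or None
    halt: bool = False               # stop the microcode sequencer


ALU_OPS = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}


def read_operand(sel: Operand, rf: List[int], alu_latch: int) -> int:
    """Select an ALU input from the RF, an immediate, or the forwarding path."""
    kind, value = sel
    if kind == "rf":
        return rf[value]
    if kind == "fwd":
        return alu_latch             # previous ALU result, bypassing the RF
    if kind == "imm":
        return value
    raise ValueError(f"unknown operand source: {kind}")


def run(microcode: List[MicroWord], rf_init: List[int]):
    """Execute one microcode word per cycle; return (rf, cycles, rf_writes)."""
    rf = list(rf_init)
    pc = 0            # program counter: address of the next microcode word
    alu_latch = 0     # output latch of the ALU, source of the forwarding path
    cycles = 0
    rf_writes = 0
    while not microcode[pc].halt:
        word = microcode[pc]
        if word.alu_op != "nop":
            alu_latch = ALU_OPS[word.alu_op](
                read_operand(word.a, rf, alu_latch),
                read_operand(word.b, rf, alu_latch),
            )
            if word.rf_write is not None:
                rf[word.rf_write] = alu_latch
                rf_writes += 1
        pc += 1       # sequential fetch only; branching is omitted in this sketch
        cycles += 1
    return rf, cycles, rf_writes


# Computes rf[3] = (rf[0] + rf[1]) - rf[2]; the intermediate sum never touches the RF.
PROGRAM = [
    MicroWord(alu_op="add", a=("rf", 0), b=("rf", 1)),
    MicroWord(alu_op="sub", a=("fwd", 0), b=("rf", 2), rf_write=3),
    MicroWord(halt=True),
]
```

Calling `run(PROGRAM, [5, 7, 3, 0])` leaves 9 in `rf[3]` after two cycles with a single RF write; without the forwarding path the intermediate sum would need an extra RF write and a later RF read.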