RISC-V Physical Design Implementation for High Performance Applications: Challenges and Solutions
Gopakumar.G, Hardware Design Group, Centre for Development of Advanced Computing (C-DAC)
Agenda
- Traditional Physical Design flow for SoCs
- High Performance Implementation Challenges for RISC-V core
- Goals of traditional Clock Tree Synthesis (CTS)
- Static Timing Analysis (STA) of Post-CTS Netlist
- STA using On Chip Variation derating factor
- Challenges in Pre-CTS/Post-CTS Clock Timing Gap Closure
- Clock Concurrent Optimization
- Recommended Physical design flow for RISC-V high performance implementation
- Results & Conclusion
Traditional Physical Design for SoCs: Flow Diagram
- The pre-CTS stages optimize the datapath logic for the required design constraints using an ideal synchronous reference signal, the clock.
- CTS builds the clock network while maintaining tight skew margins, to reduce the timing gap between the ideal clocks (pre-CTS clocks) and the propagated clocks (post-CTS clocks).
- Post-CTS optimization and analysis during physical design are based on the propagated clocks.
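To make the "timing gap" concrete, here is a minimal sketch, assuming a single launch/capture path with made-up delays in ns and a hypothetical `TimingPath` record. It computes setup slack once with ideal clocks (both latencies zero, as pre-CTS) and once with propagated latencies (as post-CTS); the difference between the two is the gap CTS tries to keep small.

```python
from dataclasses import dataclass

@dataclass
class TimingPath:
    clk_to_q: float         # launch flip-flop clock-to-Q delay (ns)
    datapath: float         # combinational delay between the flip-flops (ns)
    setup: float            # setup time of the capture flip-flop (ns)
    launch_latency: float   # propagated clock latency at the launch flip-flop (ns)
    capture_latency: float  # propagated clock latency at the capture flip-flop (ns)

def setup_slack(p: TimingPath, t_clk: float, ideal: bool) -> float:
    """Setup slack; ideal (pre-CTS) clocks treat both clock latencies as zero."""
    launch = 0.0 if ideal else p.launch_latency
    capture = 0.0 if ideal else p.capture_latency
    arrival = launch + p.clk_to_q + p.datapath
    required = capture + t_clk - p.setup
    return required - arrival

# Hypothetical numbers: the built tree reaches the launch flop 0.1 ns later than the capture flop.
path = TimingPath(clk_to_q=0.10, datapath=0.55, setup=0.05,
                  launch_latency=0.40, capture_latency=0.30)
pre_cts = setup_slack(path, t_clk=1.0, ideal=True)
post_cts = setup_slack(path, t_clk=1.0, ideal=False)
print(f"pre-CTS {pre_cts:.2f} ns, post-CTS {post_cts:.2f} ns, gap {pre_cts - post_cts:.2f} ns")
```

In this simple model the setup gap equals the launch latency minus the capture latency, which is why keeping skew tight keeps post-CTS timing close to what was optimized pre-CTS.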
RISC-V Implementation for High Performance: Challenges
- One setup and one hold constraint are required for every pair of flip-flops in a design that has at least one functional logic path between them.
- Implementing RISC-V for operating frequencies in the range of a few GHz requires migrating to advanced semiconductor technology nodes such as 28nm/32nm.
- Deep-submicron technology nodes have higher On Chip Variation (OCV), which makes timing analysis across the various corner cases more complex.
- The biggest physical design challenge in advanced deep-submicron technologies is closing the clock timing gap between the ideal and propagated clocks while achieving optimal performance parameters.
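The first point can be illustrated with a small sketch, assuming a hypothetical netlist stored as a fan-out dictionary in which every node not listed in `flops` is combinational. It enumerates flip-flop pairs (not paths), so reconvergent logic between the same two flops still produces exactly one setup and one hold check.

```python
def flop_pairs(edges, flops):
    """Return (launch, capture) pairs joined by at least one combinational path.

    `edges` maps each node to its fan-out nodes. Traversal stops at
    flip-flops because they are timing endpoints.
    """
    pairs = set()
    for launch in flops:
        stack = list(edges.get(launch, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in flops:
                pairs.add((launch, node))  # endpoint reached; do not traverse through it
            else:
                stack.extend(edges.get(node, []))
    return pairs

def timing_constraints(edges, flops):
    """One setup and one hold constraint per connected flip-flop pair."""
    constraints = []
    for launch, capture in sorted(flop_pairs(edges, flops)):
        constraints.append(("setup", launch, capture))
        constraints.append(("hold", launch, capture))
    return constraints

# Hypothetical netlist: FF1 reaches FF2 through two reconvergent gates, FF2 feeds FF1,
# and FF3 has no functional path to any flip-flop.
flops = {"FF1", "FF2", "FF3"}
edges = {
    "FF1": ["g1", "g2"],
    "g1": ["g3"],
    "g2": ["g3"],
    "g3": ["FF2"],
    "FF2": ["FF1"],
}
for constraint in timing_constraints(edges, flops):
    print(constraint)
```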
Clock Tree Synthesis: Goals
- Pre-CTS (ideal case): $P_1 = P_2 = 0$
- CTS: minimize the skew $\max(P_1, P_2) - \min(P_1, P_2)$
where $P_1, P_2$ are the clock path delays and $DP_1$ is the datapath delay.
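A minimal sketch of the CTS objective, assuming hypothetical clock path delays in ns from the clock root to each flip-flop clock pin. With ideal clocks every delay is zero and the skew is trivially zero; after CTS the skew is the spread between the slowest and fastest sink, which the tree builder tries to keep under a target.

```python
# Hypothetical clock path delays (ns) from the clock root to each flip-flop clock pin.
ideal = {"FF1": 0.0, "FF2": 0.0}         # pre-CTS: ideal clocks, P1 = P2 = 0
propagated = {"FF1": 0.42, "FF2": 0.35}  # post-CTS: delays through the built tree

def skew(latencies):
    """max(P) - min(P) over all clock sinks; CTS tries to minimize this."""
    return max(latencies.values()) - min(latencies.values())

def meets_skew_target(latencies, target):
    """True when the tree's skew is within the (hypothetical) skew margin."""
    return skew(latencies) <= target

print(skew(ideal), round(skew(propagated), 3))
print(meets_skew_target(propagated, target=0.10))
```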
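Post-CTS Timing Analysis: Effect of Propagated Clocks
Setup: $B_{1,\max} + B_{2,\max} + FF_{1,\max} + DP_{1,\max} + T_S \le B_{1,\min} + B_{3,\min} + T_{clk}$
Hold: $B_{1,\min} + B_{2,\min} + FF_{1,\min} + DP_{1,\min} \ge B_{1,\max} + B_{3,\max} + T_H$
where $B_1, B_2, B_3$ are clock buffer delays ($B_1$ on the common clock path, $B_2$ on the launch branch, $B_3$ on the capture branch), $FF_1$ is the clock-to-Q delay of the launch flip-flop, $DP_1$ is the datapath delay, $T_S, T_H$ are the setup and hold times of the flip-flop, and $T_{clk}$ is the clock period.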
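The two inequalities can be evaluated directly as slacks. This is a minimal sketch, assuming a hypothetical `Delay` pair of early (min) and late (max) values in ns; positive slack means the inequality holds. Note that, exactly as written on the slide, $B_1$ is taken late on one side and early on the other; production STA tools usually remove that shared-path pessimism (CPPR), which this sketch deliberately does not do.

```python
from typing import NamedTuple

class Delay(NamedTuple):
    early: float  # minimum (fast) delay, ns
    late: float   # maximum (slow) delay, ns

def setup_slack(b1, b2, b3, ff1, dp1, t_s, t_clk):
    # Launch side at its latest, capture side at its earliest (slide's setup inequality).
    arrival = b1.late + b2.late + ff1.late + dp1.late + t_s
    required = b1.early + b3.early + t_clk
    return required - arrival

def hold_slack(b1, b2, b3, ff1, dp1, t_h):
    # Launch side at its earliest, capture side at its latest (slide's hold inequality).
    arrival = b1.early + b2.early + ff1.early + dp1.early
    required = b1.late + b3.late + t_h
    return arrival - required

# Hypothetical delays (ns).
b1, b2, b3 = Delay(0.20, 0.25), Delay(0.10, 0.12), Delay(0.10, 0.13)
ff1, dp1 = Delay(0.08, 0.10), Delay(0.30, 0.45)

setup = setup_slack(b1, b2, b3, ff1, dp1, t_s=0.05, t_clk=1.0)
hold = hold_slack(b1, b2, b3, ff1, dp1, t_h=0.03)
print(f"setup slack {setup:.2f} ns, hold slack {hold:.2f} ns")
```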
On Chip Variation: Goals
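Setup: $A + 1.1\,(B + DP_1) + T_S \le A + 0.9\,C + T_{clk}$
Hold: $A + 0.9\,(B + DP_1) \ge A + 1.1\,C + T_H$
where $A, B, C$ are path delays due to clock buffers ($A$ on the common clock path, $B$ on the launch branch, $C$ on the capture branch), 1.1 and 0.9 are the late and early OCV derating factors, $DP_1$ is the datapath delay, $T_S, T_H$ are the setup and hold times of the flip-flop, and $T_{clk}$ is the clock period.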
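A minimal sketch of the derated checks, using the slide's 1.1/0.9 factors and hypothetical delays in ns. The common term $A$ appears undetrated on both sides, so it cancels; only the launch branch plus datapath and the capture branch are penalized. Comparing against `late=early=1.0` shows how much slack OCV derating costs.

```python
def ocv_setup_slack(a, b, c, dp1, t_s, t_clk, late=1.1, early=0.9):
    # Launch branch and datapath derated late, capture branch derated early.
    arrival = a + late * (b + dp1) + t_s
    required = a + early * c + t_clk
    return required - arrival

def ocv_hold_slack(a, b, c, dp1, t_h, late=1.1, early=0.9):
    # Launch branch and datapath derated early, capture branch derated late.
    arrival = a + early * (b + dp1)
    required = a + late * c + t_h
    return arrival - required

# Hypothetical delays (ns): A common path, B launch branch, C capture branch.
params = dict(a=0.30, b=0.20, c=0.20, dp1=0.60)

setup_ocv = ocv_setup_slack(**params, t_s=0.05, t_clk=1.0)
setup_nom = ocv_setup_slack(**params, t_s=0.05, t_clk=1.0, late=1.0, early=1.0)
hold_ocv = ocv_hold_slack(**params, t_h=0.03)
hold_nom = ocv_hold_slack(**params, t_h=0.03, late=1.0, early=1.0)
print(f"setup: {setup_nom:.2f} -> {setup_ocv:.2f} ns, hold: {hold_nom:.2f} -> {hold_ocv:.2f} ns")
```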
Clock Timing Gap Closure: Challenges
The clock timing gap is primarily due to:
- OCV: caused by the lithographic challenges of fabricating features smaller than the wavelength of the light used. The effect of OCV is most pronounced on the divergent portions of the clock paths.
- Clock gating logic: causes clock divergence.
- Complex clock structures (e.g., clock muxes, XORs, generated clocks): cause clock divergence.
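Why divergence matters can be sketched on a toy clock tree. This assumes a hypothetical parent map and per-node buffer delays in ns, finds where the launch and capture clock paths split, and derates only the divergent branches (the shared part cancels, as in the OCV inequalities). A clock gating cell placed near the root makes the paths split earlier, so more delay is exposed to derating.

```python
def path_to_root(node, parent):
    """Clock path from the root down to `node`, root first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]

def split_common(launch, capture, parent):
    """Return (common, launch-only, capture-only) portions of the two clock paths."""
    lp, cp = path_to_root(launch, parent), path_to_root(capture, parent)
    n = 0
    while n < min(len(lp), len(cp)) and lp[n] == cp[n]:
        n += 1
    return lp[:n], lp[n:], cp[n:]

def clock_ocv_penalty(launch, capture, parent, delay, late=1.1, early=0.9):
    """Setup slack lost to derating the divergent clock branches only."""
    _, launch_only, capture_only = split_common(launch, capture, parent)
    launch_div = sum(delay[n] for n in launch_only)
    capture_div = sum(delay[n] for n in capture_only)
    return (late - 1.0) * launch_div + (1.0 - early) * capture_div

# Hypothetical tree: FF1 and FF2 share buf1; FF3 sits behind a clock gating cell.
parent = {"buf1": "clk", "buf2": "buf1", "FF1": "buf2", "buf3": "buf1", "FF2": "buf3",
          "gate": "clk", "buf4": "gate", "FF3": "buf4"}
delay = {"clk": 0.0, "buf1": 0.10, "buf2": 0.08, "buf3": 0.08, "gate": 0.12,
         "buf4": 0.08, "FF1": 0.0, "FF2": 0.0, "FF3": 0.0}

print(round(clock_ocv_penalty("FF1", "FF2", parent, delay), 3))  # late split
print(round(clock_ocv_penalty("FF1", "FF3", parent, delay), 3))  # split at the root
```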
Clock Concurrent Optimization: Design Flow Diagram
- Traditional physical optimization treats the datapath delay as a variable parameter, while the clock latencies of the launch flip-flop and the capture flip-flop are treated as fixed parameters.
- Clock Concurrent Optimization treats the datapath delay, the launch clock latency and the capture clock latency all as variable parameters during optimization.
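The difference between the two views can be sketched with a toy pipeline. This is a deliberately simplified brute-force useful-skew search, not the CCOpt algorithm used in commercial tools: it assumes a hypothetical three-flop pipeline FF0 → FF1 → FF2 with made-up (min, max) stage delays, fixes all clock latencies at zero for the traditional case, and then treats the FF1 clock latency as a variable, picking the value that maximizes worst setup slack while keeping hold slack non-negative.

```python
T_CLK = 1.0   # clock period (ns), hypothetical
SETUP = 0.05  # setup time (ns), hypothetical
HOLD = 0.03   # hold time (ns), hypothetical

# Hypothetical pipeline FF0 -> FF1 -> FF2: (min, max) datapath delay per stage in ns.
STAGES = [(0.40, 1.10), (0.30, 0.70)]

def slacks(latency):
    """latency[i] is the clock arrival at flip-flop i; returns (worst setup, worst hold)."""
    setup, hold = [], []
    for i, (dmin, dmax) in enumerate(STAGES):
        launch, capture = latency[i], latency[i + 1]
        setup.append(capture + T_CLK - SETUP - (launch + dmax))
        hold.append(launch + dmin - (capture + HOLD))
    return min(setup), min(hold)

def objective(l1):
    """Worst setup slack for FF1 latency `l1`, rejecting any choice that breaks hold."""
    worst_setup, worst_hold = slacks([0.0, l1, 0.0])
    return worst_setup if worst_hold >= 0 else float("-inf")

# Traditional view: clock latencies fixed (zero skew), only the datapath could change.
zero_skew = slacks([0.0, 0.0, 0.0])

# Concurrent view: also vary the FF1 clock latency over 0.00 .. 0.50 ns.
best = max((step / 100 for step in range(51)), key=objective)
useful_skew = slacks([0.0, best, 0.0])
print("zero skew:", zero_skew)
print("FF1 latency", best, "->", useful_skew)
```

Here the slow first stage borrows time from the fast second stage by delaying FF1's clock; the hold check is included because changing clock latencies can break hold on the stage that loses margin.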
RISC-V Implementation: Recommended Design Flow
- Physical synthesis gives better correlation between the pre-placement and post-placement netlists, enabling the fastest timing closure with minimum routing congestion.
- CCOpt simultaneously optimizes the datapath delays and clock latencies, which can yield a design that meets stringent margins for speed, power and area.
- An optimized clock tree structure also helps reduce the IR drop in the clock path, which further helps achieve a higher frequency of operation.
Results & Conclusion
- An 11-stage pipelined, out-of-order 64-bit RISC-V processor was implemented without CCOpt and with CCOpt in the UMC 90nm High Speed process, yielding the following results:

Frequency (MHz): 150
Metric                      Without CCOpt   With CCOpt
No. of failing paths        722             24
Power consumption (W)*      1.832           1.886
TNS (ns)                    -1.94           -0.235
* Power consumption is based on the default static switching factor.

- Clock Concurrent Optimization can be an effective design technique that helps boost design performance by roughly 10% without much power overhead.
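The relative changes implied by the table can be checked with a few lines of arithmetic. This sketch only uses the table's numbers; the percentages are derived here and are not stated on the slide. TNS is compared by magnitude, so a positive value means the total negative slack shrank.

```python
# Values copied from the results table.
without_ccopt = {"failing_paths": 722, "power_w": 1.832, "tns_ns": -1.94}
with_ccopt = {"failing_paths": 24, "power_w": 1.886, "tns_ns": -0.235}

def pct_change(before, after):
    """Percentage change relative to the magnitude of `before`."""
    return 100.0 * (after - before) / abs(before)

power_overhead = pct_change(without_ccopt["power_w"], with_ccopt["power_w"])
failing_change = pct_change(without_ccopt["failing_paths"], with_ccopt["failing_paths"])
tns_improvement = pct_change(without_ccopt["tns_ns"], with_ccopt["tns_ns"])

print(f"power overhead: {power_overhead:+.1f}%")
print(f"failing paths: {failing_change:+.1f}%")
print(f"TNS improvement: {tns_improvement:+.1f}%")
```

Roughly: about 3% more power, about 97% fewer failing paths and about 88% less total negative slack at 150 MHz.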
Bio
I am Gopakumar.G, serving as a senior engineer at the Centre for Development of Advanced Computing (C-DAC) since January. I have more than 10 years of experience in ASIC physical design in deep-submicron technologies and in UVM-based verification IP development. At C-DAC, I have successfully completed two multimillion-gate ASIC tape-outs in the 130nm technology node and four UVM-based verification IP designs. Before joining C-DAC, I worked as a design engineer at Fujitsu ODC under NEST, where I participated in the tape-outs of ADCs/DACs in the 90nm and 65nm technology nodes. I have filed four Indian patents and one US patent. I hold a bachelor's degree in Electronics and Communication Engineering from the University of Kerala.