1
EE201C SRAM Simulation
Stephen Govea, Javad Zandazad
2
Performance Constraints
LTSpice != HSPICE: ignored transistor parameters gave 100% yield calculations
Recalculated performance constraints:
- Read: 155mV nbit voltage at 10ps
- Write: evaluate write condition at 6.5ps
Nominal performance (10k MC; Vth, 0.1e-6um Leff):
- Read yield: 46%
- Write yield: 65%

SPICE simulation differences prevented us from using the provided performance constraints; using the given performance constraints at the nominal parameter points resulted in a 100% success read/write yield. Instead, we performed a 2000-sample Monte Carlo analysis with the nominal circuit parameters and captured the results in a pair of histogram plots. The first plot shows the 'crossing time', the time at which the nbit voltage exceeds the bit voltage and thus indicates 'write' success. The second plot shows the distribution of the nbit voltage at 10ps. From these plots we selected new performance constraints: a 155mV read constraint and a 6.5ps write constraint.

MATLAB CalculatePerformanceConstraints(2000) output:
***** RESULTS *****
Mean voltage at 10ps after 2000 simulations (mV): e+002
Voltage mode at 10ps (mV): e+002
Voltage median at 10ps (mV): e+002
Mean crossing time (ps): e+000
Crossing time mode (ps): e+000
Crossing time median (ps): e+000
Elapsed time is seconds.
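For reference, a minimal MATLAB sketch of the constraint-selection step described above; simulate_sram_sample() is a hypothetical helper that runs one LTSpice sample and returns the write crossing time and the nbit voltage at 10ps, so the histograms and summary statistics mirror the output listed above.

% Hedged sketch: 2000-sample Monte Carlo at nominal parameters, then histogram
% the write crossing time and the nbit read voltage at 10ps to pick constraints.
% simulate_sram_sample() is a hypothetical wrapper around the LTSpice run.
N = 2000;
crossing_time_ps = zeros(N, 1);
nbit_voltage_mV  = zeros(N, 1);
for k = 1:N
    [crossing_time_ps(k), nbit_voltage_mV(k)] = simulate_sram_sample();
end
figure; hist(crossing_time_ps, 50); title('Write crossing time (ps)');
figure; hist(nbit_voltage_mV, 50);  title('nbit voltage at 10ps (mV)');
fprintf('Mean crossing time (ps): %g\n', mean(crossing_time_ps));
fprintf('Mean voltage at 10ps (mV): %g\n', mean(nbit_voltage_mV));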
3
LTSpice Hack -- .DATA
Absence of the .DATA feature made simulations run slowly (~2 sim/s)
Used user-defined functions and a counter variable tm:
- Only one equation term is non-zero for each value of tm
- tm increments from 1 to N
Dramatic performance improvement (roughly 5x)
Necessitated using linear interpolation for output values

One of our initial challenges centered on our lack of access to HSPICE and our reliance on LTSpice for circuit simulation. In addition to recalculating the performance constraints to fit our simulation environment, we were hindered by the lack of the .DATA functionality present in HSPICE. Our initial simulations ran very slowly, approximately one sample every 0.5s, since every simulation required a separate call to LTSpice from MATLAB. We solved this problem through a novel use of user-defined LTSpice functions. We created four functions with N terms, where N is the number of simulation samples and each term is the requested value of a given parameter at a given sample iteration. We also used the .STEP directive to create a counter variable from 0 to N, which told LTSpice to run the netlist simulation N times. By adding a unique expression to each term of the user-defined function we were able to selectively pick out the appropriate value at each of the N simulations. The expression combined two unit-step expressions: the first unit step is 1 during the desired simulation and the second is 1 on the following simulation; by subtracting these and multiplying the result by the remainder of the term, only one term of each function is non-zero for a given simulation run.
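To illustrate the technique described above, here is a rough MATLAB sketch (not the authors' actual script) that emits the .STEP counter and one user-defined LTSpice function whose k-th term is selected by a pair of unit steps; the example values in vth_values and the function name getvth are invented for the illustration.

% Hedged sketch: build LTSpice netlist lines that select one value per .STEP run.
% u() is LTSpice's unit-step function; the netlist would reference getvth(tm).
vth_values = [0.40 0.42 0.45 0.43];   % example per-sample values (made up)
N = numel(vth_values);
netlist_lines = {sprintf('.step param tm 1 %d 1', N)};
terms = cell(1, N);
for k = 1:N
    % (u(x-(k-0.5)) - u(x-(k+0.5))) equals 1 only when x == k, 0 otherwise
    terms{k} = sprintf('(u(x-%g)-u(x-%g))*%g', k - 0.5, k + 0.5, vth_values(k));
end
netlist_lines{end+1} = ['.func getvth(x) {' strjoin(terms, ' + ') '}'];
fprintf('%s\n', netlist_lines{:});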
4
Importance Sampling Method
Goal: focus sampling on the problem regions
Implementation:
- Divide the [-3σ, 3σ] parameter space into 8 regions
- Uniformly sample the given parameter space
- Assign failed results to their parameter regions
- Distribute the remaining samples according to the relative number of failures in each region
- Convert yield/power results to a normal distribution
- Include the uniform-sampling yield results
- Modify the number of uniform and total samples to tune the algorithm (a sketch of this flow follows below)
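A minimal MATLAB sketch of the two-phase sample allocation above, under stated assumptions: evaluate_sample(), uniform_sample(), region_of(), and sample_in_region() are hypothetical helpers (one LTSpice run per evaluation), and the re-weighting of the region-biased results back to the nominal distribution is omitted.

% Hedged sketch of the region-weighted importance sampling flow.
% All helper functions here are hypothetical stand-ins for the real routines.
num_regions = 8;      % regions of the [-3sigma, 3sigma] parameter space
num_uniform = 16;     % uniform exploration samples
num_total   = 350;    % overall sample budget
fails_by_region = zeros(1, num_regions);
uniform_fails   = 0;
% Phase 1: uniform sampling to locate the failure-prone regions
for k = 1:num_uniform
    p = uniform_sample();
    failed = evaluate_sample(p);                  % true (1) on read/write failure
    fails_by_region(region_of(p)) = fails_by_region(region_of(p)) + failed;
    uniform_fails = uniform_fails + failed;
end
% Phase 2: spend the remaining budget in proportion to the observed failures
weights = fails_by_region / max(1, sum(fails_by_region));
alloc   = round(weights * (num_total - num_uniform));
weighted_fails = 0;
for r = 1:num_regions
    for j = 1:alloc(r)
        weighted_fails = weighted_fails + evaluate_sample(sample_in_region(r));
    end
end
% Combine with the uniform-phase results (distribution correction omitted here)
yield_estimate = 1 - (uniform_fails + weighted_fails) / (num_uniform + sum(alloc));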
5
Importance Sampling Tuning – Number of Uniform Samples
Monte Carlo baseline: 10,000 samples, R: 46.44%, W: 65.33%
Importance results: shaded regions mark superior performance
Diff equation: diff = abs(read_yield - baseline_read_yield) + abs(write_yield - baseline_write_yield);
MC simulation at , , 0.1e-6, 0.1e-6, 10k: 0.4644, , e-6, e-5, 2.125e-13; 6170 seconds
Target: 155mV, 6.5ps

Narrative: We started by performing a 10k MC sampling at the nominal parameter values and calculating the read and write yield results; we used these as the baseline case for comparing our QMC Importance results. Our importance sampling implementation has two key tuning parameters; the first is the number of uniform samples evaluated before sampling the higher-failure regions. The graph illustrates the effect of changing the number of uniform samples on the overall yield difference from the baseline case; each iteration used 350 samples total, which we previously identified as the minimum number of samples for the QMC simulation to converge. The two highest-performing and reasonably stable regions are highlighted. Surprisingly, relatively few uniform samples are required before the yield results approach the level of accuracy we have seen with the QMC implementation. Also, the error increases dramatically once the number of uniform samples passes ~45. From these results, we tested our importance sampling method using 40 and 16 uniform samples. Overall, the 40-sample case performed rather poorly and did not offer a significant improvement over the QMC-only case. However, the 16-sample case proved somewhat more promising.
6
Importance Sampling Tuning – Overall Number of Samples
Sanity check: 16 uniform sample case vs. QMC-only method

As a sanity check, we compared the QMC-only method with the QMC Importance implementation using 16 uniform samples. As the graphs show, the yield values track each other closely and the power calculations are virtually identical.
7
Importance Sampling Tuning – Overall Number of Samples
Increase accuracy: +2% at 300 samples; crossing at 250
Reduce samples: 35% reduction for comparable accuracy (+2%)
The key
8
Simulated Annealing Method
Randomly pick starting transistor parameters
Simulate SRAM performance and calculate a score
For T attempts:
- Is the current best case good enough?
- Return to the best case after 75% of the attempts are complete
- Generate a new parameter set, simulate it, and score it
- Decide whether to accept the new parameter set:
  - If better than the current or stored best case, accept it
  - Otherwise, acceptance is possible with a probability based on the temperature
- Maintain current, best, and suggested parameter sets
Return the best-performing parameter set (a skeleton of this loop is sketched below)
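A hedged MATLAB skeleton of the loop above (an illustration, not the authors' code); simulate_and_score(), generate_neighbor(), and random_start_params() are hypothetical helpers, and the score plays the role of perform(6) used on the tuning slides.

% Hedged sketch of the simulated annealing loop outlined above.
max_num_attempts = 600;
cur_params  = random_start_params();            % hypothetical helper
cur_score   = simulate_and_score(cur_params);   % evenly weighted read/write yield
best_params = cur_params;  best_score = cur_score;
for T = 1:max_num_attempts
    cur_temp = 1 - T / max_num_attempts;        % temperature, per the tuning slide
    if T == round(0.75 * max_num_attempts)
        cur_params = best_params;               % return to the best case at 75%
        cur_score  = best_score;
    end
    suggest_params = generate_neighbor(cur_params, cur_temp, cur_score);
    suggest_score  = simulate_and_score(suggest_params);
    accept_probability = exp(suggest_score - cur_score) * cur_temp;
    if suggest_score > cur_score || rand() < accept_probability
        cur_params = suggest_params;
        cur_score  = suggest_score;
    end
    if cur_score > best_score
        best_params = cur_params;  best_score = cur_score;
    end
end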
9
Simulated Annealing Tuning
Generate neighbor parameters
Used temperature and performance to adjust the neighbor step size:

performance_ratio = max(0.2, 1 - (cur_perform(6) / performance_threshold));
if (cur_temp > 0.25)
    change_percentage = 0.2 * performance_ratio * cur_temp;
else
    change_percentage = 0.15 * performance_ratio * cur_temp;
end

Current temperature
cur_temp = 1 - T / max_num_attempts

Calculate score
Evenly weighted read & write yield
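One plausible way change_percentage could then drive the neighbor step (an assumption; the slide does not show this part). cur_params is a hypothetical vector of the four tuned transistor parameters.

% Hedged sketch: perturb each parameter by up to +/- change_percentage.
new_params = zeros(size(cur_params));
for i = 1:numel(cur_params)
    step = change_percentage * (2 * rand() - 1);   % uniform in [-change_percentage, +change_percentage]
    new_params(i) = cur_params(i) * (1 + step);
end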
10
Simulated Annealing Tuning
Accept probability:
accept_probability = exp(suggest_perform(6) - cur_perform(6)) * cur_temp
11
Simulated Annealing Tuning
Parameter Trending
- Track the overall score against the parameter trend
- Maintain a window of size n (= 10) of the overall score and of the parameters
- Check whether the score is more often improving (getting smaller) within that window
- Check whether the parameters are following a certain trend
- If the score is improving, favor that parameter trend in the next window; otherwise try to stay away from the trend (a rough sketch follows below)
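A rough MATLAB sketch of the windowed trend check described above (an illustration under assumptions, not the authors' implementation); score_window and param_window are hypothetical buffers holding the last n scores and parameter sets, and the resulting trend_bias would nudge the neighbor generation.

% Hedged sketch: decide whether to follow or avoid the recent parameter trend.
% score_window: hypothetical n x 1 vector of the last n overall scores.
% param_window: hypothetical n x 4 matrix of the matching parameter sets.
n = 10;
score_deltas = diff(score_window);                    % n-1 successive score changes
param_trend  = sign(mean(diff(param_window, 1, 1)));  % average drift direction per parameter
if sum(score_deltas < 0) > (n - 1) / 2
    trend_bias = param_trend;     % score mostly improving (getting smaller): favor the trend
else
    trend_bias = -param_trend;    % otherwise try to stay away from the trend
end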
12
Simulated Annealing Results
Test       Read %    Write %   Score     Attempts   Total Sim.   Comparison (score / attempts / sims vs. baseline)
Baseline   97.73%    97.47%    97.60%    2,592      1,036,800    -
1          86.33%    91.15%    88.74%    600        180,000      90.9% / 23.1% / 17.4%
2          87.45%    91.35%    89.40%                            91.6%
3          84.64%    93.62%    89.13%                            91.3%
4          87.51%    92.02%    89.77%                            92.0%
5          87.10%    91.67%    89.39%    200        60,000       7.7% / 5.8% (attempts / sims)
6          87.75%    92.32%    90.04%                            92.2%
7          87.95%    92.31%    90.13%                            92.3%
8          86.35%    91.74%    89.05%                            91.2%

Test #1: Random neighbor sampling, 15% allowed range; best parameters found: , , e-08, e-08. QMC Importance testing method, 300 samples at each point; best performance found: , , e-06, e-06, e-13, . Found the "best" case around sample 200.

Test #2: Random neighbor sampling, 20% allowed range; best parameters found: e e-08. QMC Importance testing method, 300 samples at each point; best performance found: e e e . Found the "best" case around sample 350.

Test #3: Moved to the best parameter values with 20% of the attempts left. Used a combination of the performance ratio and the switch value; the switch removed performance as a factor in calculating the neighboring values after 5 failed attempts to find new neighbor values with a better combination. The performance ratio also had a 0.25 limit so that it could not restrict the size of the neighbor space too much even when the performance was quite high. Overall percentage was 20% plus a 2.5% offset to ensure that the change percentage did not drop to zero. Used 600 attempts. Best performance: e e e ; best parameters: e e-08.

Test #4: Same scheme as Test #3, but with a 15% overall percentage plus the 2.5% offset. Used 600 attempts. Best performance: e e e ; best parameters found: e e-07.

Test #5: Moved to the best parameter values with 20% of the attempts left. Used a combination of the performance ratio and the current temperature to tune the neighbor selection. The performance ratio has a 20% cut-out limit; a 20% overall range was used plus a 5% baseline. 200 overall attempts. Best performance: e e e ; best parameters: e e-08.

Test #6: Just like Test #5, but with a 15% overall sampling range plus a 5% baseline. Best performance: e e e ; best parameters: e e-08.

Test #7:

Test #8: Decreased the go-back-to-best point to 75% of the way through the attempts and introduced a new neighbor function, during and after the go-back period, that reduces the overall percentage to 10% with a 2.5% baseline.