Memory Segmentation to Exploit Sleep Mode Operation
32nd ACM/IEEE DAC, 1995
Amir H. Farrahi et al.
CS554 (2006 Fall)
Ju-Young Jung
- Introduction
- Background
- Paper overview
- Methodologies to obtain the idle set
- Experimental results
- Recent research on memory power saving
- Leakage power
- Considerations
- Conclusion
ABC over Memory Partitioning (MP)
- The principle of memory partitioning
  - Subdivide the address space into several smaller blocks, each of which consumes less energy
  - Map these blocks to different physical memory banks
  - Selectively activate or disable each bank
- What are the gains from MP?
  - Performance-oriented solution
  - Low energy
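The bank-selection principle above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the bank start addresses, the address trace, and the names `bank_of` and `banks_to_activate` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

def bank_of(address, bank_starts):
    """Return the index of the bank whose address range contains `address`.

    bank_starts is a sorted list of start addresses (assumed layout):
    bank i covers [bank_starts[i], bank_starts[i + 1]).
    """
    return bisect_right(bank_starts, address) - 1

def banks_to_activate(trace, bank_starts):
    """Return the sorted indices of the banks touched by an address trace."""
    return sorted({bank_of(a, bank_starts) for a in trace})

# Hypothetical layout: three banks of unequal size.
bank_starts = [0x0000, 0x1000, 0x4000]
trace = [0x0010, 0x0FFF, 0x4100]
print(banks_to_activate(trace, bank_starts))  # bank 1 is never touched
```

Any bank that is absent from the result can stay disabled for the duration of the trace.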
ABC over MP (Cont’d)
- Pitfalls
  - “Let’s consider only power”
    - Contemporary embedded systems increasingly require more computation power, so a low-power technique must have little or no effect on performance.
  - “Finer partitioning, better performance”
    - Arbitrary partitioning produces an excessively large number of small banks, and thus area inefficiency.
    - Severe wiring overhead (wire delay > gate delay)
    - These overheads offset the benefit of MP and can even make the result worse.
Example of Memory Partitioning
Skimming Over This Paper
- Questions
  - How should we partition memory to exploit sleep mode? (The paper defines this problem and suggests algorithms.)
  - How do we obtain the information on idle sets?
- Board-level memory optimization
  - Targets large off-chip DRAMs
- What is the idea?
  - Do NOT refresh DRAM blocks that hold no live data
  - Save refresh energy
Circuit Partitioning for sleep mode
Methodologies to Obtain the Idle Set
- Idle sets for memory elements (MEs)
  - M = {m1, m2, …, mr}: the set of dynamic MEs
  - The access pattern of each ME is a sequence of pairs (ti, Ai), where ti is the access time and Ai is the access type (R or W)
- No-refresh rules for an ME mi
  - If the first access to mi is a W at time t, no refresh is needed prior to t
  - If the last access to mi is at time t, no refresh is needed thereafter
  - If an access to mi at time t is followed by a W access at time t', no refresh is needed during the interval (t, t')
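The three no-refresh rules can be turned into a small routine that derives the idle set of one ME from its access trace. This is a minimal sketch, not the paper's procedure: the function name, the `start`/`end` bounds of the observation window, and the handling of an ME that is never accessed are assumptions made for illustration.

```python
def no_refresh_intervals(accesses, start, end):
    """Return the intervals in which one memory element needs no refresh.

    accesses: list of (time, kind) pairs sorted by time, kind is "R" or "W".
    start, end: bounds of the observation window (assumed, not from the slides).
    """
    if not accesses:
        # Assumption: an ME that is never accessed holds no live data.
        return [(start, end)]
    intervals = []
    first_t, first_kind = accesses[0]
    if first_kind == "W":
        # Rule 1: the first access is a write, so nothing is live before it.
        intervals.append((start, first_t))
    for (t, _), (t_next, kind_next) in zip(accesses, accesses[1:]):
        if kind_next == "W":
            # Rule 3: the value is overwritten at t_next, so it is dead in (t, t_next).
            intervals.append((t, t_next))
    # Rule 2: nothing is read after the last access.
    last_t, _ = accesses[-1]
    intervals.append((last_t, end))
    return intervals

trace = [(2, "W"), (5, "R"), (9, "W"), (12, "R")]
print(no_refresh_intervals(trace, 0, 20))  # [(0, 2), (5, 9), (12, 20)]
```

In this trace the ME must be refreshed only during (2, 5) and (9, 12), when a written value is still waiting to be read.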
Obtaining Idle Sets for ME
Methodologies to Obtain Idle Sets
- Idle sets for clocked elements
  - Assumption: “Functional units (FUs) have registers at their inputs”
  - If an FU is not used, gate the clock signal to its input registers
    - No dynamic power consumption
  - An FU is active only in a control step c in which an operation is assigned to it
  - Otherwise it is idle for that control step
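For clocked elements, the idle set follows directly from the schedule: every control step without an assigned operation is idle. The sketch below assumes a schedule given as a mapping from FU name to the set of busy control steps, numbered from 1; the data layout and the FU names are hypothetical.

```python
def idle_control_steps(schedule, num_steps):
    """Return, for each FU, the control steps in which its clock can be gated.

    schedule: dict mapping an FU name to the set of control steps (1-based)
              in which an operation is assigned to that FU.
    num_steps: total number of control steps in the schedule.
    """
    all_steps = set(range(1, num_steps + 1))
    return {fu: sorted(all_steps - set(busy)) for fu, busy in schedule.items()}

# Hypothetical schedule with one adder and one multiplier over four steps.
schedule = {"add1": {1, 2, 4}, "mul1": {3}}
print(idle_control_steps(schedule, 4))  # {'add1': [3], 'mul1': [1, 2, 4]}
```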
Formulation
- ME m is idle
- Intervals I1 and I2 are non-overlapping
- Idle set Nm of m
- I1 covers I2
- Length L(I) of an interval I = (l, r)
- Intersection of I1 and I2 (I1∧I2)
- Intersection of N1 and N2 (N1∧N2)
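The formal definitions behind these terms did not survive in the slide text, so the following is only one plausible reading, written out as Python. It assumes an interval is a pair (l, r) with l < r, an idle set is a list of pairwise non-overlapping intervals, L(I) = r - l, and the intersection of two idle sets is the set of all non-empty pairwise interval intersections. All function names are illustrative.

```python
def length(interval):
    """L(I) for I = (l, r), assumed here to be r - l."""
    l, r = interval
    return r - l

def intersect(i1, i2):
    """I1 ∧ I2: the common part of two intervals, or None if they do not overlap."""
    l, r = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (l, r) if l < r else None

def covers(i1, i2):
    """True if I1 covers I2, i.e. I2 lies entirely inside I1."""
    return i1[0] <= i2[0] and i2[1] <= i1[1]

def intersect_sets(n1, n2):
    """N1 ∧ N2: the times at which both elements are idle."""
    result = []
    for a in n1:
        for b in n2:
            common = intersect(a, b)
            if common is not None:
                result.append(common)
    return sorted(result)

n1 = [(0, 2), (5, 9), (12, 20)]
n2 = [(1, 6), (8, 15)]
both_idle = intersect_sets(n1, n2)
print(both_idle)                          # [(1, 2), (5, 6), (8, 9), (12, 15)]
print(sum(length(i) for i in both_idle))  # 6
```

Under this reading, two MEs placed in the same segment can sleep together only during N1 ∧ N2, which is why grouping elements with similar idle sets matters.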
Problem P1
- Instance: ordered quadruple (a, b, c, S)
- Objective: determine whether there exists a b-balanced bi-partitioning (S1, S2) such that G(S1, S2) ≥ c
- Theorem: P1 is NP-complete
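To make the decision problem concrete, here is a brute-force check written under several loudly stated assumptions, none of which is confirmed by the slide: S maps each element to its idle set, "b-balanced" means the two part sizes differ by at most b, and the gain G(S1, S2) is the total time during which all elements of S1 are idle together plus the same quantity for S2. The role of the parameter a is not recoverable from the slide and is left out. This is not the paper's exact algorithm, and it takes exponential time.

```python
from itertools import combinations

def common_idle(idle_sets):
    """Intersect several idle sets, each a list of non-overlapping (l, r) intervals."""
    common = idle_sets[0]
    for other in idle_sets[1:]:
        common = [(max(a[0], b[0]), min(a[1], b[1]))
                  for a in common for b in other
                  if max(a[0], b[0]) < min(a[1], b[1])]
    return common

def sleep_time(idle_sets):
    """Total time during which every element of one part is idle (assumed gain term)."""
    return sum(r - l for l, r in common_idle(idle_sets))

def exists_partition(S, b, c):
    """Return (True, (S1, S2)) if some b-balanced bi-partition reaches gain c."""
    names = sorted(S)
    n = len(names)
    for k in range(1, n):
        if abs(k - (n - k)) > b:  # assumed meaning of "b-balanced"
            continue
        for part1 in combinations(names, k):
            part2 = [m for m in names if m not in part1]
            gain = (sleep_time([S[m] for m in part1])
                    + sleep_time([S[m] for m in part2]))
            if gain >= c:
                return True, (list(part1), part2)
    return False, None

S = {"m1": [(0, 4)], "m2": [(0, 3)], "m3": [(6, 10)], "m4": [(7, 10)]}
print(exists_partition(S, b=0, c=6))  # (True, (['m1', 'm2'], ['m3', 'm4']))
print(exists_partition(S, b=0, c=7))  # (False, None)
```

The enumeration visits every subset of each admissible size, which is what motivates looking for polynomial-time special cases and heuristics.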
General Strategies for NP-Complete Problems
- NP-completeness all but rules out a polynomial-time algorithm for P1 (unless P = NP)
- Toward the theoretical end
  - Study the complexity of special sub-classes of the general problem that are potentially solvable in polynomial time
- Toward the practical end
  - Develop heuristic approaches that solve the problem sub-optimally but in polynomial time
Exact Algorithm
Polynomial-Time Sub-Classes
- Problem P2
  - Instance: same as P1, with each NIS containing a single interval
  - Objective: same as P1
  - Theorem: P2 is solvable in polynomial time
Bounded Number of Switchings
- Input parameter d
  - Allowable number of switchings to sleep mode
- Problem P3
  - Instance: same as P1, plus an integer d
  - Objective: same as P1, with sw1 + sw2 ≤ d
  - Theorem: P3 is solvable in (pseudo-)polynomial time
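The switching constraint can be checked with a short helper. The sketch assumes that swi is the number of times segment Si enters sleep mode, counted as the number of maximal intervals in the common idle set of Si after touching or overlapping intervals are merged. That reading, and both function names, are assumptions rather than definitions taken from the paper.

```python
def count_switches(common_idle):
    """Count sleep-mode entries: maximal idle intervals after merging touching ones."""
    count = 0
    prev_end = None
    for l, r in sorted(common_idle):
        if prev_end is None or l > prev_end:
            count += 1  # a gap before this interval means a new sleep period
        prev_end = r if prev_end is None else max(prev_end, r)
    return count

def within_budget(idle1, idle2, d):
    """Check the assumed P3 constraint sw1 + sw2 <= d for the two segments."""
    return count_switches(idle1) + count_switches(idle2) <= d

print(count_switches([(1, 2), (2, 3), (5, 6)]))   # 2
print(within_budget([(0, 3)], [(7, 10)], d=2))    # True
print(within_budget([(0, 3)], [(7, 10)], d=1))    # False
```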
Experimental Results
Target Power to Save
- In the past, switching power was the dominant source of power consumption, so most effort was concentrated there
- Below the 65 nm feature size, leakage current becomes a major contributor to power consumption
- We need to consider both dynamic and static power consumption to save more power
Leakage Power
- A new low-power design challenge
- The cause is technology scaling
  - Device speed
  - Chip density
- Two principal static power components
  - Sub-threshold leakage: weak-inversion current across the device
  - Gate leakage: tunneling current through the insulator
Leakage Power
- Operating frequency and voltage
  - $f \propto (V - V_{th})^{\alpha} / V$
- Overall power consumption
  - $P = A C V^{2} f + V I_{leak}$
  - $I_{leak} = I_{sub} + I_{ox}$
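A quick numeric evaluation of the power equation shows how the two terms compare. The sketch reads A as the switching activity factor and C as the switched capacitance, which is the usual interpretation but is not spelled out on the slide. The operating point below is made up purely for illustration.

```python
def power_breakdown(A, C, V, f, I_sub, I_ox):
    """Evaluate P = A*C*V^2*f + V*I_leak with I_leak = I_sub + I_ox.

    Returns (dynamic, static, total) in watts for SI-unit inputs.
    """
    I_leak = I_sub + I_ox
    dynamic = A * C * V ** 2 * f
    static = V * I_leak
    return dynamic, static, dynamic + static

# Hypothetical operating point, not measured data.
dynamic, static, total = power_breakdown(
    A=0.1, C=1e-9, V=1.0, f=1e9, I_sub=0.03, I_ox=0.02)
print(f"dynamic={dynamic:.3f} W, static={static:.3f} W, "
      f"static share={static / total:.0%}")
```

With these made-up values the static term is about a third of the total, so a technique that only reduces switching activity leaves that share untouched.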
Power Breakdown Trends (ITRS 2002)
Leakage Power Sources
Considerations
- Obtaining the memory access pattern
  - Dynamic access profile: run the target application ahead of time on a given microprocessor
  - Random weighted vector generation
- The optimal partition is based on this information
  - As long as the profiled information still holds, good!
- What about a feedback control system?
  - Task periods are variable
  - The memory access pattern might change
Recent Research
- Partitioning itself
  - Not much, and mainly focused on cache memory
    - Segmentation
    - Sub-banking
  - Physical as well as logical partitioning
- Another report
  - Minor contribution compared to this paper
  - Geometric heuristic algorithm integrated with a GA (genetic algorithm)
Conclusion
- Power consumption has become one of the first-class design constraints in embedded systems
- Dynamic power saving technique
  - Partitioning problem to prolong sleep mode
- The main contributors to power consumption are changing
- Feedback control systems remain a problem
Thank You !