Power/Performance/Cost Efficiency of Adiabatic Circuits, as a function of Device On/Off Power Ratios Michael P. Frank CISE Department / ECE Dept. Brown Bag Seminar Tue., Mar. 26

Source: ITRS ‘99

Across Multiple Technologies
[Figure: trend across technologies: Mechanical, Electromechanical, Relays, Vacuum Tubes, Discrete Transistors, Integrated Circuits. Source: Kurzweil, The Age of Spiritual Machines, pp. 22-25]

\(\tfrac{1}{2}CV^2\) based on ITRS ‘99 figures for \(V_{dd}\) and minimum transistor gate capacitance. \(T = 300\) K

Information & Entropy
Example: System with 3 two-state subsystems, such as quantum spins. \(2^3 = 8\) states, some of them ruled out by some knowledge.
[Figure: table of spin label vs. informational status. Spin 1: entropy; spin 2: known information; spin 3: entropy]

Illustrating Landauer’s principle
[Figure: before bit erasure, the bit to be “erased” is 0 or 1 and the rest of the system (thermal modes, &c.) is in one of N states \(s_0, \dots, s_{N-1}\). After bit erasure, the bit is 0 and the rest of the system is in one of 2N states \(s'_0, \dots, s'_{N-1}, s'_N, \dots, s'_{2N-1}\). The two are connected by unitary (1-1) evolution.]

Conventional Gates are Irreversible
Logic gate behavior (on receiving new input):
  – Many-to-one transformation of local state!
  – Required to dissipate \(bT\) by Landauer’s principle.
  – Incurs \(\tfrac{1}{2}CV^2\) dissipation in 2 out of 4 cases.
Example: Static CMOS inverter.
[Figure: inverter with in/out terminals, and the transformation of local state]
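To put the two dissipation figures on this slide side by side, here is a minimal sketch. It assumes that \(b\) denotes one bit of entropy, \(k \ln 2\), so that \(bT = kT \ln 2\). The 1 fF load and 1 V swing are hypothetical illustration values, not figures from the talk.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_limit(temperature_k: float) -> float:
    """Minimum dissipation for erasing one bit: k*T*ln(2), in joules."""
    return K_B * temperature_k * math.log(2)


def switching_energy(c_farads: float, v_volts: float) -> float:
    """Conventional (1/2)*C*V^2 dissipation of one irreversible transition."""
    return 0.5 * c_farads * v_volts ** 2


if __name__ == "__main__":
    e_landauer = landauer_limit(300.0)       # T = 300 K, as on the ITRS slide
    e_switch = switching_energy(1e-15, 1.0)  # hypothetical 1 fF node, 1 V swing
    print(f"Landauer limit at 300 K: {e_landauer:.3e} J")
    print(f"(1/2)CV^2 for 1 fF, 1 V: {e_switch:.3e} J")
    print(f"ratio: {e_switch / e_landauer:.3e}")
```

The gap between the two numbers is the headroom that adiabatic techniques try to exploit.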

Adiabatic Charging in CMOS
Exact formula (if R const.), for frequency factor \(f := RC/t\): [equation not preserved in transcript]
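The equation itself did not survive in this transcript. As a hedged stand-in, the sketch below uses the standard textbook result for charging a capacitance \(C\) through a constant resistance \(R\) with a linear voltage ramp of duration \(t\): \(E_{diss} = CV^2 f\,[1 + f(e^{-1/f} - 1)]\) with \(f = RC/t\). This is assumed, not confirmed, to be the formula shown on the slide. The function name and the example component values are hypothetical.

```python
import math


def ramp_dissipation(c: float, v: float, r: float, t: float) -> float:
    """Energy dissipated in R when C is charged to V by a linear ramp of length t.

    Uses E = C*V^2 * f * (1 + f*(exp(-1/f) - 1)) with f = R*C/t.
    Assumes R is constant over the transition.
    """
    f = r * c / t
    # expm1(x) computes exp(x) - 1 accurately for small |x|.
    return c * v ** 2 * f * (1.0 + f * math.expm1(-1.0 / f))


if __name__ == "__main__":
    c, v, r = 1e-15, 1.0, 1e4  # hypothetical: 1 fF, 1 V, 10 kOhm (RC = 10 ps)
    for t in (1e-14, 1e-11, 1e-8):
        e = ramp_dissipation(c, v, r, t)
        print(f"t = {t:.0e} s -> E_diss = {e:.3e} J "
              f"({e / (0.5 * c * v ** 2):.4f} of (1/2)CV^2)")
```

For slow ramps (\(f \ll 1\)) the result approaches \(CV^2(RC/t)\), and for abrupt switching (\(f \gg 1\)) it approaches \(\tfrac{1}{2}CV^2\), matching the two limits quoted elsewhere in the talk.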

Adiabaticity is Fundamental
Adiabatic (dissipation \(\propto\) quickness) processes can occur in any type of system.
  – Cf. the adiabatic theorem of quantum mechanics.
Specific adiabatic logics have been described for many proposed future device technologies:
  – Superconducting (Likharev ‘82, Averin et al. ‘01)
  – Nanomechanical (Drexler ‘92, & Merkle mid-’90s)
  – Quantum-dot (Lent & Tougaw, mid-’90s-present)
  – Quantum computing implementations (inherently)
Claim: Work on architectures & analysis for adiabatic CMOS will still apply post-CMOS!

Adiabatic Rules for Transistors
Rule 1: Never turn on a transistor if it has a nonzero voltage across it!
  – I.e., between its source & drain terminals.
  – Why: This erases info. & causes \(\tfrac{1}{2}CV^2\) dissipation.
Rule 2: Never apply a nonzero voltage across a transistor even during any on \(\leftrightarrow\) off transition!
  – Why: When partially turned on, the transistor has relatively low R, and gets relatively high \(P = V^2/R\) dissipation.
  – Corollary: Never turn off a transistor when it has a nonzero current going through it!
      Why: As R gradually increases, the \(V = IR\) voltage drop will build, and then rule 2 will be violated.

Adiabatic Rules continued
Transistor Rule 3: Never apply a large voltage across any on transistor.
  – Why: So transition will be more reversible; dissipation will approach \(CV^2(RC/t)\), not \(\tfrac{1}{2}CV^2\).
Adiabatic rules for other components:
Diodes: Don’t use them at all!
  – There is always a built-in voltage drop across them!
Resistors: Avoid moderate network resistances.
  – E.g., stay away from the range >10 kΩ and <1 MΩ.
Capacitors: Minimize, reliability permitting.
  – Note: Adiabatic dissipation scales with \(C^2\)!

Transistor Rules Summarized
[Figure: state-transition diagram over transistor on/off states and high/low voltage levels. Legal transitions in green. (For n- or p-FETs.) Dissipative states and transitions in red.]

Simple Reversible CMOS Latch
Uses a standard CMOS transmission gate.
Sequence of operation:
  (1) input initially matches latch contents (output),
  (2) input changes \(\rightarrow\) output changes,
  (3) latch closes,
  (4) input removed.
[Figure: transmission gate with control P, terminals in and out, and a table of in/out values for the three phases "Before input", "Input arrived", "Input removed"; table entries not recoverable from transcript]

Generic Frictional Coefficients
Normal defs. of friction (coeff. of sliding friction, viscosity, etc.) may not apply to all processes.
For a given mechanism executing a specified process (i.e., following a specified desired trajectory or trajectories) adiabatically over a time t:
Energy coefficient: \(c_E = E_{\text{lost}} \cdot t = E_{\text{lost}} / q\)
  – Energy dissipated from the trajectory per unit of “quickness”.
      Note quickness \(q = 1/t\) has units like Hz.
Entropy coefficient: \(c_S = S_{\text{made}} \cdot t = S_{\text{made}} / q\)
  – New entropy generated per unit of quickness.
      Note that \(c_E = c_S \cdot T\) at temperature T.
What matters!

Energy Coefficient in Electronics
For charging capacitive load C by voltage V through effective resistance R:
  \(c_E = E_{\text{lost}} \cdot t = (CV^2 \cdot RC/t) \cdot t = C^2 V^2 R\)
If the resistances are voltage-controlled switches with gain factor k controlled by the same voltage V, then effective \(R \approx 1/(kV)\), so
  \(c_E \approx C^2 V / k\)
In constant-field-scaled CMOS with linear scale \(\ell\): \(k \propto 1/h_{ox} \propto 1/\ell\), \(C \propto \ell\), and \(V \propto \ell\), so
  \(c_E \propto \ell^3 / \ell^{-1} = \ell^4\); \(E_{\text{lost}} = c_E / t \propto \ell^4 / \ell = \ell^3\) (like \(CV^2\) energy)

Entropy coefficients of some reversible logic gate operations
From Frank ‘98, “Ultimate theoretical models of nanocomputers” (Nanotechnology journal):
  SCRL, circa 1997: ~1 b/Hz
  Optimistic reversible CMOS: ~10 b/kHz
  Merkle’s “quantum FET”: ~1.2 b/GHz
  Nanomechanical rod logic: ~0.07 b/GHz
  Superconducting PQ gate: ~25 b/THz
  Helical logic: ~0.01 b/THz
How low can you go? We don’t really know!

Quantifying Leakage
For a given structured system:
  Leakage power: \(P_{\text{leak}} = dE_{\text{leak}} / dt\)
  Spontaneous entropy generation rate: \(\dot{S}_{\text{leak}} = dS_{\text{leak}} / dt\)
  Again, note \(P_{\text{leak}} = \dot{S}_{\text{leak}} \cdot T\) at temperature T.

Minimum Losses w. Leakage
  \(E_{\text{leak}} = P_{\text{leak}} \cdot t_r\)
  \(E_{\text{adia}} = c_E / t_r\)
  \(E_{\text{tot}} = E_{\text{adia}} + E_{\text{leak}}\)

Min. energy & \(R_{\text{off}}/R_{\text{on}}\) ratio
Note that \(c_E = C^2 V^2 R_{\text{on}}\), and if the dominant leakage is source/drain, \(P_{\text{leak}} = V^2 / R_{\text{off}}\).
So: \(c_E P_{\text{leak}} = C^2 V^4 / (R_{\text{off}}/R_{\text{on}})\)
  \(E_{\min} = 2 (c_E P_{\text{leak}})^{1/2} = 2 C V^2 (R_{\text{off}}/R_{\text{on}})^{-1/2}\)
So: \(Q_{\max} = \tfrac{1}{2}CV^2 / \left(2CV^2 (R_{\text{off}}/R_{\text{on}})^{-1/2}\right) = \tfrac{1}{4}(R_{\text{off}}/R_{\text{on}})^{1/2} = \tfrac{1}{4}(I_{\text{on}}/I_{\text{off}})^{1/2}\)
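The step from \(E_{tot} = c_E/t_r + P_{leak} t_r\) to \(E_{\min} = 2(c_E P_{leak})^{1/2}\) follows from setting \(dE_{tot}/dt_r = 0\), which gives \(t_r = (c_E/P_{leak})^{1/2}\). The sketch below checks this numerically and compares the resulting quality factor with \(\tfrac{1}{4}(R_{off}/R_{on})^{1/2}\). The function names and the component values are hypothetical, chosen only for illustration.

```python
import math


def optimal_transition(c_e: float, p_leak: float) -> tuple[float, float]:
    """Return (t_opt, e_min) minimizing E_tot(t) = c_e / t + p_leak * t."""
    t_opt = math.sqrt(c_e / p_leak)
    e_min = 2.0 * math.sqrt(c_e * p_leak)
    return t_opt, e_min


def q_max(on_off_ratio: float) -> float:
    """Maximum quality factor, (1/4) * sqrt(R_off / R_on)."""
    return 0.25 * math.sqrt(on_off_ratio)


if __name__ == "__main__":
    # Hypothetical device: 1 fF load, 1 V swing, R_on = 10 kOhm, R_off = 1e13 Ohm.
    c, v, r_on, r_off = 1e-15, 1.0, 1e4, 1e13
    c_e = c ** 2 * v ** 2 * r_on   # energy coefficient C^2 V^2 R_on
    p_leak = v ** 2 / r_off        # source/drain leakage power
    t_opt, e_min = optimal_transition(c_e, p_leak)
    print(f"t_opt = {t_opt:.3e} s, E_min = {e_min:.3e} J")
    print(f"Q from energies: {0.5 * c * v ** 2 / e_min:.1f}")
    print(f"Q from on/off ratio: {q_max(r_off / r_on):.1f}")
```

Both routes to \(Q_{\max}\) agree, and the result depends only on the on/off ratio, not on \(C\) or \(V\) individually.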

Clock/Power Supply Desiderata
Requirements for an adiabatic timing signal / power supply:
  – Generate trapezoidal waveform with very flat high/low regions.
      Flatness limits Q of logic.
      Waveform during transitions is ideally linear,
        – But this does not affect maximum Q, only energy coefficient.
  – Operate resonantly with logic, with high Q.
      Power supply Q will limit overall system Q.
  – Reasonable cost, compared to logic it powers.
  – If possible, scale \(Q \propto t\) (cycle time).
      Required to be considered an adiabatic mechanism.
      May conflict w. inductor scaling laws!
      At the least, Q should be high at leakage-limited speed.
(Ideally, independent of t.)

Supply concepts in my research
Superpose several sinusoidal signals from phase-synchronized oscillators at harmonics of the fundamental frequency.
  – Weight these frequency components as per the Fourier transform of the desired waveform.
Create relatively high-L integrated inductors via vertical, helical metal coils.
  – Only thin oxide layers between turns.
Use mechanically oscillating, capacitive MEMS structures in vacuo as a high-Q (~10k) oscillator.
  – Use geometry to get the desired wave shape directly.

A MEMS Supply Concept
Energy stored mechanically.
Variable coupling strength -> custom wave shape.
Can reduce losses through balancing, filtering.
Issue: How to adjust frequency?

Summary of Limiting Factors
When considering adiabaticizing a system:
What fraction of system power is in logic? \(f_L\)
  – Vs. displays, transmitters, propulsion.
What fraction of logic is done adiabatically? \(f_a\)
  – Can be all, but w. cost-efficiency overheads.
How large is the \(I_{\text{on}}/I_{\text{off}}\) ratio of switches?
  – Affects leakage & minimum adiabatic energy.
What is the \(Q_{\text{sup}}\) of the resonant power supply?
What is the relative cost of power & logic? \(r_\$\) (“power premium”)
  – E.g., decreasing power cost by a factor of \(r_\$\) by increasing HW cost by a factor of at least \(r_\$\) will not help.

Minimizing cost/performance
\(\$_P\) = Cost of power in original system
\(\$_H\) = Cost of logic HW in original system
\(\$_P = r_\$ \$_H\); \(\$_H = \$_P / r_\$\)
For cost-efficiency inverse to energy savings:
  \(\$_{tot,min} = \$_P\, r_\$^{-1/2} + \$_H\, r_\$^{1/2} = 2\, \$_P\, r_\$^{-1/2}\)
  \(\$_{tot,orig} = \$_P + \$_H = (1 + r_\$)\, \$_H = ((1 + r_\$)/r_\$)\, \$_P\)
  \(\$_{tot,orig} / \$_{tot,min} = \tfrac{1}{2}(1 + r_\$)\, r_\$^{-1/2} \approx \tfrac{1}{2} r_\$^{1/2}\) for large \(r_\$\)
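The minimum above can be read as follows: if an energy savings factor \(s\) divides the power cost by \(s\) and multiplies the hardware cost by \(s\), the total \(\$_P/s + \$_H s\) is smallest at \(s = r_\$^{1/2}\). The sketch below works through this under that reading of "cost-efficiency inverse to energy savings". The function names and the example costs are hypothetical.

```python
import math


def total_cost(savings: float, cost_power: float, cost_hw: float) -> float:
    """Total cost when power cost shrinks by `savings` and HW cost grows by it."""
    return cost_power / savings + cost_hw * savings


def best_savings(cost_power: float, cost_hw: float) -> float:
    """Savings factor minimizing total_cost: sqrt(r_$) with r_$ = $_P / $_H."""
    return math.sqrt(cost_power / cost_hw)


def cost_advantage(r_cost: float) -> float:
    """$_tot,orig / $_tot,min = (1/2) * (1 + r_$) / sqrt(r_$)."""
    return 0.5 * (1.0 + r_cost) / math.sqrt(r_cost)


if __name__ == "__main__":
    cost_power, cost_hw = 100.0, 1.0  # hypothetical: power costs 100x the logic
    s = best_savings(cost_power, cost_hw)
    print(f"best savings factor: {s:.2f}")
    print(f"minimum total cost: {total_cost(s, cost_power, cost_hw):.2f}")
    print(f"advantage over original: {cost_advantage(cost_power / cost_hw):.3f}")
```

Even with power costing 100 times the hardware, the overall cost advantage in this model is only about a factor of 5, which is why the talk restricts the benefit to heavily energy-dominated applications.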

Summary of adiabatic limits
Cost-effective adiabatic energy savings factor: \(S_a = E_{\text{conv}} / E_{\text{adia}}\) in a cost-effective adiabatic system.
Some rough upper bounds on \(S_a\) (worse than these for non-ideal computations):
  \(S_a \lesssim 1/(1 - f_L)\)
  \(S_a \lesssim 1/(1 - f_a)\)
  \(S_a \lesssim \tfrac{1}{4}(I_{\text{on}}/I_{\text{off}})^{1/2}\)
  \(S_a \lesssim Q_{\text{sup}}\)
  \(S_a \lesssim r_\$^{1/2}\)
Discussion ignores benefits from adiabatics of denser packing & smaller communications delays in parallel algorithms.

Motivation for this study
We want to know how to carry out any arbitrary computation in a way that is reversible to an arbitrarily high degree.
  – Up to limits set by leakage, power supply, etc.
We want to do this as efficiently as possible:
  – Using as few “device ticks” as possible (spacetime).
      Minimizes HW cost, & leakage losses.
  – Using as few adiabatic transitions as possible (ops).
      Minimizes frictional losses.
But, a desired computation may be originally spec’d in terms of irreversible primitives.

General-Case vs. Special-Case
We’d like to know two kinds of things:
  – For arbitrary general-purpose computations,
      How to automatically emulate them in a fairly efficient reversible way,
        – w/o needing new intelligent/creative design work in each case?
  – For various specific computations of interest,
      What are the most efficient reversible algorithms?
        – Or at least, the most efficient that we can find?
Note: These may not look anything like the most efficient irreversible algorithms!

The Landauer embedding
The obvious embedding of irreversible ops into “expanding” reversible ones leads to a linear increase in space through time. (Landauer ‘61)
  – Or, increase in width of an input-consuming circuit.
[Figure: circuit diagram with the input on one side, “expanding” operations (e.g., AND) along the circuit depth (or time) axis, and the desired output plus accumulated “garbage” bits on the other side]

Lecerf Reversal
Lecerf (‘63) was interested in the group-theory question of whether an iterated permutation of items would eventually return to the initial item.
  – Proved undecidable by reducing Turing’s halting problem to this question, w. a reversible TM.
      Reversible TM reverses direction instead of halting.
      Returns to initial state iff irreversible TM would halt.
Only problem: No useful output data!
[Figure: \(f\) followed by \(f^{-1}\). Labels: Input, Desired output, Garbage, Copy of Input]

The Bennett Trick
Bennett (‘73) pointed out that you could simply fan-out (reversibly copy) the desired output before reversing.
Note O(T) storage is still temporarily needed!
[Figure: \(f\) followed by \(f^{-1}\). Labels: Input, Garbage, Copy of Input, Desired output]

Triangle Representation
Represents any use of the Bennett ‘73 embedding.
[Figure: triangle diagram. Axes: time in irreversible system vs. time in reversible system. Labels: states of the irreversible computation at time \(t_i\) and at a later time; forward phase; reverse phase; adiabatic process. Mass on any vertical line = space usage at that time.]

Improving Spacetime Efficiency
Bennett ‘73 transforms a computation taking spacetime \(S \cdot T\) to one taking \(\Theta(S \cdot T^2)\) in the worst case.
  – Can we do better?
Bennett ‘89: Described a technique that takes spacetime [expression not preserved in transcript]
  – Actually, can generalize slightly and arrange for the exponent on T to be \(1 + \epsilon\), where \(\epsilon \to 0\) (very slowly).
Lange, McKenzie, Tapp ‘97: Space \(\Theta(S)\) is possible, if you use time \(\Theta(\exp(\Theta(S)))\).
  – Not any more spacetime-efficient than Bennett.

“Pebble Game” Representation

Triangle representation
[Figure: triangle diagrams for k = 2, n = 3 and for k = 3, n = 2]

Analysis of Bennett Algorithm
n = # of recursive levels of algorithm
k = # of lower-level iterations to go forward 1 higher-level step
\(T_r\) = # of reversible lowest-level steps executed = \(c(2k-1)^n\) (c a small constant, e.g. 2)
\(T_i\) = # of irreversible steps emulated = \(k^n\)
So, \(n = \log_k T_i\), and so
  \(T_r = c(2k-1)^{\log T_i / \log k} = c\, e^{\log(2k-1) \log(T_i) / \log k} = c\, T_i^{\log(2k-1)/\log k}\)
E.g. k = 2: \(T_r = 2\, T_i^{\log 3 / \log 2}\)
[Figure label: n+1 spikes]
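The step counts above are easy to tabulate. The following sketch evaluates \(T_r = c(2k-1)^n\), \(T_i = k^n\), and the time exponent \(\log(2k-1)/\log k\) for a few values of k, using c = 2 as on the slide. The function names are hypothetical.

```python
import math


def reversible_steps(k: int, n: int, c: int = 2) -> int:
    """T_r: reversible lowest-level steps executed, c * (2k - 1)^n."""
    return c * (2 * k - 1) ** n


def irreversible_steps(k: int, n: int) -> int:
    """T_i: irreversible steps emulated, k^n."""
    return k ** n


def time_exponent(k: int) -> float:
    """Exponent e such that T_r = c * T_i^e, i.e. log(2k - 1) / log(k)."""
    return math.log(2 * k - 1) / math.log(k)


if __name__ == "__main__":
    for k in (2, 3, 4, 8, 64):
        print(f"k = {k:2d}: exponent = {time_exponent(k):.3f}")
    # Cross-check the closed form against the direct count for k = 2, n = 3.
    k, n = 2, 3
    print(reversible_steps(k, n), 2 * irreversible_steps(k, n) ** time_exponent(k))
```

The exponent falls toward 1 as k grows, which is the slow \(1 + \epsilon\) behavior mentioned for the generalized Bennett technique, at the price of more storage per level.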

Cost-Efficiency Analysis
Total cost of doing a computation includes:
  – Spacetime costs (storage used, integrated over time)
      Includes time-amortized manufacturing cost.
      Includes cost of total energy leakage
        – leakage from any in-use storage element.
  – Irreversibility costs (energy loss from irrev. ops)
      Total number of irreversible bit-erasures, \(CV^2 > kT\) each.
  – Adiabatic costs (energy loss from reversible ops)
      Proportional to the number \(n_a\) of adiabatic ops performed, times \(c_E\), divided by the time \(t_{op}\) of a single op.

Bennett ‘89 alg. is not optimal
Just look at all the spacetime it wastes!!!
[Figure: triangle diagrams for k = 2, n = 3 and for k = 3, n = 2]

Parallel “Frank02” algorithm
We can simply squish the triangles closer together to eliminate the wasted spacetime!
Resulting algorithm is linear time for all n and k and dominates Ben89 for time, spacetime, & energy!
[Figure: real time vs. emulated time, for k = 2, n = 3; k = 3, n = 2; and k = 4, n = 1]

Setup for Analysis
For the energy-dominated limit,
  – let cost “$” equal energy.
\(c_\$\) = energy coefficient, \(r_\$ = r_{\$(\min)}\) = leakage power
\(\$_i\) = energy dissipation per irreversible state-change
Let the on/off ratio \(R_{\text{on/off}} = r_{\$(\max)} / r_{\$(\min)} = P_{\max} / P_{\min}\).
Note that \(c_\$ \approx \$_i \cdot t_{\min} = \$_i \cdot (\$_i / r_{\$(\max)})\), so \(r_{\$(\max)} \approx \$_i^2 / c_\$\)
So \(R_{\text{on/off}} \approx \$_i^2 / (c_\$\, r_{\$(\min)}) = \$_i^2 / (c_\$\, r_\$)\)

Time Taken
There are n levels of recursion.
Each multiplies the width of the base of the triangle by k.
Lowest-level triangles take time \(c \cdot t_{op}\).
Total time is thus \(c \cdot t_{op} \cdot k^n\).
[Figure: k = 4, n = 1; width 4 sub-units]

Number of Adiabatic Ops
Each triangle contains \(k + (k-1) = 2k-1\) immediate sub-triangles.
There are n levels of recursion.
Thus the number of adiabatic ops is \(c \cdot (2k-1)^n\).
[Figure: k = 3, n = 2; \(5^2 = 25\) little triangles (adiabatic operations)]

Spacetime Usage
Each triangle includes the spacetime usage of all \(2k-1\) of its subtriangles,
plus additional spacetime units, each consisting of 1 storage unit (1 state of the irreversible machine being stored) for time \(t_{op} \cdot k^{n-1}\).
[Figure: k = 5, n = 1; stored states held for 1, 2, and 3 time units of length \(t_{op} k^{n-1}\), giving 1+2+3 units]
Resulting recurrence relation:
  \(ST(k, 0) = 1\) (or c)
  \(ST(k, n) = (2k-1) \cdot ST(k, n-1) + (k^2 - 3k + 2) \cdot k^{n-1} / 2\)
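The recurrence can be evaluated directly. The sketch below takes \(ST(k,0) = 1\) (the slide also allows c) and iterates the recurrence upward; the function name is hypothetical. Since \(k^2 - 3k + 2 = (k-1)(k-2)\) is always even, integer arithmetic is exact.

```python
def spacetime(k: int, n: int) -> int:
    """Spacetime usage ST(k, n), in units of one storage element for one op time.

    ST(k, 0) = 1
    ST(k, n) = (2k - 1) * ST(k, n - 1) + (k^2 - 3k + 2) * k^(n - 1) / 2
    """
    st = 1
    for level in range(1, n + 1):
        # (k - 1)(k - 2)/2 = 1 + 2 + ... + (k - 2) extra stored-state units,
        # each held for k^(level - 1) lowest-level time units.
        extra = (k * k - 3 * k + 2) * k ** (level - 1) // 2
        st = (2 * k - 1) * st + extra
    return st


if __name__ == "__main__":
    for k, n in ((2, 3), (3, 2), (4, 1), (5, 1)):
        print(f"ST({k}, {n}) = {spacetime(k, n)}")
```

For k = 5, n = 1 the extra term is 1 + 2 + 3 = 6 units, matching the figure label, on top of the 9 sub-triangles.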

Reversible Cost
Adiabatic cost \(\$_a\) plus spacetime cost \(\$_{st}\):
  \(\$_r = \$_a + \$_{st} = (2k-1)^n \cdot c_\$ / t + ST(k,n) \cdot r_\$ \cdot t\)
Minimizing over t gives:
  \(\$_r = 2\left[(2k-1)^n \cdot ST(k,n) \cdot c_\$\, r_\$\right]^{1/2}\)
But, in the energy-dominated limit, \(c_\$\, r_\$ \approx \$_i^2 / R_{\text{on/off}}\),
So: \(\$_r = 2\, \$_i \cdot \left[(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}\right]^{1/2}\)

Tot. Cost, Orig. Cost, Advantage
Total cost \(\$_i\) for the irreversible operation performed at the end of the algorithm, plus reversible cost, gives:
  \(\$_{tot} = \$_i \cdot \left\{1 + 2\left[(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}\right]^{1/2}\right\}\)
The original irreversible machine performing \(k^n\) ops would use cost \(\$_{orig} = \$_i \cdot k^n\), so the advantage ratio between reversible & irreversible cost is:
  \(R_{\$(i/r)} = \$_{orig} / \$_{tot} = k^n / \left\{1 + 2\left[(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}\right]^{1/2}\right\}\)

Optimization Algorithm
For any given value of \(R_{\text{on/off}}\),
Scan the possible values of n (up to some limit),
For each of those, scan the possible values of k,
Until the maximum \(R_{\$(i/r)}\) for that n is found
  – (the function only has a single local maximum)
And return the max \(R_{\$(i/r)}\) over all n tried.
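The scan described above can be sketched as follows. The advantage ratio is taken to be \(\$_{orig}/\$_{tot} = k^n / \{1 + 2[(2k-1)^n\, ST(k,n) / R_{on/off}]^{1/2}\}\), which follows from the cost expressions in the talk, with \(ST(k,0) = 1\). The function names and the scan limits `max_n` and `max_k` are hypothetical choices, and the early exit relies on the single-local-maximum property stated on the slide.

```python
import math


def spacetime(k: int, n: int) -> int:
    """ST(k, n) from the recurrence in the talk, with ST(k, 0) = 1."""
    st = 1
    for level in range(1, n + 1):
        st = (2 * k - 1) * st + (k * k - 3 * k + 2) * k ** (level - 1) // 2
    return st


def advantage(k: int, n: int, on_off: float) -> float:
    """Cost of the irreversible machine divided by total reversible cost."""
    overhead = 2.0 * math.sqrt((2 * k - 1) ** n * spacetime(k, n) / on_off)
    return k ** n / (1.0 + overhead)


def best_advantage(on_off: float, max_n: int = 20, max_k: int = 5000):
    """Scan n, then k, and return (best ratio, k, n)."""
    best = (0.0, 0, 0)
    for n in range(1, max_n + 1):
        previous = 0.0
        for k in range(2, max_k + 1):
            ratio = advantage(k, n, on_off)
            if ratio < previous:
                break  # past the single local maximum for this n
            previous = ratio
            if ratio > best[0]:
                best = (ratio, k, n)
    return best


if __name__ == "__main__":
    for on_off in (1e3, 1e6, 1e9, 1e12):
        ratio, k, n = best_advantage(on_off)
        print(f"R_on/off = {on_off:.0e}: advantage = {ratio:.1f} at k = {k}, n = {n}")
```

The corresponding spacetime blowup at the chosen point can be estimated as `spacetime(k, n) / k**n`, though the exact normalization used for the figures in the talk is not stated in the transcript.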

[Figure: plot with curves labeled spacetime blowup, energy saved, k, and n]

Asymptotic Scaling
The potential energy savings factor scales as \(R_{\$(i/r)} \propto R_{\text{on/off}}^{\sim 0.4}\),
while the spacetime overhead goes only as \(R_{\$(i/r)}^{\sim 0.45}\), or \(R_{\text{on/off}}^{\sim 0.18}\).
E.g., with an \(R_{\text{on/off}}\) of \(10^9\), you can do worst-case computation in an adiabatic circuit with:
  – An energy savings of up to a factor of 1,200×!
  – But, this point is 700,000× less hardware-efficient!

Conclusions
A new, more spacetime-efficient & energy-efficient algorithm for doing arbitrary computations adiabatically has been described.
The energy savings in worst-case computations goes as the ~0.4th power of the device on/off ratio.
  – Best-case computations: 0.5th power.
However, the reduction in spacetime efficiency scales with energy savings to the ~1.6th power.
  – Still much faster than we would like!
Adiabatics can be generally cost-effective, but still only for heavily energy-dominated apps.
