Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002 Lecture #30 Reversible Scaling Analysis II: With Leakage / Comm. Limits Wed.,


1 Physical Limits of Computing Dr. Mike Frank CIS 6930, Sec. #3753X Spring 2002 Lecture #30 Reversible Scaling Analysis II: With Leakage / Comm. Limits Wed., Mar. 27

2 Parallel “Frank02” algorithm
We can simply squish the triangles closer together to eliminate the wasted spacetime!
Resulting algorithm is linear time for all n and k, and dominates Ben89 for time, spacetime, & energy!
(Figure: real time vs. emulated time; panels k=2, n=3; k=3, n=2; k=4, n=1)

3 Setup for Analysis
For energy-dominated limit,
–let cost “$” equal energy.
\(c_\$\) = energy coefficient, \(r_\$ = r_{\$(\min)}\) = leakage power
\(\$_i\) = energy dissipation per irreversible state-change
Let the on/off ratio \(R_{\text{on/off}} = r_{\$(\max)} / r_{\$(\min)} = P_{\max} / P_{\min}\).
Note that \(c_\$ \approx \$_i \cdot t_{\min} = \$_i \cdot (\$_i / r_{\$(\max)})\), so \(r_{\$(\max)} \approx \$_i^2 / c_\$\).
So \(R_{\text{on/off}} \approx \$_i^2 / (c_\$ \, r_{\$(\min)}) = \$_i^2 / (c_\$ \, r_\$)\).

4 Time Taken
There are n levels of recursion.
Each multiplies the width of the base of the triangle by k.
Lowest-level triangles take time \(c \cdot t_{op}\).
Total time is thus \(c \cdot t_{op} \cdot k^n\).
(Figure: k=4, n=1; width 4 sub-units)

5 Number of Adiabatic Ops
Each triangle contains \(k + (k - 1) = 2k - 1\) immediate sub-triangles.
There are n levels of recursion.
Thus the number of adiabatic ops is \(c \cdot (2k - 1)^n\).
(Figure: k=3, n=2; \(5^2 = 25\) little triangles, i.e. adiabatic operations)
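
The counts on slides 4 and 5 can be reproduced by direct recursion. The following is a minimal Python sketch rather than anything from the lecture: the function names are hypothetical, and the constant c is taken to be 1.

```python
def count_adiabatic_ops(k: int, n: int) -> int:
    """Count lowest-level triangles for parameter k and n recursion levels."""
    if n == 0:
        return 1  # one lowest-level triangle (assumes the constant c = 1)
    # Each triangle holds k + (k - 1) immediate sub-triangles.
    return (k + (k - 1)) * count_adiabatic_ops(k, n - 1)


def base_width(k: int, n: int) -> int:
    """Width of the top-level triangle's base, in lowest-level units."""
    return k ** n  # each recursion level multiplies the width by k


if __name__ == "__main__":
    # The three (k, n) pairs shown in the slide figures.
    for k, n in [(2, 3), (3, 2), (4, 1)]:
        print(k, n, base_width(k, n), count_adiabatic_ops(k, n))
```

The ratio of the two quantities, ((2k - 1)/k)^n, is the factor by which the operation count exceeds the k^n steps being emulated.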

6 Spacetime Usage
Each triangle includes the spacetime usage of all \(2k - 1\) of its subtriangles,
plus additional spacetime units, each consisting of 1 storage unit held for time \(t_{op} \cdot k^{n-1}\).
(Figure: k=5, n=1; each stored state of the irreversible machine is held for time \(t_{op} \cdot k^{n-1}\); 1+2+3 units)
Resulting recurrence relation:
\(ST(k,0) = 1\) (or c)
\(ST(k,n) = (2k - 1) \cdot ST(k,n-1) + (k^2 - 3k + 2) \cdot k^{n-1} / 2\)
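
The recurrence is easy to evaluate directly. Below is a small Python sketch of it, assuming the base case ST(k,0) = 1 (the slide also allows a constant c); the function name `st` is a hypothetical choice.

```python
def st(k: int, n: int) -> int:
    """Spacetime usage ST(k, n) from the slide's recurrence, with ST(k, 0) = 1."""
    if n == 0:
        return 1
    # Spacetime of the 2k - 1 sub-triangles at the next level down.
    sub_triangles = (2 * k - 1) * st(k, n - 1)
    # Extra storage: (k^2 - 3k + 2)/2 = 1 + 2 + ... + (k - 2) units,
    # each held for k^(n-1) base time steps. (k - 1)(k - 2) is always
    # even, so the integer division is exact.
    extra_storage = (k * k - 3 * k + 2) * k ** (n - 1) // 2
    return sub_triangles + extra_storage


if __name__ == "__main__":
    for k, n in [(2, 3), (3, 2), (5, 1)]:
        print(f"ST({k},{n}) = {st(k, n)}")
```

For k = 5, n = 1 the extra-storage term is 6, which matches the "1+2+3 units" in the slide's figure. For k = 2 the extra term vanishes and ST(2,n) reduces to 3^n.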

7 Reversible Cost
Adiabatic cost plus spacetime cost: \(\$_r = (2k-1)^n \cdot c_\$ / t + ST(k,n) \cdot r_\$ \cdot t\)
Minimizing over t gives: \(\$_r = 2[(2k-1)^n \cdot ST(k,n) \cdot c_\$ \, r_\$]^{1/2}\)
But, in the energy-dominated limit, \(c_\$ \, r_\$ \approx \$_i^2 / R_{\text{on/off}}\),
So: \(\$_r = 2\,\$_i \cdot [(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}]^{1/2}\)
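
The minimization step uses the fact that a cost of the form a/t + b·t is smallest at t = (a/b)^(1/2), where it equals 2(ab)^(1/2). The Python sketch below checks this numerically for the slide's expression. It is illustrative only: the function names and the sample values of the energy coefficient and leakage power are assumptions, and units are arbitrary.

```python
import math


def st(k: int, n: int) -> int:
    """Spacetime recurrence from slide 6, with ST(k, 0) = 1."""
    if n == 0:
        return 1
    return (2 * k - 1) * st(k, n - 1) + (k * k - 3 * k + 2) * k ** (n - 1) // 2


def reversible_cost(k: int, n: int, t: float, c_cost: float, r_cost: float) -> float:
    """Adiabatic cost plus spacetime (leakage) cost at time-per-op t."""
    adiabatic = (2 * k - 1) ** n * c_cost / t
    spacetime = st(k, n) * r_cost * t
    return adiabatic + spacetime


def optimal_time(k: int, n: int, c_cost: float, r_cost: float) -> float:
    """The t that minimizes a/t + b*t, namely sqrt(a/b)."""
    return math.sqrt((2 * k - 1) ** n * c_cost / (st(k, n) * r_cost))


def min_reversible_cost(k: int, n: int, c_cost: float, r_cost: float) -> float:
    """Closed-form minimum from the slide: 2*sqrt((2k-1)^n * ST * c * r)."""
    return 2.0 * math.sqrt((2 * k - 1) ** n * st(k, n) * c_cost * r_cost)


if __name__ == "__main__":
    k, n, c_cost, r_cost = 3, 2, 2.0, 0.5  # arbitrary sample values
    t_star = optimal_time(k, n, c_cost, r_cost)
    print(reversible_cost(k, n, t_star, c_cost, r_cost))
    print(min_reversible_cost(k, n, c_cost, r_cost))
```

Both printed values should agree, and evaluating `reversible_cost` at any other t gives a larger number.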

8 Tot. Cost, Orig. Cost, Advantage
Total cost: \(\$_i\) for the irreversible operation performed at the end of the algorithm, plus the reversible cost, gives: \(\$_{tot} = \$_i \cdot \{1 + 2[(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}]^{1/2}\}\)
The original irreversible machine performing \(k^n\) ops would use cost \(\$_{orig} = \$_i \cdot k^n\), so
the advantage ratio between irreversible & reversible cost is \(R_{\$(i/r)} = \$_{orig} / \$_{tot} = k^n / \{1 + 2[(2k-1)^n \cdot ST(k,n) / R_{\text{on/off}}]^{1/2}\}\).
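
A short Python sketch of the advantage ratio follows. It assumes the ratio is the original irreversible cost divided by the total reversible cost, so the common factor of the per-operation irreversible energy cancels. The function names are hypothetical.

```python
import math


def st(k: int, n: int) -> int:
    """Spacetime recurrence from slide 6, with ST(k, 0) = 1."""
    if n == 0:
        return 1
    return (2 * k - 1) * st(k, n - 1) + (k * k - 3 * k + 2) * k ** (n - 1) // 2


def advantage(k: int, n: int, r_on_off: float) -> float:
    """Irreversible cost k^n over total reversible cost, both in units of $_i."""
    reversible = 2.0 * math.sqrt((2 * k - 1) ** n * st(k, n) / r_on_off)
    total = 1.0 + reversible  # the 1 is the final irreversible operation
    return k ** n / total


if __name__ == "__main__":
    for r_on_off in (1e3, 1e6, 1e9):
        print(r_on_off, advantage(3, 2, r_on_off))
```

As the on/off ratio grows without bound the leakage term vanishes and the advantage approaches k^n. At small on/off ratios it can fall below 1, meaning the reversible emulation costs more than the irreversible original.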

9 Optimization Algorithm
For any given value of \(R_{\text{on/off}}\),
Scan the possible values of n (up to some limit),
For each of those, scan the possible values of k,
Until the maximum \(R_{\$(i/r)}\) for that n is found
–(the function only has a single local maximum)
And return the max \(R_{\$(i/r)}\) over all n tried.
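
The scan described above might be sketched in Python as follows. This is one possible reading rather than the lecture's own code: the function names, the limits `n_max` and `k_max`, and the choice to start each scan at k = 2 are assumptions, and the advantage ratio is taken as the irreversible cost k^n divided by the total reversible cost.

```python
import math


def st(k: int, n: int) -> int:
    """Spacetime recurrence from slide 6, with ST(k, 0) = 1."""
    if n == 0:
        return 1
    return (2 * k - 1) * st(k, n - 1) + (k * k - 3 * k + 2) * k ** (n - 1) // 2


def advantage(k: int, n: int, r_on_off: float) -> float:
    """Irreversible cost over total reversible cost, in units of $_i."""
    reversible = 2.0 * math.sqrt((2 * k - 1) ** n * st(k, n) / r_on_off)
    return k ** n / (1.0 + reversible)


def best_k_for_n(n: int, r_on_off: float, k_max: int = 10_000):
    """Scan k upward and stop at the first decrease.

    Relies on the slide's statement that the function has a single
    local maximum in k; k_max is only a safety limit.
    """
    best_k, best = 2, advantage(2, n, r_on_off)
    for k in range(3, k_max + 1):
        value = advantage(k, n, r_on_off)
        if value <= best:
            break
        best_k, best = k, value
    return best_k, best


def optimize(r_on_off: float, n_max: int = 20):
    """Return (advantage, k, n) with the largest advantage for n = 1..n_max."""
    best = (0.0, 0, 0)
    for n in range(1, n_max + 1):
        k, value = best_k_for_n(n, r_on_off)
        if value > best[0]:
            best = (value, k, n)
    return best


if __name__ == "__main__":
    for r_on_off in (1e3, 1e6, 1e9):
        print(r_on_off, optimize(r_on_off))
```

Stopping each k scan at the first decrease keeps the search cheap, but it is only valid under the single-maximum property; an exhaustive scan over k would be the safer choice if that property were in doubt.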

10 (Figure: plot with curves labeled spacetime blowup, energy saved, k, and n)

11 Asymptotic Scaling
The potential energy savings factor scales as \(R_{\$(i/r)} \propto R_{\text{on/off}}^{\sim 0.4}\),
while the spacetime overhead goes only as \(R_{\$(i/r)}^{\sim 0.45}\), or \(R_{\text{on/off}}^{\sim 0.18}\).
E.g., with an \(R_{\text{on/off}}\) of \(10^9\), you can do worst-case computation in an adiabatic circuit with:
–An energy savings of up to a factor of 1,200×!
–But, this point is 700,000× less hardware-efficient!

12 Various Cost Measures
Entropy - advantage as per previous analysis
Area times time - scales w. entropy generated
Performance, given area constraint -
–In the leakage-free limit, advantage proportional to \(d^{1/2}\)
–With leakage, what’s the max advantage? (See hw)
NOW:
–Are there any performance/cost advantages from adiabatics even when there is no cost or constraint to entropy or to area?
–YES, for flux-limited computations that require communications. Let’s see why…

13 Perf. scaling w. # of devices
If alg. is not limited by communications needs,
–Use irreversible processors spread in a 2-D layer.
–Remove entropy along perpendicular dimension.
–No entropy rate limits, so no speed advantage from reversibility.
If alg. requires only local communication, latency ~ cyc. time, in an \(N_D \times N_D \times N_D\) mesh,
–Leak-free reversible machine perf. scales better!
–Irreversible \(t_{cyc} = \Theta(N_D^{1/3})\)
–Reversible \(t_{cyc} = \Theta(N_D^{1/4})\)… \(\Theta(N_D^{1/12})\)× faster!
To boost reversibility speedup by 10×, one must consider ~\(10^{36}\)-CPU machines (1.7 trillion moles of CPUs!)
–1.7 trillion moles of H atoms weighs 1.7 million metric tons!
»A ~100-m tall hill of H-atom sized CPUs!
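
The numbers on this slide follow from simple exponent arithmetic, which the Python sketch below reproduces. It is a back-of-the-envelope check, not material from the lecture: the Avogadro constant and the molar mass of hydrogen are standard reference values, and the variable and function names are hypothetical.

```python
from fractions import Fraction

AVOGADRO = 6.02214076e23   # particles per mole (SI defined value)
H_MOLAR_MASS_G = 1.008     # grams per mole of atomic hydrogen (standard value)

# Cycle-time exponents in the mesh side length N_D, as given on the slide.
irreversible_exp = Fraction(1, 3)
reversible_exp = Fraction(1, 4)

# The speedup is the ratio of cycle times, so the exponents subtract.
speedup_exp_in_nd = irreversible_exp - reversible_exp  # 1/12
# N_proc = N_D^3, so the exponent in the CPU count is three times smaller.
speedup_exp_in_nproc = speedup_exp_in_nd / 3           # 1/36


def cpus_for_speedup(factor: int, exponent: Fraction) -> int:
    """Return N such that N**(1/m) == factor, for an exponent of the form 1/m."""
    assert exponent.numerator == 1
    return factor ** exponent.denominator


n_cpus = cpus_for_speedup(10, speedup_exp_in_nproc)    # 10**36
moles = n_cpus / AVOGADRO
metric_tons = moles * H_MOLAR_MASS_G / 1e6              # grams -> metric tons

if __name__ == "__main__":
    print(speedup_exp_in_nd, speedup_exp_in_nproc)
    print(f"{n_cpus:.1e} CPUs = {moles:.2e} mol = {metric_tons:.2e} t of H atoms")
```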

14 Lower bound on irrev. time
Simulate \(N_{proc} = N_D^3\) cells for \(N_{steps} \approx N_D\) steps.
Consider a sequence of \(N_D\) update steps.
Final cell value depends on \(N_D^4\) ops in time T.
All ops must occur within radius \(r = cT\) of cell.
Surface area \(A \propto T^2\), rate \(R_{op} \propto T^2\) sustainable.
\(N_{ops} \approx R_{op} \cdot T \propto T^3\) needs to be at least \(N_D^4\).
\(\Rightarrow\) T must be \(\Omega(N_D^{4/3})\) to do all \(N_D\) steps.
Average time per step must be \(\Omega(N_D^{1/3})\).
Any irreversible machine (of any technology or architecture) must obey this bound!
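
The chain of exponents in this argument can be tracked with exact fractions. The Python sketch below does only that bookkeeping for the 3-D case on the slide; it is an illustration with hypothetical names, not a proof.

```python
from fractions import Fraction


def irreversible_time_exponents():
    """Return (total-time exponent, per-step exponent) in N_D for the 3-D mesh."""
    # Ops that feed one final cell value: N_D^3 cells times N_D steps.
    ops_needed_exp = Fraction(3) + Fraction(1)          # N_D^4
    # Sustainable op rate scales with the enclosing surface area, T^2,
    # so ops completed within time T scale as T^2 * T = T^3.
    ops_done_exp_in_t = Fraction(2) + Fraction(1)       # T^3
    # T^3 must reach N_D^4, so T scales at least as N_D^(4/3).
    total_time_exp = ops_needed_exp / ops_done_exp_in_t
    # Spread over N_D steps, the average time per step is N_D^(1/3).
    per_step_exp = total_time_exp - 1
    return total_time_exp, per_step_exp


if __name__ == "__main__":
    total, per_step = irreversible_time_exponents()
    print(f"T = Omega(N_D^({total})), per step = Omega(N_D^({per_step}))")
```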

15 Irreversible 3-D Mesh

16 Reversible 3-D Mesh

17 Non-local Communication
Best computational task for reversibility:
–Each processor must exchange messages with another that is \(N_D^{1/2}\) nodes away on each cycle
Unsure what real-world problem demands this pattern!
–In this case, reversible speedup scales with number of CPUs to “only” the 1/18th power.
To boost reversibility speedup by 10×, “only” need \(10^{18}\) (or 1.7 micromoles) of CPUs
If each was a 1-nm cluster of 100 C atoms, this is only 2 mg mass, volume \(1\ \text{mm}^3\).
Current VLSI: Need cost level of ~$25B before you see a speedup.
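
The mass and volume figures can be checked with a few lines of Python. This sketch assumes each "1-nm cluster" occupies a 1 nm cube and uses standard reference values for the Avogadro constant and the molar mass of carbon; none of the names come from the lecture.

```python
AVOGADRO = 6.02214076e23   # particles per mole (SI defined value)
C_MOLAR_MASS_G = 12.011    # grams per mole of carbon (standard value)

SPEEDUP = 10
INVERSE_EXPONENT = 18      # speedup scales as N_proc^(1/18) on this slide
ATOMS_PER_CPU = 100
NM_PER_MM = 1e6

n_cpus = SPEEDUP ** INVERSE_EXPONENT                   # 10**18 CPUs
micromoles = n_cpus / AVOGADRO * 1e6
# Grams of carbon, converted to milligrams.
mass_mg = n_cpus * ATOMS_PER_CPU / AVOGADRO * C_MOLAR_MASS_G * 1e3
# Assumption: each CPU fills a cube 1 nm on a side.
volume_mm3 = n_cpus * (1.0 / NM_PER_MM) ** 3

if __name__ == "__main__":
    print(f"{n_cpus:.0e} CPUs = {micromoles:.2f} micromoles")
    print(f"mass = {mass_mg:.2f} mg, volume = {volume_mm3:.2f} mm^3")
```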

18 Ballistic Machines
In the limit as \(c_S \to 0\), the asymptotic benefit for 3-D meshes goes as \(N_D^{1/3}\) or \(N_{proc}^{1/9}\).
–Only need a billion devices to multiply reversible speedup by 10×.
With \(1\ \text{nm}^3\) devices, a cube 1 µm on a side (bacteria size) would do it!
Does rod logic have low enough \(c_S\) and small enough size to attain this prediction…?
–(Need to check.)

19 Minimizing volume via folding
Allows prev. solutions to be packed in min. volume.
Volume scales proportionally to mass.
No change in speed or entropy flux.

20 Cooling Technologies

21 Irreversible Max Perf. Per Area

22 Reversible Entropy Coeffs.

23 Rev. vs. Irrev. Comparisons

24 Sizes of Winning Rev. Machines

25 Some Analytical Challenges
Combine Frank ‘02 emulation algorithm,
Analysis of its energy and space efficiency as a function of n and k,
And plug it into the analysis for the 3-D meshes, to see…
What are the optimal speedups for arbitrary mesh computations on rev. machines, as a function of:
–\(R_{\text{on/off}}\), device volume, entropy flux limit, machine size.
–And, does perf./hw improve, and if so, how much?

