1 An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
Michihiro Koibuchi (NII, Japan), Tomohiro Otsuka (Keio U, Japan), Hiroki Matsutani (U of Tokyo, Japan), Hideharu Amano (Keio U / NII, Japan)

2 HPC PC Clusters with Ethernet
- Host/CPU
  - Various low-power techniques are used: DVFS, power gating
- Ethernet switch
  - Always active, prepared for packet injection
- We propose and evaluate a low-power technique for Ethernet switches in PC clusters
- Interconnect share in the TOP500 (Nov 2008): Gigabit Ethernet 56%
[Figure: PC and Ethernet switch; chart of interconnect share in the TOP500]

3 Outline
- Ethernet for HPC
  - Link aggregation (channel group) + multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
  - Performance and power consumption of PC clusters

4 Ethernet on HPC systems
- Increasing the number of ports of GbE switches
  - 24/48-port switches provide the lowest cost per port
- Improving the computation power of hosts (> 10 GFlops)
- Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster, 2004] [Viking, Infocom 2004]
  - Drastically increases the number of links
[Figure: switches and hosts 0-15 connected through TREE 1 to TREE 4 (4 paths), with link aggregation using 3 links]

5 Power consumption of GbE switches
- Power consumption is almost constant regardless of traffic load
- The number of activated ports dominates the power consumption of switches
  - The power consumption of a port is reduced to zero by the port-shutdown operation

Product   Per port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2        14.9           42.9 (65%)
PC6224    2.0        42.5           91.1 (53%)
PC6248    2.1        56.8           155.2 (63%)
SF-420    1.0        32.6           55.4 (41%)
C-3750    1.8        84.5           127.7 (34%)
Unit: W
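The table suggests a simple linear reading: switch power is roughly the fixed part (crossbar and other components) plus the per-port power times the number of active ports. The sketch below assumes exactly that model and nothing more; the function name and the choice of shutting down half of the ports are illustrative, not taken from the slides.

```python
def switch_power_w(other_w: float, port_w: float, active_ports: int) -> float:
    """Estimate switch power in watts as fixed power plus per-port power."""
    if active_ports < 0:
        raise ValueError("active_ports must be non-negative")
    return other_w + port_w * active_ports


# PC6248 row of the table: 2.1 W per port, 56.8 W for the crossbar and the rest.
PC6248_OTHER_W = 56.8
PC6248_PORT_W = 2.1

all_ports = switch_power_w(PC6248_OTHER_W, PC6248_PORT_W, 48)
half_ports = switch_power_w(PC6248_OTHER_W, PC6248_PORT_W, 24)  # illustrative case
saving = 1 - half_ports / all_ports  # fraction saved by shutting down 24 ports

print(f"48 ports: {all_ports:.1f} W, 24 ports: {half_ports:.1f} W, saving: {saving:.0%}")
```

With all 48 ports active the model gives 157.6 W, close to but not equal to the measured total of 155.2 W, so it should be read as an approximation of the table rather than a replacement for measurement.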

6 Overview of the on/off link method
- Network load is not always high (e.g. during computation time)
- Switch ports consume 40-60% of the total power
- When the traffic load becomes low, a part of the links is turned off
[Figure: switches and nodes 0-15 connected through TREE 1 to TREE 4, before and after turning off a part of the links]

7 Outline
- Ethernet for HPC
  - Link aggregation (channel group) + multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
  - Performance and power consumption of PC clusters

8 A framework of the on/off link method
- Traffic monitoring (e.g. port monitor, IPTraf, pilot execution)
- Do low- or high-load links appear?
  - No: continue monitoring
  - Yes: go on to selection
- Selection of on/off links and paths
- Update of on/off link operation
- How is it implemented on Ethernet? A very crucial factor
[Figure: TREE 1 to TREE 4 over hosts 0-15, paths before and after low traffic load is detected; the "before" path is deactivated]
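One iteration of the monitor, detect, select loop can be sketched as a threshold check per channel group. This is a minimal illustration under stated assumptions: the `ChannelGroup` type, the `decide` function, and the 0.3/0.8 utilization thresholds are hypothetical and do not come from the slides, which only name the steps.

```python
from dataclasses import dataclass

LINK_GBPS = 1.0  # capacity of one GbE link


@dataclass
class ChannelGroup:
    name: str
    total_links: int    # links physically available in the group
    active_links: int   # links currently turned on
    load_gbps: float    # traffic measured by the monitoring step


def decide(group: ChannelGroup, low: float = 0.3, high: float = 0.8) -> int:
    """Return +1 to activate one link, -1 to deactivate one, 0 to do nothing."""
    util = group.load_gbps / (group.active_links * LINK_GBPS)
    if util > high and group.active_links < group.total_links:
        return +1  # high-load group with a spare link to turn on
    if util < low and group.active_links > 1:
        return -1  # low-load group; always keep at least one link up
    return 0


groups = [
    ChannelGroup("sw0-sw1", total_links=6, active_links=6, load_gbps=0.6),
    ChannelGroup("sw1-sw2", total_links=6, active_links=2, load_gbps=1.9),
    ChannelGroup("sw2-sw3", total_links=6, active_links=3, load_gbps=1.5),
]
for g in groups:
    print(g.name, decide(g))
```

Keeping the low threshold well under half of the high one means that removing a link from a group of two or more cannot push it straight back over the high threshold, which avoids oscillating between on and off.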

9 Requirements for the on/off link method
To achieve a practical on/off link activation method:
- No update of the MPI communication library
- Using existing functions of commercial switches
- Hiding the overhead to activate the link
- Stabilizing the MAC address tables while updating paths
  - Avoiding broadcast storms and communication interruption
[Figure: switches and hosts 0-15 connected through TREE 1 to TREE 4, before and after]

10 Changing the paths for on/off link operation
- Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
  - The path is specified by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
  - Each host sends and receives usual (untagged) frames
- When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
- When the frame leaves for a host, the switch removes the VLAN tag
[Figure: switches 0-7 with the path of PVID #v0 on VLAN v0 and the path of PVID #v1 on VLAN v1; VLAN tag #v0 is attached at a port whose PVID is v0]
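The tagging behavior described here can be illustrated with a toy model: the edge switch tags an untagged host frame with the ingress port's PVID, the tag selects the path, and the tag is stripped before delivery to the destination host. The `Frame` type, the function names, and the switch names in `paths` are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an untagged frame


def ingress_from_host(frame: Frame, pvid: int) -> Frame:
    """Edge switch: tag an untagged host frame with the port's PVID."""
    if frame.vlan is not None:
        raise ValueError("hosts are expected to send untagged frames")
    return replace(frame, vlan=pvid)


def egress_to_host(frame: Frame) -> Frame:
    """Edge switch: remove the VLAN tag before handing the frame to a host."""
    return replace(frame, vlan=None)


def deliver(frame: Frame, pvid: int,
            paths: Dict[int, List[str]]) -> Tuple[List[str], Frame]:
    """Return the switch hops selected by the tag and the frame the host receives."""
    tagged = ingress_from_host(frame, pvid)
    hops = paths[tagged.vlan]  # the VLAN ID alone picks the path
    return hops, egress_to_host(tagged)


# Two alternative paths between the same pair of edge switches (illustrative).
paths = {0: ["sw0", "sw1", "sw5"], 1: ["sw0", "sw4", "sw5"]}
hops, received = deliver(Frame("host0", "host5"), pvid=1, paths=paths)
print(hops, received)
```

Because only the PVID of the host-facing port decides the tag, changing that one setting moves the host's traffic to another path without touching the host or its MPI library.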

11 When a deactivated link is activated (when the traffic increases)
(1) Activate the target link
  - Using the no-shutdown command of the switch
(2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
(3) Update the PVIDs of the host-facing ports to v0
[Figure: switches 0-7 before, after steps 1 and 2 (link on, VLAN v0), and after step 3 (PVID updated to v0)]

12 When an activated link is deactivated (when the traffic decreases)
(1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
(2) Update the PVIDs of the host-facing ports to v1
(3) Deactivate the link
[Figure: switches 0-7 before (the path of PVID v0), after steps 1 and 2 (the path of PVID v1, PVID changed from v0 to v1), and after step 3 (link deactivated)]
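The ordering of the steps matters: a link is brought up before traffic is moved onto it, and traffic is moved away before a link is shut down. The toy model below checks that property after every step. It is a sketch under stated assumptions: the `Fabric` class, its log strings, and the link names are hypothetical, and no real switch CLI is being driven.

```python
class Fabric:
    """Toy model: which links are up, which links each VLAN uses, host PVID."""

    def __init__(self, up_links, vlans, pvid):
        self.up = set(up_links)
        self.vlans = {vid: set(links) for vid, links in vlans.items()}
        self.pvid = pvid
        self.log = []
        self._check()

    def _check(self):
        # Invariant: the VLAN used to tag host frames only crosses links that are up.
        if not self.vlans[self.pvid] <= self.up:
            raise RuntimeError("host traffic is routed over a link that is down")

    def _step(self, message):
        self.log.append(message)
        self._check()

    def activate(self, link, new_vid, path_links):
        self.up.add(link)                      # (1) bring the target link up
        self._step(f"no shutdown {link}")
        self.vlans[new_vid] = set(path_links)  # (2) VLAN for the path set using it
        self._step(f"create vlan {new_vid}")
        self.pvid = new_vid                    # (3) move host ports to that VLAN
        self._step(f"pvid -> {new_vid}")

    def deactivate(self, link, new_vid, path_links):
        if link in path_links:
            raise ValueError("the new path set must avoid the target link")
        self.vlans[new_vid] = set(path_links)  # (1) VLAN for the path set avoiding it
        self._step(f"create vlan {new_vid}")
        self.pvid = new_vid                    # (2) move host ports to that VLAN
        self._step(f"pvid -> {new_vid}")
        self.up.discard(link)                  # (3) only now shut the link down
        self._step(f"shutdown {link}")


fabric = Fabric(up_links={"a", "b"}, vlans={0: {"a", "b"}}, pvid=0)
fabric.deactivate("b", new_vid=1, path_links={"a"})
fabric.activate("b", new_vid=0, path_links={"a", "b"})
print(fabric.log)
```

Reversing either sequence, for example shutting the link down before the PVID update, would trip the invariant check, which corresponds to the communication interruption the method is designed to avoid.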

13 Outline
- Ethernet for HPC
  - Link aggregation (channel group) + multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
    - On/off link operation
    - Overhead to modify the path set
  - Performance and power consumption of PC clusters
Switches: Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports). We can buy them at $1,000-3,000.

14 Fundamental evaluation: On/Off overhead
- A link is continuously operated: on, off, on
- When STP is enabled, the overhead becomes some dozens of seconds to about 1 min
- To hide this overhead, paths should be updated after the on/off operation completes

Product   On/Off link op. (sec)
PC5324    4.0
PC6224    3.4
PC6248    2.2
SF-420    12.0

15 Fundamental evaluation (2): overhead to update paths
- Measure the overhead to change paths using VLANs
- Communication is not interrupted
  - This enables runtime on/off link activation

Product   Path update (sec)
PC5324    0
PC6224    0
PC6248    0
SF-420    0
[Figure: before and after updating the PVID to v1, showing VLAN v0 and VLAN v1]

16 Performance evaluation on a PC cluster
- PC cluster
  - 128 hosts, Dual Opteron 1.8GHz x2
  - MPICH 1.2.7p1
- GbE switch
  - Dell PowerConnect 6248
  - 28 hosts per switch, 48 ports @ 8
- Application
  - NPB 3.2

17 Topology of the cluster
- Peak: 4×2 torus, 6 links between switches
  - Link aggregation (IEEE 802.3ad) is enabled
- Applications are pre-executed to estimate the traffic amount
  - The on/off link set is set up before execution
- Two on/off link selection algorithms
  - Conservative: maintain the maximum amount of traffic on a link
  - Aggressive: further power reduction (details are in the proceedings)
[Figure: torus]
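The slide defers the details of both selection algorithms to the proceedings, so the following is not the authors' algorithm. It is only a sketch of one plausible reading of a conservative choice: given the peak traffic estimated by pre-execution for one aggregated group of links, keep the fewest links that still carry that peak within a chosen fraction of link capacity. The function name, the `headroom` parameter, and the sample traffic values are assumptions.

```python
import math


def links_to_keep(peak_gbps: float, total_links: int,
                  link_gbps: float = 1.0, headroom: float = 1.0) -> int:
    """Fewest active links so that peak traffic fits in headroom * capacity.

    headroom is the fraction of each link's capacity allowed to be used;
    at least one link always stays on so the switches remain connected.
    """
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0, 1]")
    if peak_gbps < 0 or total_links < 1:
        raise ValueError("invalid traffic estimate or link count")
    needed = math.ceil(peak_gbps / (link_gbps * headroom))
    return max(1, min(total_links, needed))


# Six aggregated GbE links between two switches, as in the peak configuration.
for peak in (0.0, 2.4, 7.2):
    keep = links_to_keep(peak, total_links=6)
    print(f"estimated peak {peak} Gbps: keep {keep} of 6 links")
```

A more aggressive policy would accept a lower link count than this bound at the cost of possible slowdown; how the authors make that trade-off is not stated on the slide.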

18 Results of NPB (64 procs, PC6248 switches)
- The conservative policy maintained almost the peak performance
- 26% of the network power consumption is reduced without performance degradation
[Fig 1: Performance]
[Fig 2: Power consumption of networks, PC6248s. Axis label: Relative Power Cons (W); benchmarks EP, IS, LU, SP; series: peak (all links), conservative, aggressive; 26% power reduction marked]

19 Results of NPB (64 procs, other switches)
- The L2 switches reduce a larger ratio of power consumption
  - Fewer services are always running in an L2 switch (PC5324) than in an L3 switch (PC6248)
- 37% of power reduction
[Fig 3: Power consumption, SF-420s. Axis label: Relative Power Cons (W); benchmarks EP, IS, LU, SP; series: peak (all links), conservative, aggressive]
[Fig 4: Power consumption, PC5324. Same axes and series]

20 Related Work
- On/Off interconnection networks
  - Cannot be directly applied to Ethernet
  - M. Alonso [IPDPS05], V. Soteriou [TPDS07]
  - Our on/off link method makes it possible to support some of them in Ethernet
- DVFS for interconnection networks
  - L. Shang [HPCA03], J. M. Stine [CAL04]
  - Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
  - Dell switch PC6248: 10M: 1.1 W, 100M: 1.3 W, GbE: 2.1 W

21 Conclusions
- We propose the on/off link method on Ethernet
  - Using the port-shutdown command to reduce power consumption: switch ports consume up to 60% of the power consumption in a GbE switch
  - Stabilizing the update of the MAC address table
- Evaluations on the PC cluster with GbE switches
  - No overhead to update paths
  - Network power consumption is reduced by up to 37%
- We will provide a total solution of Ethernet for low-power PC clusters
  - Link aggregation + multi-path topology + on/off links

