1 Michihiro Koibuchi, Takafumi Watanabe, Atsushi Minamihata, Masahiro Nakao, Tomoyuki Hiroyasu, Hiroki Matsutani, and Hideharu Amano

1 Performance Evaluation of Power-aware Multi-tree Ethernet for HPC Interconnects
Michihiro Koibuchi, Takafumi Watanabe, Atsushi Minamihata, Masahiro Nakao, Tomoyuki Hiroyasu, Hiroki Matsutani, and Hideharu Amano
koibuchi@nii.ac.jp

2 HPC PC Clusters with Ethernet
- Host/CPU: various low-power techniques are used (DVFS, power gating)
- Ethernet switch: always active, prepared for packet injection
- We evaluate our power-aware on/off link activation for Ethernet on PC clusters
- Interconnect share in the TOP500 (Nov 2011): Gigabit Ethernet (GbE) 45%
[Figure: PCs connected to an Ethernet switch]

3 Outline
- Ethernet for HPC: link aggregation (channel group) + multi-paths
- Our on/off link activation method
- Evaluations: performance and power consumption of PC clusters

4 Ethernet on HPC systems
- Increasing the number of ports of GbE switches: 24/48-port switches provide the lowest cost per port
- Improving the computation power of hosts (> 10 GFlops)
- Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster, 2004][Viking, Infocom2004][Koibuchi et al, IEEE TPDS2011]: drastically increases the number of links
[Figure: switches and hosts 0-7; link aggregation using 2 links; 2 paths]

5 Power consumption of GbE switches
- Power consumption is almost constant regardless of traffic load
- The number of activated ports dominates the power consumption of switches
  - The power consumption of a port is reduced to zero by the port-shutdown operation
Product   Port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2    14.9           42.9 (65%)
PC6224    2.0    42.5           91.1 (53%)
PC6248    2.1    56.8           155.2 (63%)
SF-420    1.0    32.6           55.4 (41%)
C-3750    1.8    84.5           127.7 (34%)
Unit: W
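The port share in the last column can be re-derived from the other columns: subtract the non-port ("Other") power from the total and divide by the total. The sketch below does this in Python using only the table values. The function names are illustrative, and the shutdown estimate relies on the slide's statement that a shut-down port draws zero power, combined with the rounded per-port figure, so it is an approximation.

```python
# Switch power from the table, in watts: (per port, other, total).
SWITCH_POWER_W = {
    "PC5324": (1.2, 14.9, 42.9),
    "PC6224": (2.0, 42.5, 91.1),
    "PC6248": (2.1, 56.8, 155.2),
    "SF-420": (1.0, 32.6, 55.4),
    "C-3750": (1.8, 84.5, 127.7),
}


def port_share(product):
    """Fraction of the total switch power drawn by the ports."""
    _per_port, other, total = SWITCH_POWER_W[product]
    return (total - other) / total


def power_after_shutdown(product, ports_off):
    """Estimated total power after shutting down `ports_off` ports.

    Assumes a shut-down port draws zero power and uses the rounded
    per-port value from the table, so the result is only an estimate.
    """
    per_port, _other, total = SWITCH_POWER_W[product]
    return total - per_port * ports_off


for name in SWITCH_POWER_W:
    print(f"{name}: ports use {port_share(name):.0%} of total power")
```

The printed shares reproduce the "ratio of ports" column (65%, 53%, 63%, 41%, 34%).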

6 Overview of the on/off link method
- Switch ports consume 40-60% of the total power
- Network load is not always high (e.g. during computation time)
- When the traffic load becomes low, a part of the links is turned off
[Figure: switch/host topology (hosts 0-7) before and after turning off links]

7 Outline
- Ethernet for HPC: link aggregation (channel group) + multi-paths
- Our on/off link activation method
- Evaluations: performance and power consumption of PC clusters

8 A framework of the on/off link method
- Traffic monitoring (e.g. port monitor, IPTraf, pilot execution)
- Do low- or high-load links appear? (No: keep monitoring; Yes: continue)
- Selection of on/off links and paths
- Update of the on/off link operation
- How is it implemented on Ethernet?
[Figure: paths before and after the traffic load becomes low; the "before" path is deactivated; hosts 0-7]
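One pass of this monitor/select/update loop might look like the following Python sketch. The thresholds, the `Link` type, and the one-link-at-a-time policy are all hypothetical, since the slides give no concrete values or selection rule; the sketch only illustrates the decision step between monitoring and updating.

```python
from dataclasses import dataclass

# Hypothetical utilisation thresholds (fraction of link capacity).
LOW_LOAD = 0.2
HIGH_LOAD = 0.8


@dataclass
class Link:
    name: str
    active: bool
    utilisation: float  # reported by the traffic monitor, 0.0 to 1.0


def plan_updates(links):
    """Decide which links to turn on or off for one monitoring interval.

    Returns (to_activate, to_deactivate) as lists of link names.
    At least one link always stays active.
    """
    active = [l for l in links if l.active]
    idle = [l for l in links if not l.active]
    to_activate, to_deactivate = [], []

    if any(l.utilisation >= HIGH_LOAD for l in active) and idle:
        # A high-load link appeared: bring one spare link back.
        to_activate.append(idle[0].name)
    elif len(active) > 1 and all(l.utilisation <= LOW_LOAD for l in active):
        # Every active link is lightly loaded: turn one off.
        to_deactivate.append(active[-1].name)
    return to_activate, to_deactivate


links = [
    Link("link-a", True, 0.05),
    Link("link-b", True, 0.10),
    Link("link-c", False, 0.0),
]
print(plan_updates(links))  # ([], ['link-b'])
```

If neither condition holds, both lists are empty and the loop simply returns to monitoring.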

9 Requirements for the on/off link method
- No update of the MPI communication library
- Hide the overhead of activating a link
- Keep the MAC address tables stable while paths are updated
[Figure: switch/host topology (hosts 0-7), before and after]

10 Changing the paths for the on/off link operation
- Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
  - The path is specified by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
  - Each host sends and receives usual (untagged) frames
- When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
- When the frame leaves for a host, the switch removes the VLAN tag
[Figure: hosts 0-7; the path of PVID v0 (VLAN v0) and the path of PVID v1 (VLAN v1); VLAN tag v0 is attached]
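The tagging behaviour described on this slide can be modelled in a few lines of Python. This is a toy model, not switch firmware: the `Frame` type, the switch names, and the `PATHS` table are hypothetical stand-ins for the per-VLAN forwarding state. It shows that the host only ever sees untagged frames, while the PVID of the ingress port selects the path.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an untagged frame


def ingress_from_host(frame, port_pvid):
    """Host-facing port, inbound: tag the untagged frame with the PVID."""
    return replace(frame, vlan=port_pvid)


def egress_to_host(frame):
    """Host-facing port, outbound: strip the VLAN tag before delivery."""
    return replace(frame, vlan=None)


# Hypothetical per-VLAN paths (switch IDs) between the same pair of hosts.
PATHS = {0: ["sw0", "sw1", "sw3"], 1: ["sw0", "sw2", "sw3"]}


def deliver(frame, port_pvid):
    """Return the frame as the destination host sees it, plus the path used."""
    tagged = ingress_from_host(frame, port_pvid)
    path = PATHS[tagged.vlan]  # the VLAN tag selects the path
    return egress_to_host(tagged), path


received, path = deliver(Frame("host0", "host7"), port_pvid=1)
print(received.vlan, path)  # None ['sw0', 'sw2', 'sw3']
```

Changing only `port_pvid` reroutes the traffic, which is why the hosts and the MPI library need no changes.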

11 When a deactivated link is activated (when the traffic increases)
(1) Activate the target link, using the no-shutdown command of the switch
(2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
(3) Update the PVIDs of the ports connecting hosts to v0
[Figure: hosts 0-7; before; steps 1, 2 (activate links, VLAN v0); step 3 (updating PVID to v0)]

12 When an activated link is deactivated (when the traffic decreases)
(1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
(2) Update the PVIDs of the ports connecting hosts to v1
(3) Deactivate the link
[Figure: hosts 0-7; before (the path of PVID v0); steps 1, 2 (the path of PVID v1); step 3 (deactivating the link)]
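The ordering in the two procedures (slides 11 and 12) matters: a link is brought up before any path uses it, and it is shut down only after traffic has moved to a path set that avoids it. The Python sketch below is a toy model under stated assumptions: `SwitchFabric` and its method names are hypothetical, MAC address table construction is left out, and the checks simply raise an error if the steps are run in an unsafe order.

```python
class SwitchFabric:
    """Toy model of link, VLAN and PVID state; records each operation."""

    def __init__(self, active_links, pvid):
        self.active_links = set(active_links)
        # The VLAN in use initially routes over all active links.
        self.vlans = {pvid: set(active_links)}
        self.pvid = pvid  # PVID set on the host-facing ports
        self.log = []

    def create_vlan(self, vlan_id, links):
        # A VLAN's paths may only use links that are already up.
        if not set(links) <= self.active_links:
            raise ValueError("VLAN uses an inactive link")
        self.vlans[vlan_id] = set(links)
        self.log.append(f"create vlan {vlan_id}")

    def set_pvid(self, vlan_id):
        if vlan_id not in self.vlans:
            raise ValueError("unknown VLAN")
        self.pvid = vlan_id
        self.log.append(f"set pvid {vlan_id}")

    def no_shutdown(self, link):
        self.active_links.add(link)
        self.log.append(f"no shutdown {link}")

    def shutdown(self, link):
        # Refuse to cut a link that the VLAN in use still routes over.
        if link in self.vlans[self.pvid]:
            raise ValueError("link still carries traffic")
        self.active_links.discard(link)
        self.log.append(f"shutdown {link}")


def activate_link(fabric, link, new_vlan, new_links):
    fabric.no_shutdown(link)                 # (1) bring the link up
    fabric.create_vlan(new_vlan, new_links)  # (2) paths that include it
    fabric.set_pvid(new_vlan)                # (3) move hosts to the new paths


def deactivate_link(fabric, link, new_vlan, new_links):
    fabric.create_vlan(new_vlan, new_links)  # (1) paths that avoid it
    fabric.set_pvid(new_vlan)                # (2) move hosts to the new paths
    fabric.shutdown(link)                    # (3) now safe to shut down


fabric = SwitchFabric({"a", "b"}, pvid=0)
deactivate_link(fabric, "b", new_vlan=1, new_links={"a"})
print(fabric.log)  # ['create vlan 1', 'set pvid 1', 'shutdown b']
```

Calling `fabric.shutdown("b")` before the PVID update would raise an error in this model, which mirrors why deactivation is the last step on slide 12.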

13 Outline
- Ethernet for HPC: link aggregation (channel group) + multi-paths
- On/off link activation method
- Evaluations: performance and power consumption of PC clusters

14 Performance evaluation on a PC cluster
- PC cluster
  - 66 hosts, 528 cores
  - CPU: Quad-Core AMD Opteron 2.3 GHz
  - Memory: DDR2 667 MHz, 8 GB
  - NIC & driver: Broadcom BCM95721, Tigon3
  - Kernel: 2.6.9-67.0.15.ELsmp
- GbE switch: Dell PC 6248 (48 ports) × 8
- Applications: NPB 3.2 / HPL (OpenMPI 1.3 / MPICH-1.2.7p1)
[Figure: Dell PC6248 switches]

15 Topology of the cluster
- Tree or completely (fully) connected graph, with up to 5 links between switches
- Link aggregation (IEEE 802.3ad) is enabled
- Applications are pre-executed to estimate the traffic amount
  - The on/off link set is set up before execution
- Our simple link regulation algorithm is performed
[Figure: completely (fully) connected topology; tree]
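The slides do not spell out the link regulation algorithm. As a purely illustrative stand-in, the Python sketch below sizes each aggregated link group from a traffic estimate obtained in a pre-execution run. The sizing rule, the `headroom` parameter, the switch names, and the choice to keep at least one link per switch pair are all assumptions; only the 5-link maximum and the GbE link speed come from the deck.

```python
import math

LINK_CAPACITY_GBPS = 1.0  # one GbE link
MAX_LINKS = 5             # up to 5 links between switches (from the slide)


def links_needed(peak_gbps, headroom=1.0):
    """Number of aggregated links to keep on for one switch pair.

    `peak_gbps` is the peak traffic estimated in the pre-execution run.
    `headroom` is a hypothetical safety factor. This sketch always keeps
    at least one link per pair.
    """
    wanted = math.ceil(peak_gbps * headroom / LINK_CAPACITY_GBPS)
    return max(1, min(MAX_LINKS, wanted))


def plan(traffic):
    """Map each switch pair to the number of links to activate."""
    return {pair: links_needed(gbps) for pair, gbps in traffic.items()}


estimate = {("sw0", "sw1"): 0.3, ("sw0", "sw2"): 2.4, ("sw1", "sw2"): 7.5}
print(plan(estimate))
# {('sw0', 'sw1'): 1, ('sw0', 'sw2'): 3, ('sw1', 'sw2'): 5}
```

Because the plan is computed before the run, the on/off link set can be installed once and left unchanged during execution, as the slide describes.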

16 Pre-evaluation (even link removal)
- All the applications lose performance drastically if links are removed uniformly
[Figure: (1) synthetic traffic; (2) Linpack (HPL), performance (Tflops), Rmax/Rpeak = 61%; (3) NPB, Class C]

17 Performance and power in HPL
- Over 20% power reduction with almost the same performance
[Figure: HPL performance and power; Rmax/Rpeak = 61%]

18 Performance and power in NPB64
- Over 25% power reduction with almost the same performance
- In Class C, IS, LU, BT and SP keep their performance
[Figure: NPB64 performance and power, Class C; Rmax/Rpeak = 61%]

19 Performance and power in NPB128
- Over 20% power reduction with almost the same performance
- In Class C, LU and MG keep their performance
[Figure: NPB128 performance and power, Class C; Rmax/Rpeak = 61%]

20 Conclusions
- We evaluated our on/off link method on Ethernet
  - Multi-tree topologies and link aggregation are enabled
  - The port-shutdown command is used to reduce power consumption
- Ports consume up to 60% of switch power
- Network power is reduced by up to 37% in the 528-core PC cluster

21 Behaviour of NW reconfiguration (topology update) Dell PC5324

22 NW reconfiguration (simulation)
- Application with dynamic NW reconfiguration
- Stabilizing switches, scalability, applicable to various ULP-HPC
[Figure: (a) static NW reconfiguration; (b) dynamic NW reconfiguration. Labels: traffic becomes low, link state: ON; traffic becomes high, link state: OFF; traffic load decreases; traffic load increases; latency decreases]

23 Limitation of Ethernet: tree topology
- When building a large-scale PC cluster with many switches, performance decreases drastically (congestion)
HPL performance in the 225-host PC cluster:
Switch                       Force10 E1200 ×1   Dell PowerConnect 6248 ×8
Ports                        336                48
Rmax (TFlops) (Rmax/Rpeak)   1.169 (63.4%)      0.558 (34.5%)
Cost ($)                     400,000            16,000 (2,000 × 8)
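The cost gap in this comparison is easier to see per port. The short Python sketch below uses only the port counts and prices from the slide, and assumes that the 48-port figure is per switch, so the eight-switch configuration has 384 ports in total. It does not account for the ports that the multi-switch configuration spends on inter-switch links.

```python
# Port counts and prices from the slide; "switches" is the multiplier shown.
configs = {
    "Force10 E1200 x1": {"ports": 336, "switches": 1, "cost_usd": 400_000},
    "Dell PowerConnect 6248 x8": {"ports": 48, "switches": 8, "cost_usd": 16_000},
}


def cost_per_port(cfg):
    """Total cost divided by total ports (ports per switch x switches)."""
    return cfg["cost_usd"] / (cfg["ports"] * cfg["switches"])


for name, cfg in configs.items():
    print(f"{name}: ${cost_per_port(cfg):,.2f} per port")
```

Under these assumptions the single large switch costs roughly $1,190 per port against roughly $42 per port for the commodity switches, which is the motivation for making the multi-switch Ethernet topology perform well.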

