Published by Elvin Potter. Modified over 8 years ago.
PERFORMANCE EVALUATION OF LARGE RECONFIGURABLE INTERCONNECTS FOR MULTIPROCESSOR SYSTEMS
Wim Heirman, Iñigo Artundo, Joni Dambre, Christof Debaes, Pham Doan Tinh, Bui Viet Khoi, Hugo Thienpont, Jan Van Campenhout
ISEE, HoChiMinh City, 24 October 2007
Abstract
The interconnection network inside a multiprocessor system has a very irregular load. Can we adapt this network to its (time-varying) demands? Yes, using (optical) reconfiguration technology. We show the resulting network speedup, obtained through system-level simulations.
Outline
- Introduction to multiprocessor systems (DSM) and interconnection networks
- Reconfigurable interconnects and optical networks
- Simulation results on performance improvement
- Conclusions
Multiprocessor Interconnects
- Multiprocessing is everywhere:
  - supercomputers
  - servers
  - on-chip (multi-core)
- Processors need to communicate to solve a single problem
- The interconnection network becomes a main system component
- Our focus: distributed shared-memory (DSM) servers
[Figure: supercomputer, server, on-chip]
Architecture of a Distributed Shared-Memory system
A DSM machine is made of:
- Nodes, each composed of:
  - The processing unit
  - Some levels of cache memory
  - The local memory
  - A network interface
- An interconnection network
Distributed Shared-Memory hierarchy
- Network is part of the memory hierarchy
- Remote memory access requires network communication
- Network latency is very influential on performance
[Figure: nodes, each with CPU, MEM and NetIF, connected by the network. Access latencies: instruction 0.5 ns, cache 5 ns, DDR 50 ns, network 500 ns]
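To see why network latency weighs so heavily, the latencies on this slide can be combined into a weighted average access time. The sketch below is only illustrative: the access mix is a hypothetical assumption (the presentation gives no such figures), and treating "DDR" as local memory and "network" as remote memory is our reading of the figure.

```python
# Latencies taken from the slide, in nanoseconds.
# Mapping "DDR" -> local memory and "network" -> remote memory is an assumption.
LATENCY_NS = {"cache": 5.0, "local_memory": 50.0, "remote_memory": 500.0}


def average_access_ns(fractions):
    """Weighted average latency; `fractions` maps level -> share of accesses."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(LATENCY_NS[level] * share for level, share in fractions.items())


# Hypothetical access mix (an assumption, not data from the presentation).
mix = {"cache": 0.95, "local_memory": 0.04, "remote_memory": 0.01}
average_ns = average_access_ns(mix)

# Share of the average access time that is spent on remote accesses.
remote_share = LATENCY_NS["remote_memory"] * mix["remote_memory"] / average_ns
```

With this assumed mix, only 1% of accesses are remote, yet they account for a large share of the average access time, which is the point the slide makes.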
Variable communication patterns
Non-uniform network traffic in space and time => Reconfigurable network?
[Figure: load over time for links #5, #9 and #13 in a network of CPU/MEM/NetIF nodes]
Proposed topology reconfiguration
- Base network (fixed)
- Extra links/elinks (reconfigurable)
[Figure: CPU/MEM nodes connected by the base network, with extra links added on top]
Reconfiguration intervals
- Requirements: selection and switching times << reconfiguration interval << traffic pattern locality
- Reconfiguration is initiated at reconfiguration points placed at fixed time intervals
- Topology is optimized for the traffic in the previous interval
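The interval scheme above can be sketched as a small loop: count traffic per node pair during an interval, and at each reconfiguration point place extra links on the pairs that were busiest in the interval that just ended. This is a minimal sketch under stated assumptions. The presentation does not give the selection algorithm, so the "busiest pairs first" rule and the names `select_elinks`, `run` and `num_elinks` are hypothetical.

```python
from collections import Counter


def select_elinks(traffic, num_elinks):
    """Pick the node pairs with the most traffic in the previous interval.

    Assumption: a simple 'busiest pairs first' rule; the real selection
    algorithm is not described in the presentation.
    """
    return [pair for pair, _ in traffic.most_common(num_elinks)]


def run(packets, interval, num_elinks):
    """`packets` is a time-sorted list of (time, src, dst) tuples.

    Returns a list of (reconfiguration_time, chosen_elinks) entries, one per
    reconfiguration point reached by the packet trace.
    """
    schedule = []
    traffic = Counter()
    next_point = interval
    for time, src, dst in packets:
        # Reconfiguration points sit at fixed multiples of `interval`.
        while time >= next_point:
            schedule.append((next_point, select_elinks(traffic, num_elinks)))
            traffic = Counter()  # start measuring the next interval
            next_point += interval
        traffic[(src, dst)] += 1
    return schedule


# Hypothetical packet trace: (time, source node, destination node).
packets = [(0, 0, 5), (1, 0, 5), (2, 3, 7),
           (10, 1, 2), (11, 1, 2), (12, 1, 2), (13, 0, 5),
           (20, 4, 6)]
schedule = run(packets, interval=10, num_elinks=1)
```

The scheme only pays off when the traffic pattern stays stable for much longer than one interval, since each topology is chosen from measurements that are already one interval old.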
Optical interconnects
Optical advantages:
- Low-loss transmission
- Capable of providing large bandwidths
- Almost no crosstalk between channels
- High area density
- Data-transparent reconfigurability
Electrical problems (at high frequency):
- Crosstalk
- Signal distortion
- High power consumption
- High latency (RC delay)
(Bhanu Jaiswal, University at Buffalo)
Optical reconfiguration implementation
- Based on wavelength-division multiplexing (WDM)
- Components:
  - tunable laser (VCSEL) per node
  - broadcast element
  - wavelength-selective receiver per node
- For each source node, the elink destination is selected by tuning the laser to the proper wavelength
[Figure: processor nodes CPU 1 ... CPU n with tunable lasers, fiber links, a broadcast element, and photodetectors at CPU 1 ... CPU n]
Selective Optical Broadcasting
- Full broadcast is not realistic:
  - Too much power is wasted
  - Limited number of available wavelengths (trade-off with cost, tuning speed)
- Selective broadcast: each node can reach a subset of other nodes
  - not all 'extra links' are possible
  - some high-traffic paths can have intermediate nodes
  - only 1 extra link per node
[Figure: 3 possible destinations for the top node; 1 is selected by tuning the node's transmitter]
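The constraints of selective broadcast can be illustrated with a short sketch: each source node may open at most one extra link, and only to a destination inside its reachable subset. The greedy per-node choice, the function name `choose_elinks` and the example data are hypothetical assumptions, not the authors' method.

```python
def choose_elinks(traffic, reachable):
    """For each source node pick at most one elink destination.

    traffic:   dict mapping (src, dst) -> traffic volume in the last interval
    reachable: dict mapping src -> set of destinations its broadcast can reach

    Assumption: each node independently picks its busiest reachable
    destination; the presentation does not specify the selection rule.
    """
    elinks = {}
    for src, destinations in reachable.items():
        candidates = [(traffic.get((src, dst), 0), dst) for dst in destinations]
        candidates = [c for c in candidates if c[0] > 0]  # skip idle pairs
        if candidates:
            _, dst = max(candidates)
            elinks[src] = dst
    return elinks


# Hypothetical example: node 0 talks mostly to node 8, which it cannot reach.
traffic = {(0, 8): 100, (0, 3): 40, (1, 2): 10}
reachable = {0: {1, 3, 4}, 1: {2, 5, 6}, 2: {0, 4, 7}}
elinks = choose_elinks(traffic, reachable)
```

In this example node 0 gets an extra link to node 3 rather than to its busiest partner, node 8, so the heavy 0-to-8 traffic still has to pass through intermediate nodes.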
Selective Optical Broadcasting
- Using diffractive optics, light from each node is broadcast to 9 spots
- Node placement on the prism determines possible elink destinations
Simulation environment
Virtutech SIMICS full-system simulator:
- Complete NUMA memory system (cache system, directory protocol and allocator)
- Detailed network implementation with different topologies and extra links
- Real-time reconfiguration with prediction models and physical limitations
SunFire(TM) 6800 server:
- 16 processors at 1 GHz
- 2-level cache system and 0.5 GB main memory
- 2 ns, 19 ns and 100+ ns access time to caches and main memory (local and remote)
- 4x4 torus interconnection network
- Solaris 9.0 operating system
Benchmark applications:
- SPLASH-2 scientific parallel algorithms
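For the 4x4 torus base network, the hop distance between two nodes follows from the wrap-around links in each dimension. The sketch below assumes row-major node numbering (node `i` sits at column `i % width`, row `i // width`), which the presentation does not specify; the function names are likewise hypothetical.

```python
def torus_hops(a, b, width=4, height=4):
    """Minimum hop count between nodes a and b on a width x height torus.

    Assumption: nodes are numbered row-major, 0 .. width*height - 1.
    """
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    dx = abs(ax - bx)
    dy = abs(ay - by)
    # Wrap-around links allow going the short way round in each dimension.
    return min(dx, width - dx) + min(dy, height - dy)


def average_hops(width=4, height=4):
    """Average hop distance over all ordered pairs of distinct nodes."""
    n = width * height
    total = sum(torus_hops(a, b, width, height)
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))


avg_4x4 = average_hops(4, 4)
avg_8x8 = average_hops(8, 8)
```

The average distance grows with the torus size, so a larger base network leaves more hops for extra links to save.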
Average benchmark performance
- Measure average remote memory access latency
- Calculate improvement over non-reconfigurable case
- Averaged over all benchmark applications
- Increasing # elinks: performance increases
- Larger network: more gain (longer hop distance)
- Saturation occurs at # elinks = # nodes
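The metric described above can be written down as a relative latency reduction per benchmark, averaged over all benchmarks. This is a sketch under stated assumptions: the slide does not say whether improvements or raw latencies are averaged, so averaging the per-benchmark improvements is our assumption, and the benchmark labels and latency values are hypothetical placeholders rather than measured results.

```python
def latency_improvement(baseline_ns, reconfigured_ns):
    """Relative reduction in average remote memory access latency."""
    return (baseline_ns - reconfigured_ns) / baseline_ns


def average_improvement(results):
    """Mean of the per-benchmark improvements.

    results: dict mapping benchmark name -> (baseline_ns, reconfigured_ns).
    Assumption: improvements are averaged per benchmark, unweighted.
    """
    improvements = [latency_improvement(b, r) for b, r in results.values()]
    return sum(improvements) / len(improvements)


# Hypothetical latencies in ns (placeholders, not results from the talk).
results = {
    "bench_a": (500.0, 400.0),
    "bench_b": (600.0, 450.0),
    "bench_c": (400.0, 380.0),
}
mean_improvement = average_improvement(results)
```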
Average benchmark performance (II)
- f: maximum number of extra links terminating at one node (fan-out)
- Performance increases from f = 1 to f = 2, saturation afterwards
- Further results will be with:
  - f = 2, # elinks = # nodes
  - prism implementation (f = 1, # elinks = # nodes, 9 destinations limitation)
[Figure axis: network size (# nodes)]
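The effect of the fan-out limit f can be sketched as a constraint on a greedy link selection. Two assumptions are made here that the presentation does not confirm: an extra link is counted only against its destination node, and links are granted in order of decreasing traffic. The name `select_with_fanout` and the example traffic are hypothetical.

```python
from collections import Counter


def select_with_fanout(traffic, num_elinks, fanout):
    """Greedily pick extra links, busiest pair first.

    traffic:    dict mapping (src, dst) -> traffic volume in the last interval
    num_elinks: total number of extra links available
    fanout:     maximum number of extra links terminating at one node
                (assumption: counted at the destination only)
    """
    chosen = []
    terminating = Counter()
    ranked = sorted(traffic.items(), key=lambda item: item[1], reverse=True)
    for (src, dst), _ in ranked:
        if len(chosen) == num_elinks:
            break
        if terminating[dst] < fanout:
            chosen.append((src, dst))
            terminating[dst] += 1
    return chosen


# Hypothetical traffic: node 5 is a hot spot for three senders.
traffic = {(0, 5): 90, (1, 5): 80, (2, 5): 70, (3, 4): 10}
links_f1 = select_with_fanout(traffic, num_elinks=3, fanout=1)
links_f2 = select_with_fanout(traffic, num_elinks=3, fanout=2)
```

With f = 1 only one of the three heavy flows towards node 5 gets an extra link; with f = 2 a second one does, which matches the intuition that relaxing the fan-out helps when traffic concentrates on a few nodes.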
Network latency improvement
- Changing reconfiguration interval length
- Simulations for # elinks = # nodes, f = 2
- Different benchmark: different benefit!
- Remember: tuning speed << interval << traffic locality
[Figure panels: 16 nodes, 32 nodes, 64 nodes]
Network latency improvement (II)
- selective broadcast: f = 1, only 9 destinations
- full broadcast: f = 2, no limitations
Conclusions
- The interconnection network presents a significant bottleneck in large multiprocessor systems
- Reconfigurable interconnects can adapt the network to the traffic at any point in time
- An optical implementation has been proposed
- Through simulation, we measured the resulting speedup: a latency reduction of up to 40% can be achieved
- Obtained speedups depend on the application, network size, and the reconfigurable network constraints
Acknowledgements
wim.heirman@ugent.be
Thank you for your attention!
Inter-node distances