Published by Margery Bryant, modified over 9 years ago
1
Galaxy: High-Performance Energy-Efficient Multi-Chip Architectures Using Photonic Interconnects
Nikos Hardavellas
PARAG@N – Parallel Architecture Group
Northwestern University
Team: Y. Demir, P. Yan, S. Song, J. Kim, G. Memik
2
Chip Power Scaling © Hardavellas 2 Chip power does not scale [Azizi 2010]
3
Voltage Scaling Has Slowed
In the last decade: 13x more transistors but only 30% lower voltage
Cannot run all transistors fast enough
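A back-of-the-envelope reading of these two numbers, assuming the textbook dynamic-power relation P ≈ N·α·C·V²·f and, for illustration only, holding switching activity α, per-transistor capacitance C, and clock frequency f fixed:

```latex
\frac{P_{\text{new}}}{P_{\text{old}}}
  = \frac{N_{\text{new}}}{N_{\text{old}}}
    \left(\frac{V_{\text{new}}}{V_{\text{old}}}\right)^{2}
  = 13 \times (0.7)^{2} \approx 6.4
```

Under a fixed power budget, only about 1/6.4 ≈ 16% of the transistors could then switch at full speed in this simplified model. Holding C fixed overstates the gap, since per-transistor capacitance also shrinks with scaling, but the direction matches the slide: voltage scaling alone no longer offsets transistor growth.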
4
Pin Bandwidth Scaling [TU Berlin]
Cannot feed cores with data fast enough to keep them busy
5
Data Scaling
SPEC and TPC datasets grow faster than Moore's law
Same trends in scientific and personal computing
Large Hadron Collider, March ’11: 1.6 PB of data (Tier-1)
Large Synoptic Survey Telescope: 30 TB/night, 2x the Sloan Digital Sky Survey per day
Sloan: more data than the entire history of astronomy before it
More data → more computing power needed to process it
6
Galaxy: Optically-Connected Disintegrated Processors
Physical constraints limit single-chip designs: area, yield, power, bandwidth
Multi-chip designs break free of these limitations:
processor disintegration
macro-chip integration [Pan, WINDS 2010]
7
Outline Introduction ➔ Background Galaxy Architecture Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
8
Nanophotonic Components
[Figure labels: off-chip laser source, coupler, resonant modulators, resonant detectors, Ge-doped waveguide]
Selective: couple optical energy of a specific wavelength
9
Modulation and Detection
[Figure: bit streams 11010101 and 10001011 carried on separate wavelengths]
DWDM: 16-64 wavelengths per waveguide
5-20 μm waveguide pitch
10 Gbps per link
8 Tbps/mm bandwidth density or more!
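The 8 Tbps/mm figure follows from the slide's ranges. The sketch below assumes "10 Gbps per link" means 10 Gbps per wavelength and that waveguides are laid side by side at the stated pitch; it computes aggregate bandwidth per millimeter of edge at the sparsest and densest corners of the ranges.

```python
def bandwidth_density_tbps_per_mm(wavelengths, gbps_per_wavelength, pitch_um):
    """Aggregate bandwidth per mm of chip edge for parallel DWDM waveguides."""
    waveguides_per_mm = 1000.0 / pitch_um            # 1 mm = 1000 um
    gbps_per_waveguide = wavelengths * gbps_per_wavelength
    return gbps_per_waveguide * waveguides_per_mm / 1000.0  # Gbps -> Tbps

# Sparsest corner of the slide's ranges: 16 wavelengths, 20 um pitch
low = bandwidth_density_tbps_per_mm(16, 10, 20)
# Densest corner: 64 wavelengths, 5 um pitch
high = bandwidth_density_tbps_per_mm(64, 10, 5)
print(low, high)  # 8.0 128.0
```

The sparsest configuration already yields 8 Tbps/mm, which is why the slide reads "or more".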
10
Outline Introduction Background ➔ Galaxy Architecture Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
11
Optical Crossbar
12
Routing Example
13
Single Chiplet Connectivity
14
Galaxy Architecture
15
Galaxy MWSR Optical Crossbar
More energy-efficient than SWMR at this scale
MWSR avoids a broadcast bus but requires arbitration
16
SWMR vs. MWSR Crossbar
Single-Writer Multiple-Reader (SWMR): broadcast bus; all receivers always read → optical loss at the "on" rings → high laser power
Multiple-Writer Single-Reader (MWSR): only one receiver reads; only one ring is on → low loss → low laser power; needs arbitration
17
Dense Off-Chip Coupling
Dense optical fiber array [Lee, OSA/OFC/NFOEC 2010]
~3.8 dB loss, 8 Tbps/mm demonstrated
Misalignment adds less than 1 dB of loss
Loss comparable to optical proximity couplers
18
Nanophotonic Parameters
19
Outline Introduction Background Galaxy Architecture ➔ Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
20
Architectural Parameters
21
Modeling Infrastructure
3D-stack model
SimFlex sampling with 95% confidence
Photonic-layer ring heating
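Sampled simulation reports each metric as a mean with a confidence interval over many short measurement windows. The slides do not describe SimFlex's estimator, so the sketch below is a generic normal-approximation 95% confidence interval, not SimFlex code; `meets_target` and the IPC samples are hypothetical.

```python
import math
import statistics

def mean_with_ci95(samples):
    """Sample mean and half-width of a 95% confidence interval.

    Uses the normal approximation (z = 1.96), which is reasonable when
    the number of sampled measurement windows is large.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width

def meets_target(samples, max_relative_error):
    """True if the 95% CI half-width is within the given fraction of the mean."""
    mean, half_width = mean_with_ci95(samples)
    return half_width <= max_relative_error * abs(mean)

# Hypothetical per-window IPC measurements
ipc = [1.02, 0.97, 1.05, 0.99, 1.01, 0.96, 1.03, 1.00]
mean, hw = mean_with_ci95(ipc)
print(f"IPC = {mean:.3f} +/- {hw:.3f} (95% confidence)")
```

If `meets_target` fails for the desired error bound, the usual remedy is to collect more windows, since the half-width shrinks with the square root of the sample count.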
22
Outline Introduction Background Galaxy Architecture Experimental Methodology Results ➔ Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
23
Laser Power Sensitivity to Optical Parameters
[Figure panels: Coupler Loss; Off-Ring Loss; Waveguide & Filter Drop Loss; Modulator Insertion Loss]
Highly sensitive to coupler loss, insensitive to the other losses
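Optical losses add in decibels, so the required laser power grows exponentially with the total loss budget. The sketch below is a simplified link model; only the 3.8 dB coupler loss comes from these slides, while the -20 dBm detector sensitivity and the other loss values are hypothetical.

```python
import math

def laser_power_mw(detector_sensitivity_dbm, losses_db):
    """Optical power the laser must emit (per wavelength) so that, after every
    loss along the path, the detector still receives its sensitivity level."""
    required_dbm = detector_sensitivity_dbm + sum(losses_db.values())
    return 10 ** (required_dbm / 10.0)   # dBm -> mW

# Hypothetical loss budget in dB; only the 3.8 dB coupler figure comes from the slides
budget = {"coupler": 3.8, "waveguide": 2.0, "off_ring": 1.0, "modulator": 1.0}
base = laser_power_mw(-20.0, budget)
worse = laser_power_mw(-20.0, {**budget, "coupler": budget["coupler"] + 1.0})
print(f"{base:.4f} mW -> {worse:.4f} mW (x{worse / base:.3f})")
```

In this model every extra dB of any loss multiplies the required laser power by 10^0.1 ≈ 1.26, so a term that is large, or that is traversed more than once along a path, dominates the budget.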
24
Sensitivity to Fiber Density
116 mm² chiplets
43 mm along the chip edge
Enough room for 172 fibers @ 250 μm pitch
128 fibers: within 3% of max performance
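The 43 mm and 172-fiber figures can be reproduced if one assumes a square chiplet with fibers attached along its full perimeter; the square shape is an assumption, not stated on the slide.

```python
import math

def fiber_capacity(chiplet_area_mm2, fiber_pitch_um):
    """Fibers that fit along the full perimeter of a square chiplet."""
    side_mm = math.sqrt(chiplet_area_mm2)   # assumption: square die
    perimeter_mm = 4 * side_mm
    fibers = math.floor(perimeter_mm * 1000 / fiber_pitch_um)
    return fibers, perimeter_mm

fibers, edge_mm = fiber_capacity(116, 250)
print(fibers, round(edge_mm, 1))   # 172 43.1
```

Under this model the 128-fiber configuration occupies roughly three quarters of the available edge.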
25
Outline Introduction Background Galaxy Architecture Experimental Methodology Results Sensitivity Studies ➔ Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
26
Performance Against “Unlimited” Designs
xxx_4MC: unlimited power (maximum speed of the design, irrespective of temperature)
xxx_20MC: also unlimited bandwidth (20 memory controllers (MCs) per chip, 5x more pins)
Galaxy matches the performance of “unlimited” designs
27
Performance Against Realistic Designs
Realistic: within power and bandwidth envelopes
Galaxy chiplets stay within 66.2 °C → chiplets run at max speed
Galaxy: 2.4x–3.2x speedup on average (3.4x max)
28
Energy-Delay Product
Cool chiplets minimize leakage
Galaxy: 2.4x–2.8x smaller EDP on average (7.1x max)
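Energy-delay product (EDP) multiplies total energy by runtime, so it rewards designs that are both faster and more frugal. Because leakage energy is roughly leakage power times runtime, a cooler (lower-leakage) and faster chiplet can shrink both factors at once. The numbers below are hypothetical and only illustrate how an EDP ratio is computed.

```python
def edp(energy_j, delay_s):
    """Energy-delay product (J*s); lower is better."""
    return energy_j * delay_s

def edp_improvement(baseline, candidate):
    """How many times smaller the candidate's EDP is than the baseline's.
    Each argument is an (energy_j, delay_s) pair."""
    return edp(*baseline) / edp(*candidate)

# Hypothetical run: 2x faster and 20% less energy than the baseline
print(edp_improvement((10.0, 2.0), (8.0, 1.0)))  # 2.5
```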
29
Outline Introduction Background Galaxy Architecture Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) ➔ Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude Overview of Other Research
30
Comparison Against Multi-Chip Alternatives
Fiber Galaxy: 2.5x speedup over Oracle Macrochip (6.8x max)
6x less laser power with demonstrated couplers
31
Outline Introduction Background Galaxy Architecture Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) ➔ Thermal Modeling Conclude Overview of Other Research
32
80-core 5-chiplet Galaxy Thermal CFD Modeling
8 cm spacing allows cooling with cheap passive heatsinks
88.2 °C
33
9-chiplet Dense Array (Oracle Macrochip)
Tight arrangement points to a liquid-cooling requirement
249 °C
34
9-chiplet Galaxy 2D
Cooling 9 chiplets with passive heatsinks
110 °C
35
9-chiplet Galaxy 3D
Flexible fibers allow the “virtual chip” to break free of 2D planar designs
83.6 °C
36
Galaxy Summary
“Virtual chips” with the performance of unlimited designs
Breaks free of typical physical constraints:
large aggregate area
improved yield (break-even point: 60% yield for photonics)
Tb/s/mm bandwidth density
pushes back the power wall
Processor disintegration:
2.4x–3.2x avg. speedup (3.4x max)
2.4x–2.8x avg. smaller EDP (7.1x max)
Macrochip integration:
2.5x speedup over Oracle Macrochip (6.8x max)
6x more power-efficient links
37
Outline Introduction Background Galaxy Architecture Experimental Methodology Results Sensitivity Studies Single-Chip Comparisons (Processor Disintegration) Multi-Chip Comparisons (Macrochip Integration) Thermal Modeling Conclude ➔ Overview of Other Research
38
Overall Focus: Energy-Efficient Computing
Integer add: 0.5 pJ; FP-FMA: 50 pJ. Where does energy go?
Data movement: 1200 pJ across a 400 mm² chip, 16000 pJ to memory
Elastic caches: minimize data transfers by adapting caches to workload demands [ISCA’09, IEEE Micro’10, DATE’12]
Processing: ~1500 pJ to schedule the operation
SeaFire: specialized computing on dark silicon to eliminate general-purpose computing’s overheads [IEEE Micro’11, USENIX-Login’11]
Circuits: wide voltage guardbands
Low voltages, process variation → timing errors → computing errors
Elastic fidelity: allow errors in selected code/data segments to save energy while maintaining a fidelity contract with the user [CoRR abs/1111.4279]
Chips are fundamentally limited by physical constraints. Need to break free.
Galaxy: processor disintegration/macrochip integration using photonic interconnects [WINDS’10]
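Normalizing the slide's per-operation energies to one FP-FMA makes the imbalance explicit: scheduling and moving operands cost 24x to 30x the arithmetic, and a memory access 320x. The dictionary keys are illustrative names; the values are the slide's figures.

```python
# Per-operation energies quoted on the slide, in picojoules
ENERGY_PJ = {
    "integer_add": 0.5,
    "fp_fma": 50.0,
    "schedule_op": 1500.0,       # ~1500 pJ to schedule the operation
    "move_across_chip": 1200.0,  # across a 400 mm^2 chip
    "memory_access": 16000.0,
}

def relative_cost(op, baseline="fp_fma"):
    """Energy of `op` expressed as a multiple of the baseline operation."""
    return ENERGY_PJ[op] / ENERGY_PJ[baseline]

for op in ("schedule_op", "move_across_chip", "memory_access"):
    print(f"{op}: {relative_cost(op):.0f}x an FP-FMA")
```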
39
Thank You!
40
BACKUP SLIDES
41
Token-Based Arbitration
8 cycles on average for token arbitration (5 chiplets)
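In an MWSR crossbar several writers share one reader's wavelengths, so they must take turns. The slides report the measured average (8 cycles for 5 chiplets) but not the protocol details; the sketch below assumes one token per reader channel that advances one writer per cycle, and one packet sent per token capture. These are illustrative assumptions, not Galaxy's parameters, and the sketch does not attempt to reproduce the 8-cycle figure.

```python
def simulate_token_channel(num_writers, queued, cycles):
    """Token arbitration for one MWSR channel (a sketch).

    A single token circulates among the writers, advancing one writer per
    cycle. A writer may transmit only while it holds the token, and sends
    one packet per token visit. Returns (cycle, writer) transmission events.
    """
    pending = dict(queued)   # writer id -> packets waiting to be sent
    token_at = 0
    sent = []
    for cycle in range(cycles):
        if pending.get(token_at, 0) > 0:
            pending[token_at] -= 1
            sent.append((cycle, token_at))
        token_at = (token_at + 1) % num_writers
    return sent

# 5 writers (e.g., one per chiplet); writer 1 has one packet, writer 3 has two
print(simulate_token_channel(5, {1: 1, 3: 2}, 10))
```

Because only the token holder transmits, at most one writer drives the channel in any cycle, which is exactly the exclusivity that MWSR needs; the cost is the wait for the token to come around.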
42
Load-Latency Curves
16 tokens provide optimal buffer depth
43
Comparison Against Multi-Chip Alternatives
44
Tapered vs. Optical Proximity Couplers
6x less laser power than Oracle Macrochip with demonstrated couplers
45
Energy is Shaping the IT Industry
#1 of the Grand Challenges for Humanity in the Next 50 Years [Smalley Institute for Nanoscale Research and Technology, Rice U.]
Computing worldwide: ~408 TWh in 2010 [Gartner]
Datacenter energy consumption in the US: ~150 TWh in 2011 [EPA]
3.8% of domestic power generation, $15B
CO₂-equiv. emissions ≈ airline industry (2%)
Carbon footprint of the world’s data centers ≈ Czech Republic
Exascale @ 20 MW: 200x lower energy/instr. (2 nJ → 10 pJ)
3% of the output of an average nuclear plant!
10% annual growth in installed computers worldwide [Gartner]
Exponential increase in energy consumption
46
Overcoming Data Movement and Processing Overheads
Elastic caches: adapt the cache to the workload’s demands
Significant energy is spent on data movement and coherence requests
Co-locate data, metadata, and computation
Decouple address from placement location
Capitalize on existing OS events → simplify hardware
Cut on-chip interconnect traffic by half
SeaFire: specialized computing on dark silicon
Repurpose dark silicon to implement specialized cores
Application cherry-picks a few cores; the rest of the chip is powered off
Vast unused area → many specialized cores → likely to find good matches
12x lower energy (conservative)
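The slide names the ideas behind elastic caches but not a mechanism. As a hypothetical illustration of "decouple address from placement" driven by OS-visible events, the sketch below marks a page private until a second core touches it, then places private blocks in the requesting core's local cache slice and interleaves shared blocks across slices by address. The policy, class and function names, 4 KiB pages, and 64-byte blocks are all assumptions, not the authors' published design.

```python
class PageClassifier:
    """Hypothetical OS-level page classification: private until shared."""

    def __init__(self):
        self.first_toucher = {}   # page number -> first core that accessed it
        self.shared_pages = set()

    def classify(self, page, core):
        if page in self.shared_pages:
            return "shared"
        owner = self.first_toucher.setdefault(page, core)
        if owner != core:
            self.shared_pages.add(page)   # second core touched it: reclassify
            return "shared"
        return "private"

def place_block(block_addr, requester_slice, kind, num_slices, block_bytes=64):
    """Pick the cache slice for a block based on its class, not a fixed address map."""
    if kind == "private":
        return requester_slice                          # keep data next to its user
    return (block_addr // block_bytes) % num_slices     # interleave shared data

pages = PageClassifier()
addr = 0x1040
kind = pages.classify(addr >> 12, core=3)      # 4 KiB pages
print(kind, place_block(addr, 3, kind, num_slices=16))  # private 3
```

Keeping private data in the local slice shortens its on-chip trips, while address interleaving avoids duplicating shared data; a real design would also have to handle reclassification of blocks already cached.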
47
Overcoming Voltage Guardbands
Elastic fidelity: selectively trade accuracy for energy
We don’t always need 100% accuracy, but HW always provides it
Language constructs specify the required fidelity for code/data segments
Steer computation to execution/storage units with appropriate fidelity and lower voltage
35% lower energy
[Figure: output with no errors vs. with 10% errors]
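The slide does not show the language constructs or the steering hardware. The sketch below is a hypothetical model of the idea: each segment declares the error rate it tolerates, and a runtime routes it to the lowest-voltage unit that honors that contract. Unit names, voltages, and error rates are invented for illustration, and the V² scaling is the textbook dynamic-energy approximation, unrelated to the slide's 35% figure.

```python
# Hypothetical execution units: name -> (supply voltage in V, expected error rate)
UNITS = {
    "nominal": (1.0, 0.0),
    "reduced": (0.8, 0.001),
    "low": (0.6, 0.10),
}

def pick_unit(max_error_rate):
    """Lowest-voltage unit whose expected error rate meets the fidelity contract."""
    eligible = [name for name, (_, err) in UNITS.items() if err <= max_error_rate]
    return min(eligible, key=lambda name: UNITS[name][0])

def relative_dynamic_energy(name):
    """Dynamic energy per operation scales roughly with V^2 (same capacitance)."""
    return (UNITS[name][0] / UNITS["nominal"][0]) ** 2

# A segment annotated as error-intolerant vs. one that tolerates 10% errors
for tolerance in (0.0, 0.10):
    unit = pick_unit(tolerance)
    print(tolerance, unit, round(relative_dynamic_energy(unit), 2))
```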