Published by Handoko Tanudjaja. Modified over 6 years ago.
1
APPROX-NoC: A Data Approximation Framework for Network-On-Chip Architectures
Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, Eun Jung Kim
2
Motivation
Perfect accuracy is not required in many applications: computer vision, machine learning, graph processing.
These applications move large amounts of data across the NoC: video frames, neuron weights, graph weights.
Idea: leverage inaccuracy to provide a high-throughput NoC.
3
Hardware Approximation
Compute approximation:
  Variable-voltage-based ALUs [Esmaeilzadeh et al. ASPLOS’12]
  Analog-based circuit designs [St. Amant et al. ISCA’14]
  Neural network acceleration [Esmaeilzadeh et al. MICRO’12, Moreau et al. HPCA’15]
Storage approximation:
  Approximate main memory [Sampson et al. MICRO’13, Liu et al. ASPLOS’11]
  Approximate cache [San Miguel et al. MICRO’15, MICRO’16]
No previous research on approximation in NoCs.
4
Approximation in NoCs
Why do we need approximation in NoCs?
Higher throughput.
Mitigate the memory bandwidth bottleneck.
Approximation increases data similarity, which improves the compression rate.
Leveraging the inaccuracy tolerance of applications improves effective bandwidth.
5
APPROX-NoC
6
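Main Idea
[Figure: at the source, a cache block passes through VAXX to produce an approximated block (words 0xA, 0xB, 0xE, 0xD in the example). The compressor encodes it into the network representation (e0+0xA, e1, e2, e0+0xD), where e0 marks an uncompressed word, e1 encodes 0xB and e2 encodes 0xE. At the destination, the decompressor recovers the block 0xA, 0xB, 0xE, 0xD. Legend: uncompressed, precise encoding/decoding, approximate encoding.]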
7
Challenges
Value approximation and compression are not cheap: latency overhead (on the critical path) and hardware cost.
Quality control is important: error calculation for every word adds power and latency overhead for the error computation.
Therefore, the design should be light-weight.
8
APPROX-NoC Architecture Overview
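[Figure: tiled architecture. Each tile contains a core, a network interface (NI) and a router. Inside the NI, an injection queue carries traffic from the processor or memory controller (MC) into the network, and an ejection queue carries traffic from the network to the processor or MC.]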
9
APPROX-NoC Architecture Overview
[Figure: the same tiled architecture, with a compressor added on the injection path of the NI and a decompressor added on the ejection path.]
10
APPROX-NoC Architecture Overview
[Figure: the same tiled architecture, with the VAXX approximation engine added alongside the compressor on the injection path, gated by an "Approx?" check; the decompressor remains on the ejection path.]
Data is approximated to similar values to improve the compression rate.
11
APPROX-NoC Operation Flow Chart
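[Flow chart: a cache block is first checked for approximability. If it is not approximable (N), it bypasses the approximate logic and goes directly to the compressor. If it is approximable (Y), the approximate logic checks whether the data is int or float: floats go through mantissa extraction first, integers go directly to the Approximate Value Compute Logic (AVCL), and the result is passed to the compressor.]
Data-type-aware approximation.
Bypassing the approximation logic reduces overhead on the critical path.
Seamlessly integrated with the compression unit in a plug-and-play manner.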
12
Integer Approximation Datapath
Simple for integers: the complete word is passed for approximation.
Abstraction:
1. Calculate the error budget based on the threshold.
2. Detect the number of bits needed for the error budget, e.g. n bits.
3. Treat the least significant (n-1) bits as don't-care bits and approximate them to form compression-friendly data patterns.
[Figure: a 32-bit integer (bits 31 to 0) enters the Approximate Logic, which outputs a 32-bit approximated integer.]
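The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the hardware datapath from the talk: the name `approximate_int` is hypothetical, the input is assumed to be a non-negative 32-bit integer, and the don't-care bits are simply cleared to zero, whereas a real design may pick whatever value best matches a compressible pattern.

```python
def approximate_int(value: int, threshold_pct: int) -> tuple[int, int]:
    """Return (approximated value, number of don't-care bits).

    Assumes a non-negative 32-bit integer and a threshold in 0..100.
    """
    # Step 1: error budget derived from the threshold.
    budget = value * threshold_pct // 100
    # Step 2: number of bits n needed to represent the budget.
    n = budget.bit_length()
    # Step 3: the least significant (n-1) bits become don't-care bits.
    dont_care = max(n - 1, 0)
    mask = ~((1 << dont_care) - 1) & 0xFFFFFFFF
    # Assumption: don't-care bits are cleared to zero in this sketch.
    return value & mask, dont_care


approx, bits = approximate_int(1000, 10)
print(approx, bits, 1000 - approx)
```

Because the budget needs n bits, it is at least 2^(n-1), so changing only the low (n-1) bits keeps the error below the budget.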
13
Floating-Point Approximation Datapath
Representation: IEEE 754, value = $(-1)^{\text{sign}} \times (1 + 0.\text{mantissa}) \times 2^{\text{exponent} - \text{bias}}$
Abstraction: a 32-bit word laid out as sign (bit 31), exponent, mantissa.
No floating-point operation is needed for FP approximation:
1. Extract the mantissa bits and normalize them as an integer (zero-padded, with the implicit leading 1 placed above the 23 mantissa bits).
2. Approximate like an integer.
3. Concatenate the sign and exponent with the approximate mantissa to recover the approximate float value.
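A minimal Python sketch of the floating-point path described above, using only integer operations on the bit pattern. The names `approximate_float32` and `clear_dont_care_bits` are hypothetical, the don't-care bits are cleared to zero as a simplifying assumption, and zeros, subnormals, infinities and NaNs are passed through unchanged because the slides do not say how they are handled.

```python
import struct


def clear_dont_care_bits(word: int, threshold_pct: int) -> int:
    """Integer approximation: clear the low (n-1) bits, n = bits of the budget."""
    budget = word * threshold_pct // 100
    dont_care = max(budget.bit_length() - 1, 0)
    return word & ~((1 << dont_care) - 1)


def approximate_float32(x: float, threshold_pct: int) -> float:
    """Approximate a single-precision float without floating-point arithmetic."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    if exponent in (0, 0xFF):
        # Assumption: zero/subnormal/inf/NaN bypass the approximation.
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    # Step 1: extract the mantissa and prepend the implicit leading 1.
    mantissa = (bits & 0x7FFFFF) | (1 << 23)
    # Step 2: approximate it exactly like an integer.
    approx = clear_dont_care_bits(mantissa, threshold_pct)
    # Step 3: re-attach the untouched sign and exponent bits.
    new_bits = (bits & 0xFF800000) | (approx & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", new_bits))[0]


print(approximate_float32(3.14159, 10))
```

Since the exponent is untouched and the value is proportional to the normalized mantissa, the relative error on the mantissa integer carries over to the float value.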
14
Approximate Value Compute Logic (AVCL)
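Unified logic for both integer and floating point, operating on one 32-bit data word.
Fast error budget computation:
e: error threshold (0-100)
error_budget = given_value × (e/100) = given_value / (100/e)
100/e is predefined (e.g. 100/25 = 4 = B’100), so the computation needs only bit shifting.
[Figure: AVCL datapath. Int/float selectors choose between the full 32-bit word and the normalized mantissa (zero-padded, implicit 1, 23 mantissa bits) as input to the Approximate Logic; a Float Exponent Detection block handles the remaining float bits; a final "approx?" selector chooses between the original and the approximated word.]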
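The shift-based budget computation can be illustrated with a short Python sketch. The function name `error_budget` is hypothetical. The slides only give the 25% example, so the fallback to integer division for divisors that are not powers of two is an assumption of this sketch, not something the talk describes.

```python
def error_budget(value: int, threshold_pct: int) -> int:
    """Compute value * (e/100) as value / (100/e), with 100/e predefined."""
    if not 1 <= threshold_pct <= 100 or 100 % threshold_pct != 0:
        raise ValueError("sketch only supports thresholds that divide 100")
    divisor = 100 // threshold_pct  # predefined constant, e.g. 4 for e = 25
    if divisor & (divisor - 1) == 0:
        # Power-of-two divisor: the division is a plain right shift.
        return value >> (divisor.bit_length() - 1)
    # Assumption: other divisors fall back to integer division here.
    return value // divisor


print(error_budget(1000, 25))  # divisor 4 = 0b100, i.e. shift right by 2
```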
15
APPROX-NoC Implementation Cases
Plug the VAXX approximate engine into compression units:
  Frequent pattern compression (FP-COMP) [Das et al. HPCA’08]
  Dictionary-based compression (DI-COMP) [Jin et al. MICRO’08]
Frequent Pattern Based VAXX (FP-VAXX): approximate the value, then compress the approximated pattern.
Dictionary-Based VAXX (DI-VAXX): use a TCAM to store the tracked approximated patterns, which keeps approximation off the critical path.
16
Frequent Pattern VAXX (FP-VAXX)
[Figure: the given word and the error threshold feed the Approximate Value Compute Logic (AVCL); the resulting approximate pattern feeds the Frequent Pattern Compressor, which outputs the encoded pattern.]
First, approximate the value with the AVCL.
Then, compress the approximate pattern using frequent pattern compression.
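To make the two-stage flow concrete, here is a small Python sketch of approximating a word and then classifying it against frequent patterns. The pattern set below is a simplified, hypothetical subset loosely modeled on frequent pattern compression (zero word, sign-extended byte, sign-extended halfword, halfword padded with a zero halfword), with an assumed 3-bit prefix; it is not the exact FP-COMP table. Inputs are assumed to be non-negative 32-bit words, and don't-care bits are cleared to zero.

```python
PREFIX_BITS = 3  # assumed prefix width for the pattern code


def approximate(word: int, threshold_pct: int) -> int:
    """AVCL-like step: clear the low (n-1) bits, n = bits of the error budget."""
    budget = word * threshold_pct // 100
    dont_care = max(budget.bit_length() - 1, 0)
    return word & ~((1 << dont_care) - 1) & 0xFFFFFFFF


def encoded_bits(word: int) -> int:
    """Encoded size in bits under the simplified pattern set."""
    if word == 0:
        return PREFIX_BITS                      # zero word
    signed = word - (1 << 32) if word & 0x80000000 else word
    if -128 <= signed <= 127:
        return PREFIX_BITS + 8                  # sign-extended byte
    if -32768 <= signed <= 32767:
        return PREFIX_BITS + 16                 # sign-extended halfword
    if word & 0xFFFF == 0:
        return PREFIX_BITS + 16                 # halfword padded with zeros
    return PREFIX_BITS + 32                     # no pattern: sent uncompressed


word = 0x00010003
print(encoded_bits(word), encoded_bits(approximate(word, 10)))
```

In this example the precise word matches no pattern, while the approximated word 0x00010000 falls into the zero-padded halfword pattern and needs fewer bits.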
17
Dictionary-Based VAXX (DI-VAXX)
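[Figure: on dictionary fill and update, the AVCL uses the error threshold to compute approximate patterns, which are stored with their encoded indices (e.g. 010X with e0, 10XX with e1; the example words 1001 and 1010 appear in the figure). A given word is looked up in the dictionary; on a match, the encoded index (e1 in the example) is output.]
Use a TCAM to store approximated patterns.
Precompute approximate patterns while updating and filling the dictionary.
Approximation is off the critical path.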
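The ternary lookup can be illustrated with a toy Python sketch that mirrors the 4-bit patterns in the figure. The helper names `precompute_pattern` and `lookup` are hypothetical, the 4-bit width and the 50% threshold are chosen only so that the toy values reproduce patterns like 010X and 10XX, and a real TCAM matches all entries in parallel rather than scanning a list.

```python
from typing import Optional


def precompute_pattern(word: int, threshold_pct: int, width: int = 4) -> str:
    """Done at dictionary fill/update time: mark the low (n-1) bits as X."""
    budget = word * threshold_pct // 100
    dont_care = max(budget.bit_length() - 1, 0)
    bits = format(word, f"0{width}b")
    return bits[: width - dont_care] + "X" * dont_care


def lookup(table: list[str], word: int) -> Optional[int]:
    """Return the index of the first ternary pattern matching the word."""
    for index, pattern in enumerate(table):
        bits = format(word, f"0{len(pattern)}b")
        if len(bits) == len(pattern) and all(
            p in ("X", b) for p, b in zip(pattern, bits)
        ):
            return index
    return None


# Fill the dictionary once; lookups then need no approximation logic.
table = [precompute_pattern(0b0101, 50), precompute_pattern(0b1001, 50)]
print(table, lookup(table, 0b1010))
```

The lookup itself performs no error computation, which is the sense in which approximation stays off the critical path: the don't-care bits were decided when the entry was written.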
18
Evaluation
19
Methodology
Workloads:
  Parsec 3.0
  SSCA2 graph application
  Synthetic workload from benchmark traces
Architecture:
  32 out-of-order cores at 2 GHz
  32 KB L1I$ and 64 KB L1D$, 2-way
  2 MB L2 bank and 16 directories
NoC:
  4x4 2D concentrated mesh
  2 GHz, 3-stage router
  4 virtual channels, 4-flit buffer
  64-bit flit, X-Y routing
Tools:
  Gem5 for full-system performance
  Pin-based simulator for application output error
  In-house NoC simulator for the synthetic study
20
Packet Latency and Data Quality
Synthetic study: permutations of benchmark traces, 75% approximable data packets, 10% error threshold.
21
Packet Latency and Data Quality
Synthetic study: permutations of benchmark traces, 75% approximable data packets, 10% error threshold.
22
Packet Latency and Data Quality
Synthetic study: permutations of benchmark traces, 75% approximable data packets, 10% error threshold.
DI-VAXX reduces latency by 11% and 40% compared to DI-COMP and the baseline, respectively.
FP-VAXX reduces latency by 21% and 46% over FP-COMP and the baseline, respectively.
For the data-intensive benchmark SSCA2, DI-VAXX outperforms DI-COMP by 22% and FP-VAXX outperforms FP-COMP by 36%.
23
Packet Latency and Data Quality
Synthetic study: permutations of benchmark traces, 75% approximable data packets, 10% error threshold.
DI-VAXX reduces latency by 11% and 40% compared to DI-COMP and the baseline, respectively.
FP-VAXX reduces latency by 21% and 46% over FP-COMP and the baseline, respectively.
For the data-intensive benchmark SSCA2, DI-VAXX outperforms DI-COMP by 22% and FP-VAXX outperforms FP-COMP by 36%.
Data value quality is higher than 97% (< 3% error).
24
Compression Ratio
Synthetic study: permutations of benchmark traces, 75% approximable data packets, 10% error threshold.
Approximation improves the compression ratio by up to 41%.
DI-VAXX and FP-VAXX improve the compression ratio by 10% and 30%, respectively, in geometric mean.
A higher compression ratio reduces the number of flits, and thus reduces queuing and contention.
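The link between compressed size and flit count is simple arithmetic, sketched below in Python. The 64-bit flit width comes from the methodology slide; the 512-bit (64-byte) cache block and the example compressed size are assumptions made only for illustration, and header or control flits are ignored.

```python
def flits_needed(payload_bits: int, flit_bits: int = 64) -> int:
    """Number of flits needed to carry a payload (ceiling division)."""
    return -(-payload_bits // flit_bits)


# Assumed 512-bit block: uncompressed versus a hypothetical 394-bit encoding.
print(flits_needed(512), flits_needed(394))
```

Fewer flits per packet means fewer buffer slots and link cycles consumed, which is why a better compression ratio shows up as lower queuing delay and contention.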
25
Throughput - Uniform Random
Synthetic study: permutations of Streamcluster traces, 1:3 data-to-control packet ratio, 75% approximable data packets, 10% error threshold.
VAXX improves throughput by up to 40%.
26
Throughput - Transpose
Synthetic study: permutations of Streamcluster traces, 1:3 data-to-control packet ratio, 75% approximable data packets, 10% error threshold.
VAXX improves throughput by up to 69%.
27
Application Error and Full System Performance
Application errors are less than 5% except for streamcluster and swaptions
28
Application Error and Full System Performance
Application errors are less than 5% except for streamcluster and swaptions.
Performance is improved by up to 10% and 14% in swaptions and SSCA2, respectively.
29
Power Consumption and Area Overhead
Approximation power consumption is compensated by flit reduction.
[Table: area overhead at 45 nm, in mm², for the DI-VAXX and FP-VAXX schemes; the numeric values are not preserved in this transcript.]
30
Conclusions
NoC data approximation framework that leverages inaccuracy to provide high throughput.
Light-weight approximate compute logic that supports both integer and floating-point data.
Low-cost microarchitecture implementations of VAXX.
APPROX-NoC achieves up to 21% average packet latency reduction and 69% throughput improvement.
31
Thank You & Questions Jiayi Huang