Literature Survey
Rachata Ausavarungnirun, Kevin Chang
Overview
- Problem
- Literature surveys
- Synthesis
Literature
- Scott and Sohi. "The Use of Feedback in Multiprocessors and its Application to Tree Saturation Control." IEEE Transactions on Parallel and Distributed Systems, 1990.
- Thottethodi et al. "Self-Tuned Congestion Control for Multiprocessor Networks." HPCA 2001.
- Ebrahimi et al. "Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems." ASPLOS 2010.
- Nychis et al. "Next Generation On-Chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010.
Overall Problem and Idea
Applications are slowed down differently based on each application's characteristics. Throttling back some applications can improve system throughput. Can we build a system that delivers high throughput and fairness at the same time?
[Figure: Max Slowdown (lower is fairer) vs. Weighted Speedup (higher is better throughput); the ideal design has high weighted speedup and low max slowdown.]
The Use of Feedback in Multiprocessors and Its Application to Tree Saturation Control
Scott and Sohi. "The Use of Feedback in Multiprocessors and its Application to Tree Saturation Control." IEEE Transactions on Parallel and Distributed Systems 1990
Feedback for Tree Saturation Control
Problem: Tree saturation can degrade network bandwidth. Tree saturation occurs when memory-intensive nodes inject more requests than the network and memory can service; the other cores then experience slowdown due to bandwidth contention.
[Diagram: processors connected through the network to memory modules.]
Feedback for Tree Saturation Control
Problem: Tree saturation can degrade network bandwidth. Tree saturation occurs when memory-intensive nodes inject more requests than the network and memory can service.
Goal: Control the hotness of memory modules to provide a memory bandwidth guarantee.
Key Idea: Use a feedback loop to control the injection rate to each memory module.
Feedback Control
[Diagram: processors, network, memory modules. Each core records the destination module of its requests (Core 1 -> Mod 2, Core 3 -> Mod 3, Core 4 -> Mod 1); each module's "hot" flag is determined based on its queue length and fed back to the processors.]
Feedback Control
[Diagram: the network becomes congested due to requests to module 2; module 2's "hot" flag is set (Mod 1: N, Mod 2: Y) and fed back to the processors.]
Feedback Control
[Diagram: with module 2 marked hot (Mod 1: N, Mod 2: Y), new accesses destined for module 2 are held back at the sources, relieving the congestion.]
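The feedback loop above can be sketched as a toy model (not the paper's exact design): each module's "hot" flag is derived from its queue length, and a processor holds back requests destined for hot modules. The class names and the threshold value are illustrative assumptions.

```python
HOT_THRESHOLD = 8  # queue length above which a module is marked hot (assumed value)

class MemoryModule:
    def __init__(self):
        self.queue = []  # outstanding requests at this module

    def is_hot(self):
        # Feedback signal: hotness determined by queue length
        return len(self.queue) > HOT_THRESHOLD

class Processor:
    def __init__(self, modules):
        self.modules = modules
        self.pending = []  # requests held back by the feedback

    def try_inject(self, request, dest):
        # Only inject if the destination module is not hot;
        # otherwise hold the request back at the source.
        if self.modules[dest].is_hot():
            self.pending.append((request, dest))
            return False
        self.modules[dest].queue.append(request)
        return True
```

A held-back request stays at the source until the module's queue drains below the threshold, which is exactly what prevents the saturation tree from forming in the network.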
Tradeoffs
Advantages: Decreases interference by disallowing requests to a hot memory module from congesting the network.
Disadvantages: Application-unaware; does not prioritize latency-critical over bandwidth-critical traffic. Additional complexity in the processor-network interface in order to change the threshold dynamically.
Overhead: The damping feedback system increases the complexity of the processor-network interface; storage overhead for the feedback state.
Self-Tuned Congestion Control for Multiprocessor Networks
M. Thottethodi, A. Lebeck, S. Mukherjee. "Self-Tuned Congestion Control for Multiprocessor Networks." HPCA 2001.
Problem
Multiprocessor interconnection networks suffer from trees of saturation under heavy, intensive workloads.
Goals: Estimate the congestion level; throttle the packet injection of all nodes if the congestion level exceeds a threshold.
Mechanism
Congestion estimation: the fraction of full VC buffers in the network.
How to distribute (aggregate) the VC buffer information?
- Piggy-back on data packets.
- Broadcast meta/control packets.
- Side-band network.
Overheads: extra hardware (channels and counters); distribution latency in k-ary n-cube networks; hard to guarantee all-to-all communication; the extra traffic contends for bandwidth and adds more congestion!
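The estimate itself is simple to state. A minimal sketch, assuming the per-buffer occupancies have already been gathered by one of the distribution schemes above:

```python
def congestion_estimate(vc_buffers, capacity):
    """Global congestion estimate: fraction of full VC buffers.

    vc_buffers: occupancy of every virtual-channel buffer in the network
    capacity: buffer capacity in flits (assumed uniform for illustration)
    """
    full = sum(1 for occ in vc_buffers if occ >= capacity)
    return full / len(vc_buffers)
```

Nodes compare this fraction against the (self-tuned) threshold to decide whether to throttle injection.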
Mechanism
Key: no single threshold works well for all traffic patterns. How to self-tune the threshold? A hill-climbing algorithm tunes the threshold periodically. The tuning decision is based on observing changes in network throughput and the current throttle state, with constant tuning steps: increment by 1%, decrement by 4% (on saturation or a lowered injection rate).
[Table: threshold tuning decision table.]
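One plausible reading of the tuning rule (the full decision table is in the paper) can be sketched as: probe the threshold upward by 1% each interval, and cut it by 4% when throughput falls while throttling is active. The function name and the saturation test are simplifying assumptions.

```python
def tune_threshold(threshold, throughput, prev_throughput, throttled):
    """Hill-climbing update of the congestion threshold (0..1).

    Assumed rule: throughput dropping while the network is being throttled
    indicates saturation, so lower the threshold to throttle earlier;
    otherwise probe upward to reclaim throughput.
    """
    saturating = throttled and throughput < prev_throughput
    if saturating:
        return threshold * 0.96          # decrement 4%
    return min(1.0, threshold * 1.01)    # increment 1%, capped at 100% full
```

Periodic re-tuning is what lets one mechanism track different traffic patterns without a hand-picked per-workload threshold.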
Results
[Figure: throughput results; the dashed line marks the self-tuned mechanism.]
Tradeoffs
Advantages: Prevents network saturation; global congestion estimation; the threshold is dynamically tuned.
Disadvantages: Hardware overhead; the mechanism only works on virtual-channel networks; all nodes are throttled uniformly.
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems
Ebrahimi et al. "Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems." ASPLOS 2010.
Observation, Goal and Key Idea
Observation: The slowdown of each application depends on the other applications' behavior.
[Figure: zeus is slowed down a lot because of the interference from art.]
Observation, Goal and Key Idea
Observation: The slowdown of each application depends on the other applications' behavior.
Goal: Enable a fair memory system by detecting and controlling interference dynamically.
Key Idea: Dynamically estimate unfairness by detecting interference at the LLC, the row buffers, and the bus; control the interference by throttling requests generated by the interfering applications.
[Figure: a system with FST vs. a typical system.]
Unfairness Evaluation Methodology
[Animation: a timeline of intervals 1-3. Each interval runs unfairness evaluation and request throttling against a fairness goal of 1.5, tracking the slowest core, the interfering core, and a per-core throttling level for Cores 0-3.]
Unfairness Evaluation Methodology
Interval 1: unfairness = 3 (goal = 1.5). Slowest core: Core 2; interfering core: Core 0. Throttling levels set for interval 1: Core 0 at 50%, Core 1 at 100%, Core 2 at 10% (Core 3's level not shown).
Unfairness Evaluation Methodology
Interval 2: unfairness = 2.5 (goal = 1.5). Slowest core: Core 2; interfering core: Core 1. Throttling level updated to 25% for interval 2 (other cores' levels not shown); interval 1 levels were Core 0: 50%, Core 1: 100%, Core 2: 10%.
Unfairness Evaluation Methodology
Interval 3: unfairness carried from interval 2 is 2.5 (goal = 1.5); slowest core: Core 2; interfering core: Core 1; interval 2 throttling level: 25% (interval 3 levels not shown).
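The interval walkthrough above can be sketched as a per-interval update. Per-core slowdown estimates are taken as inputs (FST derives them from interference counted at the LLC, row buffers, and the bus); the step sizes and the use of the least-slowed core as a stand-in for the most interfering core are simplifying assumptions.

```python
FAIRNESS_GOAL = 1.5  # unfairness target from the walkthrough above

def end_of_interval(slowdowns, throttle):
    """One FST-style interval update.

    slowdowns: per-core slowdown estimates for the past interval
    throttle:  per-core injection caps in (0, 1], updated in place
    """
    unfairness = max(slowdowns) / min(slowdowns)
    if unfairness > FAIRNESS_GOAL:
        slowest = max(range(len(slowdowns)), key=lambda c: slowdowns[c])
        interferer = min(range(len(slowdowns)), key=lambda c: slowdowns[c])
        # Throttle down the interfering core, unthrottle the slowest core.
        throttle[interferer] = max(0.05, throttle[interferer] / 2)
        throttle[slowest] = 1.0
    return unfairness, throttle
```

Repeating this every interval is what drives unfairness down toward the goal, as in the interval 1 -> 2 -> 3 progression on the slides.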
Overhead and Tradeoffs
Hardware cost: ~12 KB of storage to manage 4 cores.
Advantages: Gains better fairness and throughput by throttling back the interfering applications; the mechanism accounts for interference caused in several places (i.e., the LLC, the row buffers, and the bus).
Disadvantages: The mechanism does not account for network interference.
Next Generation On-Chip Networks: What Kind of Congestion Control Do We Need?
George Nychis, Chris Fallin, Thomas Moscibroda, Onur Mutlu. "Next Generation On-Chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010, Monterey, CA, Oct 2010.
Observation/Solution
Observation: A bufferless network saturates more quickly than a buffered network; it has lower tolerance for highly intensive applications, which cause system throughput degradation.
Solution: Applications should be throttled to reduce congestion in the network.
[Figure: 17% degradation.]
Key Idea
Key Insight: Different applications have different levels of sensitivity to throttling. Need application-aware throttling!
[Figure: throttling the non-intensive vs. the intensive application; 25% improvement!]
Mechanism
How to identify intensive applications? Instructions-per-Flit (IPF) = instructions retired / flits injected. A lower IPF means a higher network intensity (flit injection rate). Highly intensive applications:

Benchmark | IPF
mcf       | 0.583
povray    | 1189.8

When to start throttling? Starvation rate per node = cycles in which the node is starved (cannot inject) / total cycles. Congested = any node's starvation rate > threshold. Therefore a node is throttled when the network is congested and its application has a low IPF.
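Under these definitions, the throttling decision can be sketched as follows; the threshold value, IPF cutoff, and function names are assumptions for illustration:

```python
STARVATION_THRESHOLD = 0.3  # assumed congestion threshold

def ipf(instructions, flits):
    # Instructions-per-Flit: low IPF => network-intensive application
    return instructions / flits

def starvation_rate(starved_cycles, total_cycles):
    # Fraction of cycles in which a node could not inject
    return starved_cycles / total_cycles

def select_throttled(apps, starvation_rates, ipf_cutoff):
    """apps: {name: (instructions, flits)} per interval.

    If any node's starvation rate exceeds the threshold, the network is
    congested and the low-IPF (network-intensive) applications are throttled.
    """
    congested = any(s > STARVATION_THRESHOLD for s in starvation_rates)
    if not congested:
        return set()
    return {name for name, (i, f) in apps.items() if ipf(i, f) < ipf_cutoff}
```

With the slide's numbers, mcf (IPF 0.583) would be throttled under congestion while povray (IPF 1189.8) would not.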
Results
Overall system throughput improvement on 4x4 and 8x8 networks: up to 27.6% improvement!
Tradeoffs
Advantages: Application-aware throttling; simple metrics to measure congestion and application intensity.
Disadvantages: Sampling every 100,000 cycles could be too coarse-grained, failing to capture congestion when applications are bursty. Fairness: once identified as intensive, an application is always throttled within that interval (100K cycles), without any guarantee of progress.
Project Overview
Problem: Bufferless NoCs perform poorly under heavy workloads.
Goal: Make bufferless NoCs perform comparably to buffered NoCs at high load.
Solution: Throttle back network-intensive applications to reduce interference.
Synthesis
- Tree Sat. (Scott and Sohi): target resource: network bandwidth; congestion metric: queue length at the memory modules; throttles the sources that cause hotspots; application-aware: no; relation to our project: identifies the same problem.
- Self-Tuned (Thottethodi et al.): target resource: network bandwidth; congestion metric: fraction of full virtual-channel buffers; uniform throttling of all nodes; application-aware: no, only traffic-aware; relation to our project: self-tuning threshold.
- FST (Ebrahimi et al.): target resource: memory bandwidth; congestion metric: slowdown; throttles memory-intensive applications; application-aware: yes; improves fairness; relation to our project: provides fairness.
- HotNets (Nychis et al.): target resource: network bandwidth; congestion metric: starvation; throttles network-intensive applications; application-aware: yes; relation to our project: tries to achieve the same goal.
Thank you!
BLESS vs. Buffered
Strengths: Reduction in network area and energy; simple router design; minimal performance reduction under light workloads.
Weaknesses: Poor performance under highly intensive workloads (17%).
Our Project Overview
Problem: Bufferless NoCs perform poorly under heavy workloads.
Goal: Make bufferless NoCs perform comparably to buffered NoCs under highly intensive workloads.
Proposed Solution: Throttle back memory-intensive applications.
Problem
A bufferless network saturates more quickly than a buffered network, with almost zero tolerance for highly intensive applications, which cause huge system throughput degradation.
How to identify congestion? Use injection starvation as the metric: starvation rate per node = cycles in which the node cannot inject / total cycles.
Quick Summary of the Mechanism
Kicks in every 100,000 cycles: detects the congestion level based on starvation, and determines the IPF of each application (low IPF -> high network intensity). If the network is congested, throttles the applications that have a low IPF! Overhead?
Why throttle the highly network-intensive applications? Throttling these applications is most effective at reducing traffic.