Published by Martha Helder. Modified over 10 years ago.
1
DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips
Andrew DeOrio †, Konstantinos Aisopos ‡§, Valeria Bertacco †, Li-Shiuan Peh §
DAC 2011
† University of Michigan
‡ Princeton University
§ Massachusetts Institute of Technology
2
Reliable Networks on Chip
[Figure: a node with processor (uP), cache ($), and router (R)]
Detection: detect if a fault has occurred
Diagnosis: diagnose what fault has occurred
Reconfiguration: reconfigure the network to account for the fault (fault-tolerant routing)
Recovery: recover and resume normal operation (Drain)
Nodes are disconnected, state is lost!
3
Recovery Approaches
Checkpoint/recovery approaches
[Figure: node (R, uP, $) with checkpoint buffers. Callout: data stuck in checkpoint buffer!]
[Figure: node (R, uP, $) with MEM. Callout: high performance overhead!]
Drain takes a reactive approach, incurring performance overhead only when errors occur
4
Data Recovery with Drain
Recover data lost during reconfiguration
– Emergency links provide alternate path
– Transfers cache contents and architectural state
[Figure: nodes, each with a processor core (uP), local cache ($), and router, plus a memory controller (Mem). Labels: primary link, DRAIN emergency link]
5
Drain Example
[Figure: network diagram with labels uP, $, M, $, $, $]
6
Drain Example: link failure (X)
7
Drain Example: reconfigure interconnect
8
Drain Example: link failure (X)
9
Drain Example: node isolated!
10
Drain Example: drain connected nodes via primary links
11
Drain Example: drain disconnected node via emergency link
12
Drain Example: drain connected node again
13
Drain Example: resume normal operation
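The example sequence above can be loosely sketched in code. The following is a minimal illustration, not the mechanism from the talk: it only classifies nodes by whether they can still reach the memory controller over working primary links, which decides whether a node would be drained over primary links or over the emergency link. The topology, the node names (`M`, `n0` to `n3`), and the functions `reachable` and `plan_drain` are all hypothetical, and the final "drain connected node again" step is not modeled.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from `start` over working primary links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            # Links are undirected: follow the link from either endpoint.
            if a == node and b not in seen:
                nxt = b
            elif b == node and a not in seen:
                nxt = a
            else:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen


def plan_drain(nodes, links, memory):
    """Split nodes into those drained via primary links and via the emergency link."""
    connected = reachable(links, memory)
    via_primary = sorted(n for n in nodes if n in connected and n != memory)
    via_emergency = sorted(n for n in nodes if n not in connected)
    return via_primary, via_emergency


# Hypothetical topology: a 2x2 mesh with a memory controller "M" attached to n0.
nodes = ["M", "n0", "n1", "n2", "n3"]
links = {("M", "n0"), ("n0", "n1"), ("n0", "n2"), ("n1", "n3"), ("n2", "n3")}

# Two link failures (assumed for illustration) isolate n3.
failed = {("n1", "n3"), ("n2", "n3")}
working = links - failed

via_primary, via_emergency = plan_drain(nodes, working, "M")
print(via_primary)    # ['n0', 'n1', 'n2']
print(via_emergency)  # ['n3']
```

With only the first failure applied, every node would still be reachable and the emergency list would be empty, which matches the slides: the first link failure is handled by reconfiguring the interconnect, and the node is isolated only after the second.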
14
Drain Performance as Links Fail
[Plot annotations: increasing emergency link time; decreasing functional network size]
15
Memory Latency Before and After
16
Conclusions
DRAIN is a lightweight recovery mechanism for CMPs
– 5,000 gates per node
Recoup cache data and architectural state from disconnected nodes
Performance overhead only during recovery
– ~3 ms at 1 GHz
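To put the recovery overhead in clock cycles, the quoted figures can be converted directly. This is only a back-of-the-envelope conversion; since the slide gives the time as approximate (~3 ms), the result is an order-of-magnitude estimate rather than a measured cycle count.

```python
# Figures quoted on the Conclusions slide.
recovery_time_ms = 3           # approximate recovery time in milliseconds
clock_hz = 1_000_000_000       # 1 GHz clock

# Integer arithmetic: cycles = time (s) * frequency (Hz), with ms converted to s.
cycles = recovery_time_ms * clock_hz // 1000
print(f"{cycles:,} cycles")    # 3,000,000 cycles
```

So one recovery costs on the order of three million cycles, and it is paid only when a fault actually occurs.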