1 Constructing and Characterizing Covert Channels on GPGPUs
Hoda NaghibiJouybari, Khaled N. Khasawneh and Nael Abu-Ghazaleh

2 Covert Channel
A covert channel is the malicious, indirect communication of sensitive data.
Why? There is no legitimate communication channel, or the communication channel is monitored. A covert channel is undetectable by monitoring systems on conventional communication channels.
[Figure: a trojan app and a spy app (e.g., a Gallery app and a Weather app) communicate over a covert channel.]

3 Covert channels are a substantial threat on GPGPUs
- Trends to improve multiprogramming on GPGPUs.
- GPU-accelerated computing is available on major cloud platforms.
- No protection is offered by the operating system.
- High quality (low noise) and high bandwidth.

4 Overview
Threat: using GPGPUs for covert channels.
To demonstrate the threat, we construct error-free, high-bandwidth covert channels on GPGPUs:
- Reverse engineer scheduling at different levels on the GPU
- Exploit scheduling to force colocation of two applications
- Create contention on shared resources
- Remove noise
Key result: error-free covert channels with bandwidth of over 4 Mbps.

5 GPU Architecture
Intra-SM channels: L1 constant cache, functional units, and warp schedulers.
Inter-SM channels: L2 constant cache and global memory.

6 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

7 Colocation (Reverse Engineering the Scheduling)
Step 1: Thread block scheduling to the SMs.
[Figure: thread blocks TB0..TBn of Kernel 1 and Kernel 2 are dispatched by the GPU thread block scheduler to SM0..SMm under the leftover policy; the SMs reach the L2 cache and memory channels through the interconnection network.]
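To give a concrete sense of how this step can be probed, here is a minimal CUDA sketch (an illustration, not the authors' code): each thread block records the SM it lands on by reading the %smid special register, so launching two kernels into separate streams and comparing the logs exposes how the leftover policy distributes their thread blocks. The spin duration is an arbitrary placeholder to keep the kernels overlapped.

```cuda
// Minimal sketch: each block logs the SM it was scheduled on, so two
// concurrently launched kernels can be compared to infer the policy.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void log_block_placement(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        unsigned int smid;
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));  // SM this block runs on
        sm_of_block[blockIdx.x] = smid;
    }
    // Spin briefly (placeholder duration) so the two kernels overlap on the GPU.
    long long start = clock64();
    while (clock64() - start < 1000000) { }
}

int main() {
    const int nb = 16;
    unsigned int *d_a, *d_b, h_a[nb], h_b[nb];
    cudaMalloc(&d_a, nb * sizeof(unsigned int));
    cudaMalloc(&d_b, nb * sizeof(unsigned int));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    log_block_placement<<<nb, 32, 0, s1>>>(d_a);   // "Kernel 1"
    log_block_placement<<<nb, 32, 0, s2>>>(d_b);   // "Kernel 2"
    cudaDeviceSynchronize();

    cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost);
    for (int b = 0; b < nb; ++b)
        printf("k1 block %d -> SM %u, k2 block %d -> SM %u\n", b, h_a[b], b, h_b[b]);
    return 0;
}
```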

8 Step 2: Warp to warp scheduler mapping
[Figure: inside SMk, the warps W0..Wk of thread blocks TBi and TBj are distributed across the SM's warp schedulers and dispatch units, which issue to the SP, DP, load/store, and SFU units backed by the register file and the shared memory / L1 cache.]
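As a rough illustration of the mapping this step uncovers, the sketch below labels each thread with an assumed scheduler, using a round-robin rule (warp ID modulo the number of schedulers, with 4 schedulers per Kepler SM). Both the rule and the count are assumptions to be confirmed by the reverse engineering, not givens.

```cuda
// Hedged sketch: label every thread with its (assumed) warp scheduler.
// The warp-ID-mod-N rule and N = 4 schedulers per Kepler SM are assumptions.
__global__ void label_scheduler(int *scheduler_of_thread, int num_schedulers)
{
    // Flattened thread index inside the block, then the warp it belongs to.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    int warp_id = tid / warpSize;

    int threads_per_block = blockDim.x * blockDim.y * blockDim.z;
    scheduler_of_thread[blockIdx.x * threads_per_block + tid] =
        warp_id % num_schedulers;   // assumed round-robin warp-to-scheduler mapping
}
```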

9 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

10 Cache Channel (intra-SM and inter-SM)
Extract the cache parameters (cache size, number of sets, number of ways, and line size) using a latency plot.
Communicate through one cache set: the trojan accesses its data array (TD) in constant memory to evict the spy's data from set x and send "1", or makes no access to send "0"; the spy re-reads its data array (SD) and observes cache misses (higher latency) for "1" or cache hits (low latency) for "0".
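The spy side of this prime-and-probe exchange might look roughly like the sketch below (not the authors' code). NUM_WAYS and SET_STRIDE_INTS are placeholder values that would come out of the latency-plot step, and a real implementation chains the loads so the compiler cannot reuse the primed values or overlap the accesses.

```cuda
// Minimal sketch of the spy's prime+probe over one constant-cache set.
// NUM_WAYS and SET_STRIDE_INTS are placeholders; the real numbers come
// from the latency-plot reverse engineering of the cache geometry.
#define NUM_WAYS        4      // assumed associativity of the L1 constant cache
#define SET_STRIDE_INTS 128    // assumed spacing (in ints) between addresses in the same set

__constant__ int SD[NUM_WAYS * SET_STRIDE_INTS];   // spy data array in constant memory

__global__ void spy_probe(int set_offset, unsigned int *latency, int *sink)
{
    int sum = 0;

    // Prime: load NUM_WAYS lines that all map to constant-cache set `set_offset`.
    for (int w = 0; w < NUM_WAYS; ++w)
        sum += SD[set_offset + w * SET_STRIDE_INTS];

    // ...the trojan now either evicts this set with its own array TD ("1")
    // or makes no access ("0")...

    // Probe: reload the same lines and time them. (In practice the loads are
    // chained, e.g. by pointer chasing, to force real cache accesses.)
    clock_t start = clock();
    for (int w = 0; w < NUM_WAYS; ++w)
        sum += SD[set_offset + w * SET_STRIDE_INTS];
    clock_t end = clock();

    *latency = (unsigned int)(end - start);   // low latency (hits) => "0", high latency (misses) => "1"
    *sink = sum;                              // keep the loads live
}
```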

11 Synchronization: L1 constant cache
[Figure: the trojan and spy synchronize through ReadyToSend and ReadyToReceive flags communicated over the L1 constant cache; once synchronized, threads 0-5 transfer 6 bits per round (e.g., ...011001).]

12 Synchronization and Parallelization
[Figure: the channel is replicated and synchronized across SM 0 through SM n of the GPU.]

13 SFU and Warp Scheduler Channel (intra-SM)
The number of operations that can be issued each cycle is limited by:
- the type and number of functional units, and
- the issue bandwidth of the warp schedulers.
Contention is isolated to warps assigned to the same warp scheduler.
[Figure: Kepler SM]

14 SFU and Warp Scheduler Channel (intra-SM)
Base channel:
- Spy: issues operations to the target functional unit and measures the time. Low latency: "0"; high latency: "1".
- Trojan: issues operations to the target functional unit to create contention and send "1"; issues no operations to send "0".
Improved bandwidth: communicate different bits through warps assigned to different warp schedulers (parallelism at the warp scheduler level and at the SM level). A sketch of the two sides follows.
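A minimal sketch of what the spy and trojan kernels might look like, assuming the SFU is exercised through the __sinf intrinsic and with ITERS as a placeholder iteration count (this is an illustration, not the authors' implementation):

```cuda
// Hedged sketch of the SFU contention channel. ITERS and the use of
// __sinf as the SFU-bound operation are assumptions for illustration.
#define ITERS 512

__global__ void spy_sfu(unsigned int *latency, float *sink)
{
    float x = threadIdx.x * 0.001f;
    clock_t start = clock();
    for (int i = 0; i < ITERS; ++i)
        x = __sinf(x);                     // each __sinf issues to the SFU pipeline
    clock_t end = clock();
    latency[threadIdx.x] = (unsigned int)(end - start);  // low => "0", high => "1"
    sink[threadIdx.x] = x;                 // keep the computation live
}

__global__ void trojan_sfu(int bit, float *sink)
{
    float x = threadIdx.x * 0.002f;
    if (bit) {                             // send "1": saturate the SFUs to create contention
        for (int i = 0; i < ITERS; ++i)
            x = __sinf(x);
    }                                      // send "0": issue nothing, leave the SFUs idle
    sink[threadIdx.x] = x;
}
```

For the contention to be observable, the spy and trojan warps must land on the same SM (and, for the warp scheduler channel, on the same scheduler), per the colocation and mapping steps above.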

15 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

16 What about other concurrent applications co-located with the spy and trojan?
[Figure: Rodinia workloads (Back Propagation, Kmeans, Heart Wall, K-Nearest Neighbor) sharing the GPU's SMs with the spy and trojan.]

17 Exclusive Colocation of Spy and Trojan
Concurrency on the GPU hardware (leftover policy) is limited by:
- shared memory,
- registers, and
- the number of thread blocks per SM.
By claiming these resources, the spy and trojan thread blocks (TB0..TBn) leave nothing for other workloads. This prevented interference from the Rodinia benchmark workloads on the covert communication and achieved error-free communication in all cases.
[Figure: spy and trojan thread blocks occupy each SM's shared memory and registers, so Kmeans, Back Propagation, Heart Wall, and K-Nearest Neighbor blocks find no resources left.]
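One way to realize the "no resource left" condition is sketched below (an illustration, not the authors' code): each spy/trojan thread block requests the maximum dynamic shared memory per block, so the leftover policy has no shared memory to hand to another kernel's blocks on the same SMs. The 48 KB figure is an assumption; the actual limit should be queried on the target GPU.

```cuda
// Hedged sketch: occupy each SM's shared memory so no other kernel's
// thread blocks can be co-scheduled. 48 KB per block is an assumption;
// query cudaDevAttrMaxSharedMemoryPerBlock for the actual limit.
#include <cuda_runtime.h>

__global__ void greedy_block(/* covert-channel work would go here */)
{
    extern __shared__ unsigned char hog[];            // dynamically sized shared memory
    hog[threadIdx.x] = (unsigned char)threadIdx.x;    // trivial use of the shared array
}

void launch_exclusive(int num_sms)
{
    size_t smem_per_block = 48 * 1024;                // assumed per-block shared memory cap
    // One block per SM, each claiming all the shared memory it can,
    // leaves nothing for thread blocks of other concurrent kernels.
    greedy_block<<<num_sms, 128, smem_per_block>>>();
}
```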

18 Results: L1 Cache
Covert channel bandwidth on three generations of real NVIDIA GPUs.
[Figure: bandwidth chart annotated with speedups of 1.7x, 3.8x, and 12.9x.]
Error-free bandwidth of over 4 Mbps: the fastest known micro-architectural covert channel under realistic conditions.

19 Results: SFU
Covert channel bandwidth on three generations of real NVIDIA GPUs.
[Figure: bandwidth chart annotated with speedups of 3.5x and 13x.]

20 Conclusion
GPUs' improved multiprogramming support makes covert channels a substantial threat.
We achieve colocation at different levels by leveraging thread block scheduling and warp-to-warp-scheduler mapping.
The GPU's inherent parallelism and specific architectural features provide very high quality, high bandwidth channels: over 4 Mbps error-free.

21 Thank You!

