Constructing and Characterizing Covert Channels on GPGPUs
Hoda NaghibiJouybari, Khaled N. Khasawneh, and Nael Abu-Ghazaleh
Covert Channel
Malicious indirect communication of sensitive data.
Why use one?
  There is no communication channel between the two parties, or
  the communication channel is monitored.
A covert channel is undetectable by systems that monitor conventional communication channels.
[Figure: a Trojan and a Spy communicating over a covert channel; example apps shown: Gallery App, Weather App.]
Covert channels are a substantial threat on GPGPUs
  The trend is toward improved multiprogramming on GPGPUs.
  GPU-accelerated computing is available on major cloud platforms.
  The operating system offers no protection.
  The channels are high quality (low noise) and high bandwidth.
Overview
Threat: using GPGPUs for covert channels.
To demonstrate the threat, we construct error-free, high-bandwidth covert channels on GPGPUs:
  Reverse engineer scheduling at different levels of the GPU.
  Exploit the scheduling to force colocation of two applications.
  Create contention on shared resources.
  Remove noise.
Key result: error-free covert channels with bandwidth of over 4 Mbps.
GPU Architecture
Intra-SM channels: L1 constant cache, functional units, and warp schedulers.
Inter-SM channels: L2 constant cache and global memory.
Attack Flow
  Colocate Spy and Trojan
  Construct the Channels
  Remove Noise
Colocation (Reverse Engineering the Scheduling)
Step 1: Thread block scheduling to the SMs.
[Figure: the thread blocks (TB0, TB1, ..., TBn) of Kernel 1 and Kernel 2 pass through the GPU Thread Block Scheduler, which places them on SM0, SM1, ..., SMm under the leftover policy. The SMs connect through the Interconnection Network to the L2 Cache and Memory Channels.]
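The slide names the leftover policy but does not spell out its rules, so the following is only a toy Python model under stated assumptions: thread blocks are placed round-robin across SMs, each SM has a fixed number of block slots, and a later kernel only receives slots that earlier kernels left free. The function name `schedule` and the parameter `max_blocks_per_sm` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def schedule(kernels, num_sms, max_blocks_per_sm):
    """Place thread blocks on SMs round-robin, kernel by kernel.

    kernels: list of (name, num_blocks) in launch order.
    Returns {sm_id: [(kernel_name, block_id), ...]}.
    Assumption: a later kernel only gets slots left over by earlier ones.
    """
    placement = defaultdict(list)
    next_sm = 0
    for name, num_blocks in kernels:
        for block_id in range(num_blocks):
            # Probe each SM at most once, starting at the round-robin cursor.
            for probe in range(num_sms):
                sm = (next_sm + probe) % num_sms
                if len(placement[sm]) < max_blocks_per_sm:
                    placement[sm].append((name, block_id))
                    next_sm = (sm + 1) % num_sms
                    break
            else:
                break  # no free slot anywhere; remaining blocks would wait
    return dict(placement)


# Launching as many blocks as there are SMs puts one spy block and one
# trojan block on every SM in this model.
placement = schedule([("spy", 4), ("trojan", 4)], num_sms=4, max_blocks_per_sm=2)
for sm in sorted(placement):
    print(sm, placement[sm])
```

In this model, two kernels that each launch one block per SM end up colocated on every SM, which is the property the attack relies on. Real hardware tracks several resources per SM rather than a single slot count.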
Step 2: Warp to warp scheduler mapping.
[Figure: inside SMk, the warps (W0, W1, ..., Wk-1, Wk) of thread blocks TBi and TBj are mapped to the warp schedulers. Each Warp Scheduler feeds Dispatch Units, which issue to the Register File and the functional units (SP, DP, L/D, SFU), backed by Shared Memory / L1 Cache.]
Attack Flow, next step: Construct the Channels
Cache Channel (intra-SM and inter-SM)
Extract the cache parameters (cache size, number of sets, number of ways, and line size) from a latency plot.
Communicate through one cache set.
[Figure: the Spy Data Array (SD) and the Trojan Data Array (TD) in constant memory both map to constant cache set x.
  Send 1: the Trojan accesses its data, evicting the Spy's data; the Spy then sees cache misses and higher latency.
  Send 0: the Trojan makes no access; the Spy sees cache hits and low latency.]
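The single-set protocol can be sketched with a small software model. This is a simulation, not GPU code, and it rests on assumptions that are not in the slides: the set uses LRU replacement, it has 4 ways, and hits and misses cost 1 and 10 arbitrary units. The names `CacheSet` and `cache_transfer` are hypothetical.

```python
from collections import OrderedDict

HIT_LATENCY, MISS_LATENCY = 1, 10  # arbitrary model units, not measurements


class CacheSet:
    """One set of a set-associative cache with LRU replacement (assumed)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss, and update LRU order."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def cache_transfer(bits, ways=4):
    """Send bits from trojan to spy through one shared cache set."""
    cache_set = CacheSet(ways)
    spy_tags = [("SD", i) for i in range(ways)]      # spy data array lines
    trojan_tags = [("TD", i) for i in range(ways)]   # trojan data array lines
    threshold = ways * (HIT_LATENCY + MISS_LATENCY) / 2
    for tag in spy_tags:  # spy fills the set with its own data
        cache_set.access(tag)
    received = []
    for bit in bits:
        if bit == 1:  # trojan evicts the spy's lines; for 0 it stays idle
            for tag in trojan_tags:
                cache_set.access(tag)
        # Spy re-reads its lines and times them; this also refills the set.
        latency = sum(
            HIT_LATENCY if cache_set.access(tag) else MISS_LATENCY
            for tag in spy_tags
        )
        received.append(1 if latency > threshold else 0)
    return received


print(cache_transfer([0, 1, 1, 0, 0, 1]))
```

The spy's probe doubles as the next fill of the set, so no separate reset step is needed between bits in this model.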
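Synchronization: L1 Constant Cache
[Figure: handshake between the Trojan and the Spy. Each side waits for the other's signal (ReadytoSend, ReadytoReceive), which is sent as a 1 through the cache channel. After the handshake, threads 0-5 on each side transfer 6 bits (…011001) in parallel.]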
Synchronization and Parallelization
[Figure: the GPU with SM 0, SM 1, ..., SM n.]
SFU and Warp Scheduler Channel (intra-SM)
The number of operations issued in each cycle is limited by:
  the type and number of functional units;
  the issue bandwidth of the warp schedulers.
Contention is isolated to warps assigned to the same warp scheduler.
[Figure: Kepler SM.]
SFU and Warp Scheduler Channel (intra-SM)
Base channel
  Trojan: issues operations to the target functional unit to create contention and send "1"; issues no operations to send "0".
  Spy: issues operations to the target functional unit and measures the time. Low latency: "0". High latency: "1".
Improved bandwidth
  Communicate different bits through warps assigned to different warp schedulers.
  Parallelism at the warp scheduler level.
  Parallelism at the SM level.
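The contention channel and its parallel variant can be illustrated with a toy timing model. The assumptions here are mine, not the slides': each warp scheduler issues one operation per cycle, issue slots alternate fairly between the spy warp and a contending trojan warp, and there are 4 schedulers per SM. The names `spy_time` and `sfu_transfer` are hypothetical.

```python
def spy_time(spy_ops, trojan_active, issue_slots_per_cycle=1):
    """Cycles the spy needs to issue spy_ops operations on one scheduler.

    Assumption: while the trojan contends, issue slots alternate between
    the spy warp and the trojan warp.
    """
    cycles = 0
    issued = 0
    spy_turn = True
    while issued < spy_ops:
        cycles += 1
        for _ in range(issue_slots_per_cycle):
            if not trojan_active or spy_turn:
                issued += 1
            spy_turn = not spy_turn
    return cycles


def sfu_transfer(bits, num_schedulers=4, spy_ops=100):
    """Send bits, one per warp scheduler per round. Returns (bits, rounds)."""
    baseline = spy_time(spy_ops, trojan_active=False)
    threshold = 1.5 * baseline  # halfway between idle and contended timing
    received = []
    rounds = 0
    for start in range(0, len(bits), num_schedulers):
        rounds += 1
        # Each bit in this round travels over its own warp scheduler.
        for bit in bits[start:start + num_schedulers]:
            latency = spy_time(spy_ops, trojan_active=(bit == 1))
            received.append(1 if latency > threshold else 0)
    return received, rounds


message = [1, 0, 1, 1, 0, 0, 1, 0]
received, rounds = sfu_transfer(message)
print(received, rounds)
```

Because contention stays within one warp scheduler, the per-scheduler bits do not interfere in this model, so 8 bits take 2 rounds with 4 schedulers instead of 8 rounds with one.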
Attack Flow, next step: Remove Noise
What about other concurrent applications colocated with the spy and trojan?
[Figure: other workloads (Back Propagation, Kmeans, Heart Wall, K-Nearest Neighbor, …) sharing the GPU's SMs.]
Exclusive Colocation of Spy and Trojan
Exploit the concurrency limitations of the GPU hardware (leftover policy):
  shared memory;
  registers;
  number of thread blocks.
This prevented interference from Rodinia benchmark workloads with the covert communication and achieved error-free communication in all cases.
[Figure: the thread blocks (TB0, TB1, ..., TBn) of the Spy and the Trojan occupy the shared memory and registers of each SM, so other workloads (Kmeans, Back Propagation, Heart Wall, K-Nearest Neighbor, …) find no resource left.]
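The resource-exhaustion idea reduces to simple bookkeeping, sketched below. All numbers are illustrative assumptions rather than values from the talk (48 KB of shared memory, 65536 registers, and 16 block slots per SM), and `Resources` and `try_place` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    shared_mem: int   # bytes
    registers: int
    block_slots: int


def try_place(free, need):
    """Return the resources left after placing `need`, or None if it does not fit."""
    remaining = Resources(
        free.shared_mem - need.shared_mem,
        free.registers - need.registers,
        free.block_slots - need.block_slots,
    )
    if min(remaining.shared_mem, remaining.registers, remaining.block_slots) < 0:
        return None
    return remaining


# Illustrative per-SM limits (assumed, not taken from the slides).
sm = Resources(shared_mem=48 * 1024, registers=65536, block_slots=16)

# Spy and trojan each request half of the shared memory on purpose.
spy_block = Resources(shared_mem=24 * 1024, registers=1024, block_slots=1)
trojan_block = Resources(shared_mem=24 * 1024, registers=1024, block_slots=1)

# A block from some other workload that needs a little shared memory.
other_block = Resources(shared_mem=1024, registers=2048, block_slots=1)

after_spy = try_place(sm, spy_block)
after_trojan = try_place(after_spy, trojan_block)
third = try_place(after_trojan, other_block)
print(after_trojan, third)
```

In this model a workload that requests no shared memory would still fit, which suggests why more than one resource appears on the slide: the pair can exhaust whichever limit the competing kernel depends on.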
Results
L1 cache covert channel bandwidth on three generations of real NVIDIA GPUs.
Error-free bandwidth of over 4 Mbps: the fastest known micro-architectural covert channel under realistic conditions.
[Figure: bandwidth chart with annotated factors 12.9x, 3.8x, and 1.7x.]
Results
SFU covert channel bandwidth on three generations of real NVIDIA GPUs.
[Figure: bandwidth chart with annotated factors 13x and 3.5x.]
Conclusion
Improved multiprogramming on GPUs makes covert channels a substantial threat.
Colocation is achieved at different levels by leveraging thread block scheduling and warp to warp scheduler mapping.
The GPU's inherent parallelism and specific architectural features provide very high quality, high-bandwidth channels: over 4 Mbps, error-free.
Thank You!