1 Constructing and Characterizing Covert Channels on GPGPUs
Hoda NaghibiJouybari, Khaled N. Khasawneh and Nael Abu-Ghazaleh

2 Covert Channel
A covert channel is the malicious, indirect communication of sensitive data.
Why? There is no legitimate communication channel, or the communication channel is monitored. A covert channel is undetectable by monitoring systems on conventional communication channels.
[Figure: a trojan app and a spy app (e.g., a Gallery app and a Weather app) communicate over a covert channel.]

3 Covert channels are a substantial threat on GPGPUs
- Trends to improve multiprogramming on GPGPUs.
- GPU-accelerated computing is available on major cloud platforms.
- No protection is offered by the operating system.
- High quality (low noise) and high bandwidth.

4 Overview
Threat: using GPGPUs for covert channels.
To demonstrate the threat, we construct error-free, high-bandwidth covert channels on GPGPUs:
- Reverse engineer scheduling at different levels on the GPU
- Exploit scheduling to force colocation of two applications
- Create contention on shared resources
- Remove noise
Key result: error-free covert channels with bandwidth of over 4 Mbps.

5 GPU Architecture
Intra-SM channels: L1 constant cache, functional units, and warp schedulers.
Inter-SM channels: L2 constant cache and global memory.

6 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

7 Colocation (Reverse Engineering the Scheduling)
Step 1: Thread block scheduling to the SMs.
[Figure: thread blocks TB0..TBn of Kernel 1 and Kernel 2 are dispatched by the GPU thread block scheduler to SM0..SMm under the leftover policy; the SMs reach the L2 cache and memory channels through the interconnection network.]
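To give a concrete sense of how this step can be probed, here is a minimal CUDA sketch (an illustration, not the authors' code): each thread block records the SM it lands on by reading the %smid special register, so launching two kernels into separate streams and comparing the logs exposes how the leftover policy distributes their thread blocks. The spin duration is an arbitrary placeholder to keep the kernels overlapped.

```cuda
// Minimal sketch: each block logs the SM it was scheduled on, so two
// concurrently launched kernels can be compared to infer the policy.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void log_block_placement(unsigned int *sm_of_block) {
    if (threadIdx.x == 0) {
        unsigned int smid;
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));  // SM this block runs on
        sm_of_block[blockIdx.x] = smid;
    }
    // Spin briefly (placeholder duration) so the two kernels overlap on the GPU.
    long long start = clock64();
    while (clock64() - start < 1000000) { }
}

int main() {
    const int nb = 16;
    unsigned int *d_a, *d_b, h_a[nb], h_b[nb];
    cudaMalloc(&d_a, nb * sizeof(unsigned int));
    cudaMalloc(&d_b, nb * sizeof(unsigned int));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    log_block_placement<<<nb, 32, 0, s1>>>(d_a);   // "Kernel 1"
    log_block_placement<<<nb, 32, 0, s2>>>(d_b);   // "Kernel 2"
    cudaDeviceSynchronize();

    cudaMemcpy(h_a, d_a, sizeof(h_a), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, sizeof(h_b), cudaMemcpyDeviceToHost);
    for (int b = 0; b < nb; ++b)
        printf("k1 block %d -> SM %u, k2 block %d -> SM %u\n", b, h_a[b], b, h_b[b]);
    return 0;
}
```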

8 Step 2: Warp to warp scheduler mapping
[Figure: inside SMk, the warps W0..Wk of thread blocks TBi and TBj are distributed across the SM's warp schedulers and dispatch units, which issue to the SP, DP, load/store, and SFU units backed by the register file and the shared memory / L1 cache.]
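As a rough illustration of the mapping this step uncovers, the sketch below labels each thread with an assumed scheduler, using a round-robin rule (warp ID modulo the number of schedulers, with 4 schedulers per Kepler SM). Both the rule and the count are assumptions to be confirmed by the reverse engineering, not givens.

```cuda
// Hedged sketch: label every thread with its (assumed) warp scheduler.
// The warp-ID-mod-N rule and N = 4 schedulers per Kepler SM are assumptions.
__global__ void label_scheduler(int *scheduler_of_thread, int num_schedulers)
{
    // Flattened thread index inside the block, then the warp it belongs to.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    int warp_id = tid / warpSize;

    int threads_per_block = blockDim.x * blockDim.y * blockDim.z;
    scheduler_of_thread[blockIdx.x * threads_per_block + tid] =
        warp_id % num_schedulers;   // assumed round-robin warp-to-scheduler mapping
}
```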

9 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

10 Cache Channel (intra-SM and inter-SM)
Extract the cache parameters (cache size, number of sets, number of ways, and line size) using a latency plot.
Communicate through one cache set: the trojan accesses its data array (TD) in constant memory to evict the spy's data from set x and send "1", or makes no access to send "0"; the spy re-reads its data array (SD) and observes cache misses (higher latency) for "1" or cache hits (low latency) for "0".
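The spy side of this prime-and-probe exchange might look roughly like the sketch below (not the authors' code). NUM_WAYS and SET_STRIDE_INTS are placeholder values that would come out of the latency-plot step, and a real implementation chains the loads so the compiler cannot reuse the primed values or overlap the accesses.

```cuda
// Minimal sketch of the spy's prime+probe over one constant-cache set.
// NUM_WAYS and SET_STRIDE_INTS are placeholders; the real numbers come
// from the latency-plot reverse engineering of the cache geometry.
#define NUM_WAYS        4      // assumed associativity of the L1 constant cache
#define SET_STRIDE_INTS 128    // assumed spacing (in ints) between addresses in the same set

__constant__ int SD[NUM_WAYS * SET_STRIDE_INTS];   // spy data array in constant memory

__global__ void spy_probe(int set_offset, unsigned int *latency, int *sink)
{
    int sum = 0;

    // Prime: load NUM_WAYS lines that all map to constant-cache set `set_offset`.
    for (int w = 0; w < NUM_WAYS; ++w)
        sum += SD[set_offset + w * SET_STRIDE_INTS];

    // ...the trojan now either evicts this set with its own array TD ("1")
    // or makes no access ("0")...

    // Probe: reload the same lines and time them. (In practice the loads are
    // chained, e.g. by pointer chasing, to force real cache accesses.)
    clock_t start = clock();
    for (int w = 0; w < NUM_WAYS; ++w)
        sum += SD[set_offset + w * SET_STRIDE_INTS];
    clock_t end = clock();

    *latency = (unsigned int)(end - start);   // low latency (hits) => "0", high latency (misses) => "1"
    *sink = sum;                              // keep the loads live
}
```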

11 Synchronization: L1 constant cache
[Figure: the trojan and spy synchronize through ReadyToSend and ReadyToReceive flags communicated over the L1 constant cache; once synchronized, threads 0-5 transfer 6 bits per round (e.g., ...011001).]

12 Synchronization and Parallelization
[Figure: the channel is replicated and synchronized across SM 0 through SM n of the GPU.]

13 SFU and Warp Scheduler Channel (intra-SM)
The number of operations that can be issued each cycle is limited by:
- the type and number of functional units, and
- the issue bandwidth of the warp schedulers.
Contention is isolated to warps assigned to the same warp scheduler.
[Figure: Kepler SM]

14 SFU and Warp Scheduler Channel (intra-SM)
Base channel:
- Spy: issues operations to the target functional unit and measures the time. Low latency: "0"; high latency: "1".
- Trojan: issues operations to the target functional unit to create contention and send "1"; issues no operations to send "0".
Improved bandwidth: communicate different bits through warps assigned to different warp schedulers (parallelism at the warp scheduler level and at the SM level). A sketch of the two sides follows.
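A minimal sketch of what the spy and trojan kernels might look like, assuming the SFU is exercised through the __sinf intrinsic and with ITERS as a placeholder iteration count (this is an illustration, not the authors' implementation):

```cuda
// Hedged sketch of the SFU contention channel. ITERS and the use of
// __sinf as the SFU-bound operation are assumptions for illustration.
#define ITERS 512

__global__ void spy_sfu(unsigned int *latency, float *sink)
{
    float x = threadIdx.x * 0.001f;
    clock_t start = clock();
    for (int i = 0; i < ITERS; ++i)
        x = __sinf(x);                     // each __sinf issues to the SFU pipeline
    clock_t end = clock();
    latency[threadIdx.x] = (unsigned int)(end - start);  // low => "0", high => "1"
    sink[threadIdx.x] = x;                 // keep the computation live
}

__global__ void trojan_sfu(int bit, float *sink)
{
    float x = threadIdx.x * 0.002f;
    if (bit) {                             // send "1": saturate the SFUs to create contention
        for (int i = 0; i < ITERS; ++i)
            x = __sinf(x);
    }                                      // send "0": issue nothing, leave the SFUs idle
    sink[threadIdx.x] = x;
}
```

For the contention to be observable, the spy and trojan warps must land on the same SM (and, for the warp scheduler channel, on the same scheduler), per the colocation and mapping steps above.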

15 Attack Flow
Colocate Spy and Trojan → Construct the Channels → Remove Noise

16 What about other concurrent applications co-located with the spy and trojan?
[Figure: Rodinia workloads (Back Propagation, Kmeans, Heart Wall, K-Nearest Neighbor) sharing the GPU's SMs with the spy and trojan.]

17 Exclusive Colocation of Spy and Trojan
Concurrency on the GPU hardware (leftover policy) is limited by:
- shared memory,
- registers, and
- the number of thread blocks per SM.
By claiming these resources, the spy and trojan thread blocks (TB0..TBn) leave nothing for other workloads. This prevented interference from the Rodinia benchmark workloads on the covert communication and achieved error-free communication in all cases.
[Figure: spy and trojan thread blocks occupy each SM's shared memory and registers, so Kmeans, Back Propagation, Heart Wall, and K-Nearest Neighbor blocks find no resources left.]
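One way to realize the "no resource left" condition is sketched below (an illustration, not the authors' code): each spy/trojan thread block requests the maximum dynamic shared memory per block, so the leftover policy has no shared memory to hand to another kernel's blocks on the same SMs. The 48 KB figure is an assumption; the actual limit should be queried on the target GPU.

```cuda
// Hedged sketch: occupy each SM's shared memory so no other kernel's
// thread blocks can be co-scheduled. 48 KB per block is an assumption;
// query cudaDevAttrMaxSharedMemoryPerBlock for the actual limit.
#include <cuda_runtime.h>

__global__ void greedy_block(/* covert-channel work would go here */)
{
    extern __shared__ unsigned char hog[];            // dynamically sized shared memory
    hog[threadIdx.x] = (unsigned char)threadIdx.x;    // trivial use of the shared array
}

void launch_exclusive(int num_sms)
{
    size_t smem_per_block = 48 * 1024;                // assumed per-block shared memory cap
    // One block per SM, each claiming all the shared memory it can,
    // leaves nothing for thread blocks of other concurrent kernels.
    greedy_block<<<num_sms, 128, smem_per_block>>>();
}
```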

18 Results: L1 Cache
Covert channel bandwidth on three generations of real NVIDIA GPUs.
[Figure: bandwidth chart annotated with speedups of 1.7x, 3.8x, and 12.9x.]
Error-free bandwidth of over 4 Mbps: the fastest known micro-architectural covert channel under realistic conditions.

19 Results: SFU
Covert channel bandwidth on three generations of real NVIDIA GPUs.
[Figure: bandwidth chart annotated with speedups of 3.5x and 13x.]

20 Conclusion
GPUs' improved multiprogramming support makes covert channels a substantial threat.
We achieve colocation at different levels by leveraging thread block scheduling and warp-to-warp-scheduler mapping.
The GPU's inherent parallelism and specific architectural features provide very high quality, high bandwidth channels: over 4 Mbps error-free.

21 Thank You!

