Faster File Matching using GPGPUs
Deephan Mohan
Professor: Dr. John Cavazos
University of Delaware
Clarification about the title: the current implementation targets NVIDIA GPUs only; extension to other platforms is in progress. Partial file matching support is included.
SAAHPC 2010
Presentation Outline
Introduction
MD6 Algorithm
CUDA MD6
Experiments and Results
Conclusion
Introduction
File matching
- Indispensable in fields like forensics and information security
- Relies on the robustness of the hashing algorithms used
Motivation
- Advent of GPU computing
- Faster file matching
- Faster hashing algorithms
Faster file matching
Hashing algorithms: MD4, MD5, SHA-1, SHA-2 (SHA-256, SHA-512), Tiger, Whirlpool
- Used in integrity checking, checksum calculation, message authentication, etc.
Existing file matching programs
- SSDEEP
- HASHDEEP
- Numerous proprietary file matching programs
The MD6 Algorithm
Merkle Tree
Computation proceeds from the bottom up
- Each leaf represents a data chunk
- Each intermediate node represents a compression node
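Not on the slide, but a useful sanity check: each MD6 compression consumes four 16-word chaining values (4 × 16 = 64 data words), so the tree has 4-to-1 fan-in and its height grows logarithmically in the number of chunks:

```latex
% Levels of the MD6 hash tree over N leaf chunks (4-to-1 fan-in):
\[
  \text{levels} \;=\; \lceil \log_4 N \rceil,
  \qquad \text{e.g. } N = 8 \;\Rightarrow\; \lceil \log_4 8 \rceil = 2 .
\]
```

This matches the execution walkthrough later in the deck, where 8 buffers collapse to 2 and then to the root.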
The MD6 Algorithm
MD6 inputs:
- M: the message to be hashed (mandatory)
- d: desired message digest length, in bits (mandatory)
- K: key value (optional)
- r: number of rounds (optional)
MD6 compression:
- MD6 word size: 8 bytes (64 bits)
- MD6 buffer size: 64 words (512 bytes)
- Each buffer is pre-processed into an 89-word block, compressed by f: W^89 → W^16 into a 16-word chaining value, and the result post-processed: W^64 → W^89 → W^16
- The final hash is exactly d bits in length
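For context (from the MD6 specification, not stated on this slide), the default round count r is derived from the digest length d, which is why the round-count experiment later in the deck matters for both speed and security margin:

```latex
% Default number of rounds as a function of digest length d (MD6 spec):
\[
  r \;=\; 40 + \frac{d}{4},
  \qquad d = 256 \;\Rightarrow\; r = 104 .
\]
```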
CUDA MD6
CUDA MD6 Implementation
Step (i): The host buffers in the contents of the source file
Step (ii): Allocate adequate memory on the device; if the file is too large, it is split into chunks
Step (iii): Invoke the three kernels in sequence:
- md6_compress_block() – preprocessing module
- md6_compress() – compression module
- md6_rewrite() – MD6 hash aggregation module
Step (iv): Repeat step (iii) N+1 times (once per tree level) to generate the final hash
Step (v): Perform the hash comparison
Step (vi): Store the hash in the hashdb
Throughout, the data stays in GPU memory for the whole computation rather than being offloaded to the host between kernels, which improves performance; see the host-side sketch below.
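The slides give the kernel names and launch geometry but not the host code. Below is a minimal host-side sketch of steps (i)–(iv) under those constraints; the kernel signatures, the buffer layout, and the rounding of the buffer count are illustrative assumptions (the kernels themselves are sketched on later slides).

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <cstddef>

// Assumed signatures; names and launch geometry come from the slides.
__global__ void md6_compress_block(const uint64_t *data, uint64_t *blocks, int level);
__global__ void md6_compress(const uint64_t *blocks, uint64_t *cv, int rounds);
__global__ void md6_rewrite(const uint64_t *cv, uint64_t *data);

void cuda_md6_hash(const uint64_t *host_data, size_t nbuffers,
                   uint64_t digest[16], int rounds)
{
    uint64_t *d_data, *d_blocks, *d_cv;
    cudaMalloc(&d_data,   nbuffers * 64 * sizeof(uint64_t)); // 512 B chunks
    cudaMalloc(&d_blocks, nbuffers * 89 * sizeof(uint64_t)); // preprocessed blocks
    cudaMalloc(&d_cv,     nbuffers * 16 * sizeof(uint64_t)); // chaining values
    cudaMemcpy(d_data, host_data, nbuffers * 64 * sizeof(uint64_t),
               cudaMemcpyHostToDevice);

    // Walk the Merkle tree bottom-up; intermediate results never leave
    // the device, which is where much of the speedup comes from.
    for (int level = 0; ; ++level) {
        md6_compress_block<<<nbuffers, 1>>>(d_data, d_blocks, level); // W^64 -> W^89
        md6_compress<<<nbuffers, 16>>>(d_blocks, d_cv, rounds);       // W^89 -> W^16
        if (nbuffers == 1) break;
        md6_rewrite<<<(nbuffers + 3) / 4, 1>>>(d_cv, d_data); // pack 4 children/chunk
        nbuffers = (nbuffers + 3) / 4;
    }
    cudaMemcpy(digest, d_cv, 16 * sizeof(uint64_t), cudaMemcpyDeviceToHost);
    cudaFree(d_data); cudaFree(d_blocks); cudaFree(d_cv);
}
```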
CUDA MD6 kernels
md6_compress_block()
- Data preprocessing module, f: W^64 → W^89
- <<<Grid, Threads>>> = <<<total number of buffers, 1>>>
md6_compress()
- Performs the MD6 compression
- <<<Grid, Threads>>> = <<<total number of buffers, 16>>>
md6_rewrite()
- Performs the MD6 hash aggregation, f: W^89 → W^16
- <<<Grid, Threads>>> = <<<total number of buffers / 4, 1>>>
The three kernels run sequentially.
Preprocessing kernel
Transforms each MD6 buffer into an 89-word compression input:
- First 15 words: the constant vector Q (fixed constants defined by the MD6 spec)
- Next 8 words: the key K
- U, V: unique control words
- Last 64 words: the data chunk
Block layout: [ Q (15) | K (8) | U | V | data chunk (64) ]
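A minimal sketch of this kernel, assuming the layout above and the single-thread-per-buffer launch (<<<buffers, 1>>>) from the slides. The Q table is the √6-derived constant vector from the MD6 reference code; the U/V encoding here is a simplification of the spec's node-ID and control words.

```cuda
#include <cstdint>

__constant__ uint64_t Q[15] = {   // fractional digits of sqrt(6), per the MD6 spec
    0x7311c2812425cfa0ULL, 0x6432286434aac8e7ULL, 0xb60450e9ef68b7c1ULL,
    0xe8fb23908d9f06f1ULL, 0xdd2e76cba691e5bfULL, 0x0cd0d63b2c30bc41ULL,
    0x1f8ccf6823058f8aULL, 0x54e5ed5b88e3775dULL, 0x4ad12aae0a6d6031ULL,
    0x3e7f16bb88222e0dULL, 0x8af8671d3fb50c2cULL, 0x995ad1178bd25c31ULL,
    0xc878c1dd04c4b633ULL, 0x3b72066c7a1552acULL, 0x0d6f3522631effcbULL,
};

__global__ void md6_compress_block(const uint64_t *data, uint64_t *blocks,
                                   int level)
{
    const int b = blockIdx.x;                      // one buffer per CUDA block
    uint64_t *B = blocks + (size_t)b * 89;
    for (int k = 0; k < 15; ++k) B[k] = Q[k];      // constant vector Q
    for (int k = 0; k < 8;  ++k) B[15 + k] = 0;    // key K (unkeyed hashing)
    B[23] = ((uint64_t)level << 32) | (uint64_t)b; // U: node ID (simplified)
    B[24] = 0;                                     // V: control word (simplified)
    for (int k = 0; k < 64; ++k)                   // 64-word data chunk
        B[25 + k] = data[(size_t)b * 64 + k];
}
```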
CUDA MD6 Compression Kernel
For each CUDA block do
  Set index to blockID
  For each round, set i to n + 16·round + threadID:  /* 16 steps per round */
    x = S ⊕ A[i−n] ⊕ A[i−t0]
    x = x ⊕ (A[i−t1] ∧ A[i−t2]) ⊕ (A[i−t3] ∧ A[i−t4])
    x = x ⊕ (x ≫ r[threadID])
    A[i] = x ⊕ (x ≪ ℓ[threadID])
  exit CUDA block
exit CUDA kernel call
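Filling in the pseudocode, here is a hedged CUDA realization of the kernel under the slide's <<<buffers, 16>>> geometry. The tap offsets (t0…t4), shift tables, and round-constant recurrence follow the public MD6 reference code; the 128-word shared-memory ring buffer is one concrete reading of the "sliding window thread access" mentioned on the next slide, not necessarily the authors' exact scheme. Sixteen threads can safely compute the 16 steps of one round in parallel because the smallest tap (t0 = 17) always reaches back into earlier rounds.

```cuda
#include <cstdint>

#define MD6_N 89        // feedback distance: words of state per step
#define MD6_C 16        // chaining value size in words
#define T0 17
#define T1 18
#define T2 21
#define T3 31
#define T4 67

// Per-step right/left shift amounts (MD6 reference code).
__constant__ int rsh[16] = {10,5,13,10,11,12,2,7,14,15,7,13,11,7,6,12};
__constant__ int lsh[16] = {11,24,9,16,15,9,27,15,6,2,29,8,15,5,31,9};

__global__ void md6_compress(const uint64_t *blocks, uint64_t *cv, int rounds)
{
    __shared__ uint64_t A[128];     // sliding window over the step state
    const int j = threadIdx.x;      // this thread's step within each round
    const uint64_t *B = blocks + (size_t)blockIdx.x * MD6_N;

    for (int k = j; k < MD6_N; k += 16) A[k] = B[k];  // load 89 input words
    __syncthreads();

    uint64_t S = 0x0123456789abcdefULL;               // round constant S0
    const uint64_t Smask = 0x7311c2812425cfa0ULL;

    for (int r = 0; r < rounds; ++r) {
        const int i = MD6_N + r * 16 + j;             // absolute step number
        uint64_t x = S ^ A[(i - MD6_N) & 127] ^ A[(i - T0) & 127];
        x ^= A[(i - T1) & 127] & A[(i - T2) & 127];   // quadratic terms
        x ^= A[(i - T3) & 127] & A[(i - T4) & 127];
        x ^= x >> rsh[j];
        A[i & 127] = x ^ (x << lsh[j]);  // slot overwritten is 128 steps old
        S = ((S << 1) | (S >> 63)) ^ (S & Smask);     // next round constant
        __syncthreads();
    }
    // The last 16 words computed form the chaining value.
    const int last = MD6_N + rounds * 16;
    cv[(size_t)blockIdx.x * MD6_C + j] = A[(last - MD6_C + j) & 127];
}
```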
CUDA MD6 Compression optimizations
- Coalesced memory reads and writes
- Use of constant and shared memory within the kernel
- Compression function loop unrolled, preserving the integrity of the compression rounds
- Sliding-window thread access
CUDA MD6 Execution (first tree level)
STEP 1: Read in the data
- Call md6_compress_block() with <<<total buffers, threads>>> = <<<8, 1>>>
STEP 2: Compress the data
- Call md6_compress() with <<<total buffers, threads>>> = <<<8, 16>>>
(One thread block per buffer in the grid.)
CUDA MD6 Execution
STEP 3: Write each hash into the appropriate parent node
- Call md6_rewrite() with <<<total buffers, threads>>> = <<<2, 1>>>
CUDA MD6 Execution (next tree level)
STEP 1: Read in the data
- Call md6_compress_block() with <<<total buffers, threads>>> = <<<2, 1>>>
STEP 2: Compress the data
- Call md6_compress() with <<<total buffers, threads>>> = <<<2, 16>>>
CUDA MD6 Execution (final step)
STEP 3: Write the hash into the appropriate node
- Call md6_rewrite() with <<<total buffers, threads>>> = <<<2, 1>>>
Write out the final hash; end of the CUDA kernel calls.
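Tying the walkthrough together, a hypothetical driver for the 8-buffer example above, reusing the cuda_md6_hash() sketch from the implementation slide (the function name and signature are assumptions from that sketch):

```cuda
#include <cstdint>
#include <cstddef>

extern void cuda_md6_hash(const uint64_t *data, size_t nbuffers,
                          uint64_t digest[16], int rounds);

int main()
{
    static uint64_t data[8 * 64] = {0};   // 8 chunks x 512 B of padded input
    uint64_t digest[16];
    cuda_md6_hash(data, 8, digest, 104);  // r = 104 for a 256-bit digest
    return 0;
}
```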
CUDA MD6 for File Matching
Absolute file matching
- Each message digest is unique to its input
- The user can supply a predetermined set of hashes
- Input hashes are compared against the GPU-generated hashes (see the sketch below)
File matching can be done in two modes
- Direct hashing (single files)
- Recursive hashing (archives of files)
Hashing larger files
- Larger files are broken down into data chunks
- Each chunk is hashed, and the results are finally aggregated
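A hedged sketch of the comparison step: GPU-computed digests are looked up in the user-supplied set of known hashes (the hashdb of step (vi)). The hex encoding, the container, and truncation to the leading 256 bits are illustrative assumptions; the slides do not specify the storage format.

```cuda
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_set>

// Encode the first dbits of a 16-word chaining value as lowercase hex.
// (The MD6 output-truncation convention is simplified here.)
static std::string to_hex(const uint64_t cv[16], int dbits)
{
    std::string s;
    char w[17];
    for (int i = 0; i < dbits / 64; ++i) {
        std::snprintf(w, sizeof w, "%016llx", (unsigned long long)cv[i]);
        s += w;
    }
    return s;
}

// True if the file's digest matches one of the predetermined hashes.
bool match_file(const uint64_t digest[16],
                const std::unordered_set<std::string> &hashdb)
{
    return hashdb.count(to_hex(digest, 256)) != 0;
}
```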
Experiments and Results
Benchmarking platform
GPU: NVIDIA GeForce 8800 GTX (112 cores), CUDA Toolkit 2.2
CPU: quad-core Intel Xeon E5335, running a sequential, iterative implementation of MD6
Experiment 1: Executing CUDA MD6 on single files
Experiment 2: Executing CUDA MD6 on an archive of files
Experiment 3: Executing CUDA MD6 with varying buffer sizes
Number of compression rounds vs. speedup
Wall-clock time vs. kernel execution time
Conclusion
- Speedups ranged from 2× to more than 250×
- Performance degraders: host-to-device data transfer, device initialization, idle threads
- Faster hashing also depends on hash integrity
- Speedup should scale with an increased number of GPU cores
Questions…
Thank you!!!