1
Name: Kaiyong Zhao
Supervisor: Dr. X.-W. Chu
2
Background & Related Work
  Multiple-Precision Integer
  GPU Computing & CUDA
Multiple-Precision Arithmetic for CUDA
  Multiple-Precision Arithmetic
Implementation on GPUs
  Data Structure
  Optimization of Data on CUDA
  Example
Experimental Result
3
Multiple-Precision Integer
  32-bit & 64-bit systems
  Multiple-Precision Integer
GPU Computing & CUDA
  GPGPU
  CUDA
4
Base-10 integers vs. big integers in the system: a big integer is stored in base b = 2^32 (one 32-bit word per digit).
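As a hedged sketch of such a representation (the limb count and struct name are illustrative, not taken from the thesis), a big integer in base b = 2^32 is simply an array of 32-bit machine words ("limbs"), least-significant limb first:

// Hypothetical sketch: a multiple-precision integer stored in base b = 2^32.
#define MP_LIMBS 8                    /* e.g. 8 x 32 bits = a 256-bit integer */

typedef struct {
    unsigned int limb[MP_LIMBS];      /* value = sum over i of limb[i] * 2^(32*i) */
} mp_int;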
5
Computing Capability
Memory Bandwidth
6
[Figure: NVIDIA G80 GPU architecture: host and input assembler, vertex/geometry/pixel thread issue, an array of Streaming Multiprocessors (SM) built from Streaming Processors (SP) with L1 caches and texture filter (TF) units, and L2 caches in front of the frame buffer (FB) partitions.]
7
CUDA: CPU + GPU C parallel computing model
Single Instruction, Multiple Threads (SIMT)
  All threads run the same function (1000s of threads in flight)
  Each core deals with different data
Hides I/O latency with many threads (more than 1000s of threads)
  Overlaps computation with I/O transfers
Coalesces I/O into one transaction
  When the threads of a half-warp access neighboring data
  1 cycle @ GPU vs. ~1000 cycles @ CPU
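As a minimal, hedged illustration of the SIMT model (not code from the thesis): every thread executes the same kernel, each on a different element, and neighboring threads touch neighboring addresses so a half-warp's loads can coalesce.

// Minimal SIMT sketch (illustrative only): all threads run the same function,
// each thread handling a different element of the data.
__global__ void scale(const float *in, float *out, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        out[i] = a * in[i];   // neighboring threads read neighboring words -> coalesced
}

// Host side: launch thousands of threads so memory latency is hidden, e.g.
//   scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);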
8
Background & Related Work.
Multiple-Precision Arithmetic for CUDA
  Multiple-Precision Arithmetic
Implementation on GPUs
  Data Structure
  Optimization of Data on CUDA
  Example
Experimental Result
9
1. Multiple-precision Comparison
2. Multiple-precision Addition
3. Multiple-precision Subtraction
4. Multiple-precision Modular Addition
5. Multiple-precision Modular Subtraction
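For example, multiple-precision addition (item 2) is a limb-by-limb add with carry propagation. The device function below is a minimal sketch assuming the hypothetical base-2^32 limb layout sketched earlier (the name mp_add and the constant MP_LIMBS are illustrative).

// Sketch of multiple-precision addition r = a + b over MP_LIMBS 32-bit limbs;
// returns the final carry out of the most significant limb.
__device__ unsigned int mp_add(unsigned int *r, const unsigned int *a,
                               const unsigned int *b)
{
    unsigned int carry = 0;
    for (int i = 0; i < MP_LIMBS; ++i) {
        unsigned long long s = (unsigned long long)a[i] + b[i] + carry;
        r[i]  = (unsigned int)s;          // low 32 bits of the limb sum
        carry = (unsigned int)(s >> 32);  // carry into the next limb (0 or 1)
    }
    return carry;
}

Modular addition (item 4) then subtracts the modulus once if the sum is not smaller than it.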
10
6. Multiple-precision Multiplication
7. Multiple-precision Division
8. Multiple-precision Montgomery Reduction
9. Multiple-precision Montgomery Multiplication
10. Barrett Modular Reduction Algorithm
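For reference (the standard textbook REDC, not copied from the thesis): with radix b, R = b^n, a modulus N coprime to b, N' = -N^{-1} mod R, and an input 0 <= T < RN, Montgomery reduction (item 8) computes T*R^{-1} mod N without a long division:

\[
\begin{aligned}
m &= \big((T \bmod R)\,N'\big) \bmod R,\\
t &= (T + mN)/R \quad (\text{exact division: the low } n \text{ limbs of } T + mN \text{ are zero}),\\
\operatorname{REDC}(T) &= \begin{cases} t - N & \text{if } t \ge N,\\ t & \text{otherwise,} \end{cases} \qquad \operatorname{REDC}(T) \equiv T\,R^{-1} \pmod{N}.
\end{aligned}
\]

Montgomery multiplication (item 9) combines this reduction with the multiplication, typically interleaving them limb by limb.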
11
11. Multiple-precision Multiplicative Inversion
12. Multiple-precision Montgomery Exponentiation
13. Montgomery Multi-Exponentiation
14. Multiple-precision Modular Addition
…
12
Background & Related Work.
Multiple-Precision Arithmetic for CUDA.
Implementation on GPUs
  Data Structure
  Optimization of Data on CUDA
  Example
Experimental Result
13
Data structure: two types
  Constant values: placed in cached constant memory
  Temporary values: placed in shared memory
Balance resources: balance the number of threads against the memory they use
Data encoding
Example
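As a hedged sketch of these two storage classes (names and sizes are illustrative, not the thesis's code, and MP_LIMBS refers to the earlier hypothetical layout): the prime modulus is a constant value, so it lives in cached __constant__ memory, while per-thread temporaries live in fast on-chip __shared__ memory.

// Illustrative sketch of the two storage classes described above.
__constant__ unsigned int c_prime[MP_LIMBS];      // constant value: cached constant memory

__global__ void mp_example_kernel(void)
{
    // temporary values: one MP_LIMBS-word scratch slot per thread of the block
    __shared__ unsigned int temp[128][MP_LIMBS];
    unsigned int tid = threadIdx.x;
    for (int k = 0; k < MP_LIMBS; ++k)
        temp[tid][k] = 0;                         // intermediate results go here
    // ... arithmetic would reduce temp[tid] modulo c_prime ...
}

Shared memory on G80/GT200 is only 16 KB per SM, so the block size has to be traded off against the per-thread temporary storage; this is the "balance resources" point above.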
14
C = vector A × matrix B mod prime
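As a simplified, hedged sketch of this running example, using single 32-bit words instead of multiple-precision integers (the thesis's version operates on limbs; all names here are illustrative), one thread computes one output element C[j]:

// Simplified sketch: C = A (1 x K vector) * B (K x N matrix) mod prime.
__global__ void vec_mat_mod(const unsigned int *A, const unsigned int *B,
                            unsigned int *C, int K, int N, unsigned int prime)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per output column
    if (j >= N) return;

    unsigned long long acc = 0;
    for (int i = 0; i < K; ++i) {
        // B stored row-major: threads j and j+1 read adjacent words -> coalesced
        unsigned long long prod = (unsigned long long)A[i] * B[i * N + j] % prime;
        acc = (acc + prod) % prime;                  // keep the accumulator below the prime
    }
    C[j] = (unsigned int)acc;
}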
15
There is no cache for global memory on G80/GT200
Constant memory & texture memory have only small caches
I/O latency is 400-600 clock cycles
This is the bottleneck, and the key to optimization!
19
Global memory accesses by the threads of a half-warp can be coalesced
  When the words accessed by all threads lie in the same segment of size:
    32 bytes if all threads access 8-bit words
    64 bytes if all threads access 16-bit words
    128 bytes if all threads access 32-bit or 64-bit words
  For any pattern of addresses requested by the half-warp
    Including patterns where multiple threads access the same address
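One hedged way to apply this rule to multiple-precision data (illustrative; not necessarily the thesis's exact layout) is to interleave the limbs of different integers so that, for a fixed limb index, consecutive threads read consecutive words:

// Hypothetical interleaved ("limb-major") layout for coalescing:
// limbs[k * num_ints + i] holds limb k of integer i. When the 16 threads of a
// half-warp process integers i .. i+15 and all load limb k, the 16 addresses are
// consecutive 32-bit words in one segment -> a single memory transaction.
__device__ unsigned int load_limb(const unsigned int *limbs,
                                  int num_ints, int i, int k)
{
    return limbs[k * num_ints + i];
}

With the naive layout (all limbs of one integer stored contiguously), the same 16 loads would be MP_LIMBS words apart and would spread across many segments, costing far more transactions.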
20
[Figure: coalescing example. Threads 0-15 of a half-warp access consecutive 4-byte words (addresses 0-124 in segment 0, 128-252 in segment 1), so each half-warp's requests fall within one 128-byte segment and the transaction can be reduced to 32 bytes. Segment size is 32 bytes for 8-bit data, 64 bytes for 16-bit data, 128 bytes for 32-, 64- and 128-bit data.]
21
C = vector A × matrix B mod prime
22
Background & Related Work.
Multiple-Precision Arithmetic for CUDA.
Implementation on GPUs.
Experimental Result
24
CPU: Intel® Core™ i7 860 @ 2.80 GHz (single thread)
GPU: XFX GTX 280, 1.24 GHz
25
C = vector A × matrix B mod prime
26
CPU: Intel® Core™ i7 860 @ 2.80 GHz (single thread)
GPU: XFX GTX 280, 1.24 GHz
27
1. Multiple-Precision Arithmetic
2. GPU Computing & Optimization
3. Example & Result
4. Summary