Parallel White Noise Generation on a GPU via Cryptographic Hash

1 Parallel White Noise Generation on a GPU via Cryptographic Hash
Stanley Tzeng Li-Yi Wei Microsoft Research Asia

2 What is White Noise?
Spatial domain: uniform random numbers
Frequency domain: white noise
[Figure panels: spatial domain, frequency domain]

3 Importance
Mother of all random numbers
Commonly used, e.g. rand() in C/C++
Major algorithms are sequential, e.g. x_n = (a x_{n-1} + b) mod c
Processors are becoming parallel
  GPU, multi-core CPU, Cell
  sequential algorithms cannot leverage that
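
To illustrate why the recurrence on this slide is sequential, here is a minimal Python sketch of a linear congruential generator. The constants a, b and c are illustrative assumptions (commonly published LCG parameters), not values from the presentation, and the helper names are hypothetical.

```python
def lcg_sequence(seed, count, a=1103515245, b=12345, c=2**31):
    """Return `count` values of x_n = (a * x_{n-1} + b) mod c."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % c  # each step needs the previous x
        values.append(x)
    return values


def lcg_nth(seed, n, **params):
    """Computed naively, the n-th sample still requires walking all n steps."""
    return lcg_sequence(seed, n, **params)[-1]


print(lcg_sequence(1, 3))
```

Because every value depends on the one before it, the loop cannot simply be split across parallel processors in this form.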

4 Contribution
Parallel algorithm for white noise
  independent evaluation for every sample
  easy implementation as a GPU pixel shader
  speed: faster than sequential algorithms
  quality: same or better
  usage: similar to texture mapping

5 PRNG (Pseudo Random Number Generator)
The main source of randomness in programs
Desirable properties
  white noise statistics
  repeatable
  fast computation
  low memory usage

6 Core Idea
Input trivially prepared in parallel, e.g. linear ramp
Feed input value into hash, independently and in parallel
Output white noise
Key idea: borrow cryptographic hash!
[Diagram: input → hash → output]
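
The idea on this slide can be sketched on the CPU as follows. This is only an illustration under stated assumptions: it uses Python's hashlib MD5 instead of a pixel shader, the function name hash_sample is hypothetical, and taking the first 32 bits of the digest as one sample is an arbitrary choice.

```python
import hashlib
import struct


def hash_sample(index: int) -> float:
    """Map one integer input to a float in [0, 1) using MD5."""
    digest = hashlib.md5(struct.pack("<Q", index)).digest()
    (word,) = struct.unpack("<I", digest[:4])  # first 32 of the 128 output bits
    return word / 2**32


# Linear ramp as input; every sample is computed without looking at any other.
ramp = range(8)
noise = [hash_sample(i) for i in ramp]
print(noise)
```

On a GPU, each call would correspond to one pixel evaluated independently; the list comprehension merely stands in for that parallel dispatch.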

7 Hash (however nice) input → (unrecognizable) mess

8 Cryptographic Hash
A subclass of hash
Commonly used for security applications, e.g. password, digital signature
Properties
  irreversible: cannot find input from hash output
  decorrelating: similar inputs, dissimilar outputs
  uniform probability: all outputs equally likely to occur

9 Cryptographic Hash - Example
irreversible, decorrelating, uniform probability
CHash("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
CHash("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e
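
The example on this slide can be reproduced with a short Python sketch, assuming CHash here is MD5 (the first digest shown matches MD5 of that sentence). The helper names md5_hex and differing_bits are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


dog = md5_hex("The quick brown fox jumps over the lazy dog")
eog = md5_hex("The quick brown fox jumps over the lazy eog")
print(dog)
print(eog)
print(differing_bits(dog, eog), "of 128 bits differ")
```

Changing a single input character changes the digest throughout, which is the decorrelating property the slide refers to.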

10 Cryptographic Hash as a PRNG
White noise statistics: CHash is cryptographically secure
Repeatable: CHash is invariant with same input
Fast computation: CHash is parallel + constant cost
Low memory usage: CHash maintains no state
Order-independent, i.e. randomly accessible: important for parallel GPU applications
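
A small sketch of the repeatable and order-independent properties, again using hashlib MD5 on the CPU as a stand-in. The function noise_at, its key parameter, and the way coordinates are packed into the hash input are assumptions made for illustration, not the presenters' scheme.

```python
import hashlib
import struct


def noise_at(x: int, y: int, key: int = 0) -> float:
    """Stateless sample in [0, 1) for pixel (x, y); `key` selects a noise field."""
    data = struct.pack("<III", x, y, key)
    word = int.from_bytes(hashlib.md5(data).digest()[:4], "little")
    return word / 2**32


coords = [(x, y) for y in range(4) for x in range(4)]

# Evaluate the same pixels in two different orders.
forward = {c: noise_at(*c) for c in coords}
backward = {c: noise_at(*c) for c in reversed(coords)}
print(forward == backward)
```

Since no state is carried between calls, any pixel can be evaluated at any time and in any order with the same result.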

11 Which Cryptographic Hash?
Many options: MD5, SHA, RIPEMD, Tiger, block cipher, etc.
Desirable properties
  white noise quality
  fast computation
  power-of-2 aligned (output & operations)
  pure pixel shader, no state maintenance

12 Our Hash of Choice: MD5 [Rivest 1992]
128-bit output and 32-bit operations
Small number of constants: fits entirely in shader
Fastest among those satisfying quality criteria
Not 100% secure [Wang and Yu 2005], but good enough for our goal

13 MD5 Algorithm Overview
[Diagram: Input → Scrambling (bit op, table, arithmetic), 64 rounds → Output; tables used: shift table, sin table]

14 Performance Bottlenecks for Pixel Shader
[Diagram: Input → Scrambling (bit op, table, arithmetic), 64 rounds → Output; tables used: shift table, sin table]

15 Our Optimization
[Diagram: Input → Scrambling (bit op, table, arithmetic) → Output]
shift table → reduced shift table
sin table → sin function
64 rounds → loop unrolling
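
One plausible reading of the table optimizations, sketched in Python rather than shader code: the 64 MD5 shift amounts contain only 16 distinct entries in a repeating pattern, and the 64 additive constants are defined as floor(2^32 * |sin(i + 1)|), so both can be derived instead of stored in full. The function names are hypothetical, this single-block version only handles messages up to 55 bytes, and the loop is left rolled for readability.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# 16 entries instead of 64: each group of 16 rounds cycles through 4 shifts.
REDUCED_SHIFTS = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21]


def shift_amount(i: int) -> int:
    return REDUCED_SHIFTS[(i // 16) * 4 + (i % 4)]


def sin_constant(i: int) -> int:
    # Computed on the fly instead of read from a 64-entry table.
    return int(abs(math.sin(i + 1)) * 2**32) & MASK


def rotl32(x: int, n: int) -> int:
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_single_block(message: bytes) -> bytes:
    if len(message) > 55:
        raise ValueError("sketch handles single-block messages only")
    # Standard MD5 padding: 0x80, zeros, then the bit length as 64-bit LE.
    block = message + b"\x80" + b"\x00" * (55 - len(message))
    block += struct.pack("<Q", 8 * len(message))
    m = struct.unpack("<16I", block)
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    a, b, c, d = a0, b0, c0, d0
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + sin_constant(i) + m[g]) & MASK
        a, d, c = d, c, b
        b = (b + rotl32(f, shift_amount(i))) & MASK
    words = [(a0 + a) & MASK, (b0 + b) & MASK, (c0 + c) & MASK, (d0 + d) & MASK]
    return struct.pack("<4I", *words)


text = b"The quick brown fox jumps over the lazy dog"
print(md5_single_block(text).hex() == hashlib.md5(text).hexdigest())
```

With double-precision sin the derived constants reproduce standard MD5 here. Whether a shader's sin function gives bit-identical constants is not stated in the slides, and exact MD5 compatibility is not required for noise generation.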

16 Previous PRNG
GPU
  BBS [Blum et al. 1986, Olano 2005]: O extremely fast, X not good quality
  CEICG [Entacher et al. 1998, Sussman et al. 2006]: O decent quality, X processing time varies
  AES [NIST 2001, Yamanouchi ]: O invertible (not hash)
CPU
  rand: O commonly used, X not good quality
  drand48: O better quality, X slower
  Mersenne Twister [Matsumoto and Nishimura 1998]: O high quality and fast, X not random accessible

17 Assessing Quality: DIEHARD [Marsaglia 1995]
De facto standard for measuring PRNG quality
Runs 15 different tests on the bits generated
Outputs a p-value; if p == 0 || p == 1, fail
[Sample DIEHARD output: BIRTHDAY SPACINGS TEST, M=512, N=2**24, results for aes.bin, sample size 500, chi-square with 6 d.o.f.]

18 Cumulative Distribution Function
Shows how data is distributed within a set
Given x in the data, what % of data values are ≤ x
[Figure: CDF plots for a normal distribution and a uniform distribution; x axis from 0 to 1, y axis from 0% to 100%]

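19 Kolmogorov-Smirnov Test
Determines how alike two sets of data are
Looks at the max difference D between their distribution functions
[Figure: two CDF comparisons, one labeled "not alike" and one labeled "alike", each with D marked; x axis from 0 to 1, y axis from 0% to 100%]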

20 Assessing Quality: DIEHARD
Run the results of the DIEHARD tests (p-values) through a KS test
Look at the D-value: smaller D is better quality!
[Figure: cumulative distribution function of the p-value curve against the uniform distribution curve, with D-value D marked]
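
The D-value described on this slide can be sketched as the largest gap between the empirical CDF of a set of p-values and the uniform CDF on [0, 1]. The function name ks_d_uniform is hypothetical and the p-values below are made up for illustration; they are not DIEHARD results.

```python
def ks_d_uniform(p_values):
    """Max distance between the empirical CDF of p_values and the uniform CDF."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1)/n to i/n at x; the uniform CDF is x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


evenly_spread = [0.125, 0.375, 0.625, 0.875]  # made-up p-values
clustered = [0.90, 0.95, 0.99]                # made-up p-values
print(ks_d_uniform(evenly_spread), ks_d_uniform(clustered))
```

P-values spread evenly over [0, 1] give a small D, while p-values bunched near one end give a large D.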

21 Assessing Quality: Power Spectrum
Radial mean: should be uniform
Radial variance (anisotropy): should be low & uniform
[Figure panels: power spectrum density, radial mean, radial variance (anisotropy)]

22 Assessing Speed: Batch Rendering
Clock time to generate random bits
n^2 x 128 bits image, n = 512, 1024, 2048 and 4096

23 Assessing Speed: Texture Subset (For random accessibility)
A huge virtual texture (2^20 x 2^20)
Clock time for accessing subsets A and B
Measure the difference (smaller is better)

24 Test Results: DIEHARD Results
[Charts: DIEHARD tests passed (the higher the better); KS D-value (the lower the better)]

25 Test Results: Power Spectrum Tests
[Figure panels: MD5, M. Twister, GPU BBS]

26 Test Results: Batch Render Speed

27 Test Results: Texture Subset Speed

28 Trading Quality for Speed
Reducing # of rounds: O faster speed, X lower quality
Rounds  Time (ms)  DIEHARD tests passed  KS D-Val
64      6.3        15/15                 0.2029
48      4.7        14/15                 0.2042
32      3.1        13/15                 0.2295
16      1.6                              0.253
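
As a quick check on the timings on this slide, the following sketch divides each reported time by its round count. The numbers are copied from the slide; the variable names are arbitrary.

```python
# (rounds, time in ms) pairs copied from this slide
timings = [(64, 6.3), (48, 4.7), (32, 3.1), (16, 1.6)]

ms_per_round = [time_ms / rounds for rounds, time_ms in timings]
for (rounds, _), cost in zip(timings, ms_per_round):
    print(f"{rounds} rounds: {cost:.3f} ms per round")
```

The per-round cost stays close to 0.1 ms in every row, which suggests that running time scales roughly linearly with the number of rounds.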

29 Applications
Fractal terrain (vertex shader)
Texture tiling (fragment shader)

30 Future Work
Implement our method in hardware
  very similar to texture unit but much smaller (no need for cache)
Alternative hashes
  ride with advances in cryptographic hash

31 Thank You!

