Fast vector arithmetic over GF(3)
Kris Coolsaet
Department of Applied Mathematics and Computer Science, Ghent University, Belgium
Fq10 – 12/07/2011
Basic idea
Use the bit-parallelism of the CPU to do up to 64 computations at the same time
Represent each element of F_3 with 2 bits
Binary operations simulate operations on F_3
Notation (binary):
  b_1 + b_2   bitwise addition = exclusive or
  b_1 b_2     bitwise multiplication = and
  b_1 | b_2   bitwise or
Benchmarks
Execution speed compared to standard methods that use arithmetic modulo 3
Programmed in C
On a 64-bit processor (on a 32-bit processor you lose a factor of 2 when vector lengths are > 32)
Benchmark 1 Compute the rank of a square matrix
Benchmark 2 Compute the Hamming distance between two code words
Benchmark 3 Compute the dot product of two vectors
Benchmark 4 Generate all elements in a subspace spanned by 8 given vectors
Bit representation
Element v of F_3 represented as (v_1, v_2) in F_2 x F_2
  0 as (1,1)
  1 as (0,1)
  2 as (1,0)
Vectors of F_3^n represented as pairs of words (v_1, v_2)
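A minimal C sketch of this representation, for vectors of length at most 64. The names f3vec, f3_zero, f3_set and f3_get are hypothetical, and padding unused positions with the zero element (1,1) is an assumption, not something the slides specify.

```c
#include <assert.h>
#include <stdint.h>

/* A vector over F_3 of length <= 64, stored as two bit planes.
   Position i encodes 0 as (1,1), 1 as (0,1), 2 as (1,0). */
typedef struct { uint64_t v1, v2; } f3vec;

/* All-zero vector: every position is (1,1). */
static inline f3vec f3_zero(void) {
    f3vec r = { ~UINT64_C(0), ~UINT64_C(0) };
    return r;
}

/* Store element x (0, 1 or 2) at position i. */
static inline void f3_set(f3vec *v, unsigned i, unsigned x) {
    assert(i < 64 && x < 3);
    uint64_t bit = UINT64_C(1) << i;
    if (x == 1) v->v1 &= ~bit; else v->v1 |= bit;  /* first bit is 0 only for 1 */
    if (x == 2) v->v2 &= ~bit; else v->v2 |= bit;  /* second bit is 0 only for 2 */
}

/* Read the element at position i back as 0, 1 or 2. */
static inline unsigned f3_get(f3vec v, unsigned i) {
    assert(i < 64);
    unsigned b1 = (unsigned)((v.v1 >> i) & 1);
    unsigned b2 = (unsigned)((v.v2 >> i) & 1);
    assert(b1 | b2);                 /* (0,0) is not a valid encoding */
    return (b1 && b2) ? 0u : (b2 ? 1u : 2u);
}
```

Element access like this is only for setup and debugging; the speed comes from operating on whole words.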
Some basic operations
Negation: if r = -v then
  r_1 = v_2
  r_2 = v_1
Multiplication (elementwise): if r = v w then
  r_1 = v_1 w_2 | v_2 w_1
  r_2 = v_1 w_1 | v_2 w_2
  needs 6 binary operations
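The two formulas translate directly to C, with juxtaposition becoming & and | staying |. This is a sketch; the type f3vec and the function names are hypothetical.

```c
#include <stdint.h>

/* Two bit planes; 0 = (1,1), 1 = (0,1), 2 = (1,0) at each position. */
typedef struct { uint64_t v1, v2; } f3vec;

/* r = -v: swapping the planes exchanges 1 and 2 and leaves 0 fixed. */
static inline f3vec f3_neg(f3vec v) {
    f3vec r = { v.v2, v.v1 };
    return r;
}

/* r = v * w elementwise: 4 ANDs and 2 ORs. */
static inline f3vec f3_mul(f3vec v, f3vec w) {
    f3vec r;
    r.v1 = (v.v1 & w.v2) | (v.v2 & w.v1);
    r.v2 = (v.v1 & w.v1) | (v.v2 & w.v2);
    return r;
}
```

For example, with v = (0,1,2) in positions 0..2 (v1 = 5, v2 = 3) and w = (2,2,2) (w1 = 7, w2 = 0), both f3_neg(v) and f3_mul(v, w) give the planes (3, 5), that is, the vector (0,2,1).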
Addition and subtraction
Addition
  r_1 = (v_2 + w_2) | (v_1 + v_2 + w_1) = (v_2 + w_2) | (v_1 + w_1 + w_2)
  r_2 = (v_1 + w_1) | (v_1 + v_2 + w_2) = (v_1 + w_1) | (v_2 + w_1 + w_2)
  needs 6 binary operations
Subtraction
  r_1 = (v_2 + w_1) | (v_1 + v_2 + w_2) = (v_2 + w_1) | (v_1 + w_1 + w_2)
  r_2 = (v_1 + w_2) | (v_1 + v_2 + w_1) = (v_1 + w_2) | (v_2 + w_1 + w_2)
Combined
  needs only 10 operations
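A C sketch of these formulas, with + written as ^. The names are hypothetical. The combined routine is one possible way to reach 10 operations, under the assumption that "combined" means computing v + w and v - w together; the slides do not spell out the sharing.

```c
#include <stdint.h>

/* Two bit planes; 0 = (1,1), 1 = (0,1), 2 = (1,0) at each position. */
typedef struct { uint64_t v1, v2; } f3vec;

/* r = v + w: 4 XORs and 2 ORs. */
static inline f3vec f3_add(f3vec v, f3vec w) {
    uint64_t a = v.v1 ^ w.v1;
    uint64_t b = v.v2 ^ w.v2;
    f3vec r = { b | (a ^ v.v2), a | (b ^ v.v1) };
    return r;
}

/* r = v - w: the same formulas with w1 and w2 exchanged. */
static inline f3vec f3_sub(f3vec v, f3vec w) {
    uint64_t e = v.v2 ^ w.v1;
    uint64_t f = v.v1 ^ w.v2;
    f3vec r = { e | (f ^ v.v2), f | (e ^ v.v1) };
    return r;
}

/* Sum and difference together: 6 XORs and 4 ORs. */
static inline void f3_addsub(f3vec v, f3vec w, f3vec *sum, f3vec *diff) {
    uint64_t a = v.v1 ^ w.v1, b = v.v2 ^ w.v2;
    uint64_t c = a ^ v.v2;               /* v1 ^ v2 ^ w1, shared */
    uint64_t d = b ^ v.v1;               /* v1 ^ v2 ^ w2, shared */
    uint64_t e = v.v2 ^ w.v1, f = v.v1 ^ w.v2;
    sum->v1  = b | c;  sum->v2  = a | d;
    diff->v1 = e | d;  diff->v2 = f | c;
}
```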
Dot product
weightmod3(v_i): number of 1-bits in v_i modulo 3
  Copy
  Shift right
  Mask
  Subtract binary
  Remainder after division by 3
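The listed steps can be sketched in C as follows, assuming 64-bit words. The subtraction leaves the bit count of each bit pair in a 2-bit field, and because 4 is congruent to 1 modulo 3, the whole word taken as an integer is congruent to the sum of those fields, so one remainder finishes the job.

```c
#include <stdint.h>

/* Number of 1-bits in x, modulo 3, using a single division. */
static inline unsigned weightmod3(uint64_t x) {
    /* Copy, shift right, mask, subtract: each 2-bit field now holds
       the number of 1-bits (0, 1 or 2) of its bit pair. */
    x -= (x >> 1) & UINT64_C(0x5555555555555555);
    /* 4 = 1 (mod 3), so x mod 3 equals the sum of the fields mod 3. */
    return (unsigned)(x % 3);
}
```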
Dot product
Multiply v, w elementwise (6 operations)
Sum of the elements of the result r = weightmod3(r_2) - weightmod3(r_1)
3 divisions: (b mod 3 - a mod 3 + 3) mod 3
Better: (2a + b) mod 3
(a, b: the words obtained from r_1, r_2 before the final division by 3)
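A sketch of the whole dot product in C with a single division. The names are hypothetical. The folding step is an added safeguard that is not on the slides: since 2^32 is congruent to 1 modulo 3, adding the two 32-bit halves preserves the residue and keeps 2a + b from overflowing 64 bits.

```c
#include <stdint.h>

/* Two bit planes; 0 = (1,1), 1 = (0,1), 2 = (1,0) at each position. */
typedef struct { uint64_t v1, v2; } f3vec;

/* Dot product of v and w over F_3, returned as 0, 1 or 2. */
static inline unsigned f3_dot(f3vec v, f3vec w) {
    const uint64_t m = UINT64_C(0x5555555555555555);
    /* Elementwise product: 6 operations. */
    uint64_t r1 = (v.v1 & w.v2) | (v.v2 & w.v1);
    uint64_t r2 = (v.v1 & w.v1) | (v.v2 & w.v2);
    /* Pair counts: a = weight(r1) and b = weight(r2), both mod 3. */
    uint64_t a = r1 - ((r1 >> 1) & m);
    uint64_t b = r2 - ((r2 >> 1) & m);
    /* Assumption of this sketch: fold to at most 33 bits so that
       2a + b cannot overflow. */
    a = (a & UINT64_C(0xFFFFFFFF)) + (a >> 32);
    b = (b & UINT64_C(0xFFFFFFFF)) + (b >> 32);
    /* b - a = b + 2a (mod 3): one division instead of three. */
    return (unsigned)((2 * a + b) % 3);
}
```

Zero positions contribute a 1-bit to both r_1 and r_2, so they cancel in the difference; the same holds for unused positions as long as both vectors pad them identically.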
Weight / Hamming distance
(ternary) weight of v = (binary) weight of v_1 + v_2
  Copy, shift right, mask, subtract binary
  Shift right, mask, add binary
  Continue: shift by 4 and add, shift by 8...
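A C sketch of the weight computation. popcount64 follows the shift, mask and add steps above; f3_weight uses the fact that a nonzero element, (0,1) or (1,0), is exactly a position where the two planes differ. The direct Hamming distance formula is an assumption of this sketch (the slides may instead take the weight of v - w); it requires valid encodings and identical padding in both vectors.

```c
#include <stdint.h>

/* Two bit planes; 0 = (1,1), 1 = (0,1), 2 = (1,0) at each position. */
typedef struct { uint64_t v1, v2; } f3vec;

/* Binary weight of x by repeated shift, mask and add. */
static inline unsigned popcount64(uint64_t x) {
    const uint64_t m2 = UINT64_C(0x3333333333333333);
    x -= (x >> 1) & UINT64_C(0x5555555555555555);       /* 2-bit sums */
    x = (x & m2) + ((x >> 2) & m2);                     /* 4-bit sums */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);  /* 8-bit sums */
    x += x >> 8;
    x += x >> 16;
    x += x >> 32;                                       /* total in low byte */
    return (unsigned)(x & 0x7F);
}

/* Ternary weight: number of nonzero elements of v. */
static inline unsigned f3_weight(f3vec v) {
    return popcount64(v.v1 ^ v.v2);
}

/* Hamming distance: number of positions where v and w differ. */
static inline unsigned f3_distance(f3vec v, f3vec w) {
    return popcount64((v.v1 ^ w.v1) | (v.v2 ^ w.v2));
}
```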
Iterate over 3^n vectors
subtract
Each step takes 3 operations
64 =
Assume length ≤ 32
Represent v as two words (v_1, v_2), (v_2, v_1)
Addition
  (v_1, v_2) + (w_1, w_2) → (t_1, t_2)
  (t_1, t_2) + (v_2, v_1) → (u_1, u_2)
  (t_2, t_1) | (u_1, u_2) → (r_1, r_2)
5 operations instead of 6
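A C sketch of the packed addition, assuming v_1 sits in the high 32 bits and v_2 in the low 32 bits of a 64-bit word (the slides do not fix which half is which). The names are hypothetical. The routine itself uses two XORs, one half-swap and one OR; the fifth operation is presumably the half-swap that produces (r_2, r_1) for the next addition, which is a reading of the slide rather than something it states.

```c
#include <stdint.h>

/* Exchange the two 32-bit halves of a word; compilers typically emit
   a single rotate instruction for this pattern. */
static inline uint64_t swap32(uint64_t x) {
    return (x << 32) | (x >> 32);
}

/* Pack planes p (high half) and q (low half) into one word (p, q). */
static inline uint64_t f3_pack(uint32_t p, uint32_t q) {
    return ((uint64_t)p << 32) | q;
}

/* Packed addition. v = (v1,v2), vs = (v2,v1), w = (w1,w2).
   Returns (r1,r2) for r = v + w. */
static inline uint64_t f3_add_packed(uint64_t v, uint64_t vs, uint64_t w) {
    uint64_t t = v ^ w;       /* (t1,t2) = (v1^w1, v2^w2) */
    uint64_t u = t ^ vs;      /* (u1,u2) = (t1^v2, t2^v1) */
    return swap32(t) | u;     /* (r1,r2) = (t2|u1, t1|u2) */
}
```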
Other structures
Similar tricks should work for vectors over other small structures
  Field of 4 elements (trivial?)
  Ring Z/4Z
  Field of 5, 7 elements (fast enough?)
Relevant combinations of binary operations can be found by (exhaustive) computer search
Thank you