CS717 Algorithm-Based Fault Tolerance Matrix Multiplication Greg Bronevetsky.

CS717 Problem at Hand: Have matrices A and B; want to compute their product AB. Ask a matrix-matrix-multiply (MMM) implementation to compute the product. Answer: C. Question: is C the correct answer? How could we know for sure?

CS717 Algorithm-Based Fault Tolerance: Encode the input matrices via an error-correcting code. Run the regular MMM algorithm on the encoded matrices: the encoding is invariant under MMM, so the algorithm naturally outputs encoded matrices. Encoding guarantees: if there are up to t errors in the output, the error will be detected; if there are up to c < t errors in the output, the correct output matrix can be decoded.

CS717 Outline: Linear Error Correcting Codes; Algorithm-Based Fault Tolerance; ABFT = Linear Encoding of Matrices.

CS717 Error Correcting Codes: A map $f: \Sigma^k \to \Sigma^n$ from k-long data words to n-long codewords. We use $\Sigma = \{0, 1\}$. A code of length n is a “sparse” subset of $\Sigma^n$: very few possible words are valid codewords. Rate of code: $k/n$, the amount of information communicated by each codeword.

CS717 Minimum Distance: $d_{\min} = \min_{x \ne y \in C} d(x, y)$, where $d(\cdot,\cdot)$ is the Hamming distance. Hamming distance: the number of spots where two words differ. The minimum distance measures the difficulty of decoding/correcting corrupted codewords.
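
The two definitions on this slide can be checked by brute force on small codes. The sketch below is illustrative only: the two example codes (a 3-fold repetition code and a 2-bit-plus-parity code) are assumptions chosen for the demonstration, not codes taken from the slides.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions where two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# Hypothetical example codes (not from the slides).
repetition_code = ["000", "111"]             # 1 data bit repeated 3 times
parity_code = ["000", "011", "101", "110"]   # 2 data bits + even-parity bit

print(minimum_distance(repetition_code))
print(minimum_distance(parity_code))
```

The pairwise search is quadratic in the number of codewords, so it is only practical for toy codes.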

CS717 Detection and Correction: A code may detect errors in $< d_{\min}$ spots: no such error can morph one codeword into another. It may correct errors in $\le (d_{\min}-1)/2$ spots: the “closest” codeword can still be found. More details later… Each codeword defines a circle around itself of radius $d_{\min}/2$.
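
A minimal sketch of "find the closest codeword", assuming the same two toy codes as hypothetical examples. With $d_{\min} = 3$ a single flipped bit is corrected; with $d_{\min} = 2$ a single flipped bit leaves several codewords equally close, so the error is detected but cannot be corrected.

```python
def hamming_distance(x, y):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))

def decode_nearest(word, code):
    """Return the unique closest codeword, or None if the closest is not unique."""
    ranked = sorted(code, key=lambda c: hamming_distance(word, c))
    best = ranked[0]
    if len(ranked) > 1 and hamming_distance(word, ranked[1]) == hamming_distance(word, best):
        return None  # tie: the error is detectable but not correctable
    return best

# Hypothetical example codes (not from the slides).
repetition_code = ["000", "111"]             # d_min = 3
parity_code = ["000", "011", "101", "110"]   # d_min = 2

print(decode_nearest("010", repetition_code))
print(decode_nearest("001", parity_code))
```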

CS717 Linear Codes: Codewords form a linear subspace inside $\Sigma^n$: the row space of a generator matrix G. Example: an (n=7, k=3) code.

CS717 Property 1: A linear combination of any codewords is also a codeword. For any $x, y \in C$, $(x+y) \in C$. A codeword times a constant is a codeword: for any $z \in C$ and any scalar $a$, $a \cdot z \in C$. Proof: basic properties of linear spaces.

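CS717 Property 2: Minimum distance of a linear code = minimum weight of a nonzero codeword: $d_{\min} = \min_{c \in C,\, c \ne 0} \mathrm{weight}(c)$, where $\mathrm{weight}(c)$ is the number of nonzero entries of c. Proof: $d(x, y) = \mathrm{weight}(x - y)$, and by Property 1 $x - y$ is itself a codeword, nonzero whenever $x \ne y$. Conversely, every nonzero codeword c satisfies $\mathrm{weight}(c) = d(c, 0)$.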

CS717 Parity Check Matrix: H is the dual matrix to G. It contains a basis of the space orthogonal to G’s row space, an (n-k)-dimensional space, so H is (n-k)×n. The code is then defined as $C = \{x \in \Sigma^n : Hx = 0\}$. Note: H also defines a linear code.
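
As a rough illustration of the relationship between G and H, the sketch below assumes a binary code whose generator is in systematic form $G = [I_k \mid P]$, for which $H = [P^T \mid I_{n-k}]$ is a valid parity check matrix over GF(2). The (7,4) Hamming code used here is a hypothetical stand-in, not the (7,3) example from the slides, and all function names are illustrative.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def mat_mul_gf2(A, B):
    """Matrix product with arithmetic modulo 2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def parity_check_from_systematic(G, k):
    """For G = [I_k | P], return H = [P^T | I_(n-k)] over GF(2)."""
    n = len(G[0])
    r = n - k
    Pt = transpose([row[k:] for row in G])
    return [Pt[i] + [1 if i == j else 0 for j in range(r)] for i in range(r)]

# Hypothetical (7,4) Hamming code in systematic form.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = parity_check_from_systematic(G, 4)

# Every row of G is orthogonal to every row of H, so H * G^T is all zeros.
print(mat_mul_gf2(H, transpose(G)))
```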

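CS717 Property 3: $d_{\min}$ = minimum number of columns of H that can sum to 0. Proof: for a word x, $Hx$ is the sum of the columns of H selected by the 1’s in x, so x is a codeword exactly when those columns sum to 0. The smallest such set of columns therefore has the size of the lowest-weight nonzero codeword, which is $d_{\min}$ by Property 2.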
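
Property 3 can be sanity-checked by exhaustive search on a small code. The sketch below assumes a hypothetical parity check matrix (that of a (7,4) Hamming code, not a matrix from the slides) and compares the smallest set of columns summing to 0 against the smallest nonzero codeword weight.

```python
from itertools import combinations, product

# Hypothetical parity check matrix of a (7,4) Hamming code.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def min_columns_summing_to_zero(H):
    """Size of the smallest nonempty set of columns of H that sums to 0 mod 2."""
    cols = list(zip(*H))
    for size in range(1, len(cols) + 1):
        for subset in combinations(cols, size):
            if all(sum(bits) % 2 == 0 for bits in zip(*subset)):
                return size
    return None

def min_codeword_weight(H):
    """Smallest weight of a nonzero word x with Hx = 0 (mod 2)."""
    n = len(H[0])
    weights = [sum(x) for x in product([0, 1], repeat=n)
               if any(x) and all(sum(h * b for h, b in zip(row, x)) % 2 == 0
                                 for row in H)]
    return min(weights)

print(min_columns_summing_to_zero(H), min_codeword_weight(H))
```

Both searches are exponential in n and are meant only as a check on toy examples.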

CS717 Property 4: Minimum distance of a linear code $\le n-k+1$. Proof: there are n dimensions in total (since codewords are n-vectors). G’s row space has rank k. Thus H’s column space has rank n-k. Thus any n-k+1 columns of H are linearly dependent, so some subset of them adds up to 0. By Property 3, the size of that subset is $\ge d_{\min}$.

CS717 Encoding a Matrix: Algorithm-Based Fault Tolerance was introduced by Huang and Abraham in 1984. Encode each row of the matrix via an extra column: entry i of the extra column is the sum of the entries in row i.

CS717 Encoding a Matrix: Encode each column of the matrix via an extra row: entry j of the extra row is the sum of the entries in column j. Full encoding: apply both, adding a checksum column and a checksum row.
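
The three encodings described on these two slides can be sketched as follows. The function names and the 2×2 example matrix are illustrative assumptions; matrices are plain lists of rows with integer entries.

```python
def row_encode(A):
    """Append a checksum column: each row gains the sum of its entries."""
    return [list(row) + [sum(row)] for row in A]

def column_encode(A):
    """Append a checksum row: each column gains the sum of its entries."""
    return [list(row) for row in A] + [[sum(col) for col in zip(*A)]]

def full_encode(A):
    """Apply both encodings; the corner entry is the sum of all entries."""
    return column_encode(row_encode(A))

A = [[1, 2],
     [3, 4]]
for row in full_encode(A):
    print(row)
```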

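CS717 Detecting Errors: Suppose matrix A is corrupted to matrix $\hat{A}$: entry $\hat{a}_{i,j}$ is wrong. Can detect the error’s exact position: in the fully-encoded matrix, the checksum of row i and the checksum of column j no longer match.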

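CS717 Correcting Errors: Can correct the error using the row or column checksum: the correct entry is the checksum minus the sum of the other entries in that row (or column).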
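
A minimal sketch of locating and repairing a single bad entry in a fully-encoded matrix. It assumes exact integer arithmetic (floating-point data would need a tolerance instead of `!=`) and handles only the case of one corrupted data entry; the function names are hypothetical.

```python
def locate_error(F):
    """F is fully encoded: last column holds row sums, last row holds column sums.

    Returns (i, j) of a single corrupted data entry, or None if all checks pass.
    """
    n_rows, n_cols = len(F) - 1, len(F[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(F[i][:n_cols]) != F[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(F[i][j] for i in range(n_rows)) != F[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single corrupted data entry")

def correct_error(F):
    """Return a copy of F with a single corrupted data entry repaired."""
    fixed = [list(row) for row in F]
    position = locate_error(F)
    if position is not None:
        i, j = position
        n_cols = len(F[0]) - 1
        others = sum(F[i][:n_cols]) - F[i][j]   # row sum without the bad entry
        fixed[i][j] = F[i][n_cols] - others     # recover it from the row checksum
    return fixed

corrupted = [[1, 2, 3],
             [9, 4, 7],     # entry (1, 0) should be 3
             [4, 6, 10]]
print(locate_error(corrupted))
print(correct_error(corrupted))
```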

CS717 Big Trick: Preservation of Encoding: Column-encoded matrix × row-encoded matrix = fully-encoded matrix. Can check the MMM computation by checking the encoding of the output. If the product matrix has an erroneous entry, we can detect it and correct it.
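
The preservation property can be checked numerically on a small case. The sketch below uses hypothetical helper names and two arbitrary 2×2 integer matrices, and compares the product of the encoded inputs with the full encoding of the plain product.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def row_encode(A):
    """Append a checksum column holding each row's sum."""
    return [list(row) + [sum(row)] for row in A]

def column_encode(A):
    """Append a checksum row holding each column's sum."""
    return [list(row) for row in A] + [[sum(col) for col in zip(*A)]]

def full_encode(A):
    """Append both the checksum column and the checksum row."""
    return column_encode(row_encode(A))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

product_of_encoded = mat_mul(column_encode(A), row_encode(B))
encoded_product = full_encode(mat_mul(A, B))
print(product_of_encoded == encoded_product)
```

The extra row of the column-encoded A produces the column sums of AB, and the extra column of the row-encoded B produces its row sums, which is why the two results agree.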

CS717 Applications: Matrix multiplication: given encoded A and B, check whether the MMM result C (is C = AB?) has a valid encoding. Matrix factorization: given a factorization A = WZ, verify correctness by verifying the encodings of the factors. The factors are row- OR column-encoded, so we can only detect, not correct, errors.

CS717 Weighted ABFT: Oftentimes we need to check row- or column-encoded matrices (e.g. factorization, data integrity checks). We can only detect errors in such matrices. Can we also correct? Yes, by generalizing to weighted checking rows/columns.

CS717 Weighting: Suppose we have d n-vectors $w_1, \dots, w_d$. Can column-encode matrix A by appending d checksum rows, where checksum row r holds $w_r^T A$. Let’s try out:

CS717 Weighted Error Detection

CS717 Weighted Error Correction: Weighted encoding detects and corrects single errors, even without full encoding.
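
A rough sketch of single-error correction with weighted column checksums. The weight vectors are an assumption, since the slides' own choice did not survive extraction: $w_1 = (1, \dots, 1)$ and $w_2 = (1, 2, \dots, n)$. If one data entry in row i of a column is off by $\delta$, the two checksum mismatches are $\delta$ and $(i+1)\delta$ (with 0-based i), so their ratio gives the row and $\delta$ gives the fix. Integer entries are assumed, and the function names are hypothetical.

```python
def weighted_column_encode(A, weights):
    """Append one checksum row per weight vector w: entry j is sum_i w[i] * A[i][j]."""
    n = len(A)
    checks = [[sum(w[i] * A[i][j] for i in range(n)) for j in range(len(A[0]))]
              for w in weights]
    return [list(row) for row in A] + checks

def correct_single_errors(E, n):
    """E has n data rows then 2 checksum rows (assumed weights: all ones, then 1..n).

    Repairs at most one corrupted data entry per column.
    """
    fixed = [list(row) for row in E]
    for j in range(len(E[0])):
        s1 = sum(E[i][j] for i in range(n)) - E[n][j]
        s2 = sum((i + 1) * E[i][j] for i in range(n)) - E[n + 1][j]
        if s1 == 0 and s2 == 0:
            continue  # both checks pass for this column
        if s1 == 0 or s2 % s1 != 0 or not 1 <= s2 // s1 <= n:
            raise ValueError("not a single data-entry error in column %d" % j)
        fixed[s2 // s1 - 1][j] -= s1  # row index from the ratio, delta from s1
    return fixed

A = [[1, 2], [3, 4], [5, 6]]
weights = [[1, 1, 1], [1, 2, 3]]     # assumed weight vectors
E = weighted_column_encode(A, weights)
E[1][1] = 10                         # corrupt one data entry (was 4)
print(correct_single_errors(E, 3)[:3])
```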

CS717 “Surprise” But this is all just a linear code! Generator matrix for above scheme:

CS717 Generating Encodings: Given m as the message word (or matrix row/column), the codeword is mG.
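
To make the "checksums are a linear code" view concrete, the sketch below builds a generator matrix of the form $G = [I_n \mid w_1^T \mid \dots \mid w_d^T]$ and encodes a message as $mG$. Both this layout of G and the weight vectors are assumptions made for illustration, since the slides' own matrix was lost in extraction.

```python
def generator_matrix(weights):
    """Build G = [I_n | w_1^T | ... | w_d^T] from d weight vectors of length n."""
    n = len(weights[0])
    return [[1 if i == j else 0 for j in range(n)] + [w[i] for w in weights]
            for i in range(n)]

def encode(m, G):
    """Codeword = row vector m times generator matrix G."""
    return [sum(m[i] * G[i][j] for i in range(len(m))) for j in range(len(G[0]))]

weights = [[1, 1, 1], [1, 2, 3]]   # assumed weight vectors
G = generator_matrix(weights)
print(encode([1, 3, 5], G))
```

The codeword is the message followed by its weighted checksums, which is exactly what appending checksum entries to a matrix row or column produces.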

CS717 Surprise??: Not too surprising, really. Why else would MMM preserve the encoding? Another possibility: efficient, can be implemented via bit shifts. Room is open for using any linear code!

CS717 Error Detection/Correction in General: To show for linear codes: can detect $< d_{\min}$ errors; can correct $\le (d_{\min}-1)/2$ errors. Let $x$ be the original codeword. Let $\hat{x} = x + e$ be the corrupted codeword, where e is the error vector.

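CS717 Error Detection in General: $s = H\hat{x} = H(x + e) = Hx + He = He$, for original codeword $x$, corrupted word $\hat{x}$, and error vector e. s is called the “syndrome vector”; it is independent of the original codeword. Note: weight(e) $< d_{\min}$ since there are $< d_{\min}$ errors. Thus, by Property 3, $He \ne 0$ whenever $e \ne 0$. Detection: if $s \ne 0$, then ERROR.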
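
The syndrome computation can be sketched over GF(2) as follows. The parity check matrix and codeword belong to a hypothetical (7,4) Hamming code chosen for illustration; they are not taken from the slides.

```python
# Hypothetical parity check matrix of a (7,4) Hamming code.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def syndrome(H, word):
    """s = H * word, with arithmetic modulo 2."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def add_gf2(x, y):
    """Componentwise sum modulo 2."""
    return [(a + b) % 2 for a, b in zip(x, y)]

codeword = [1, 0, 1, 1, 0, 1, 0]    # a valid codeword: its syndrome is zero
error = [0, 0, 0, 0, 0, 1, 0]       # one flipped bit
corrupted = add_gf2(codeword, error)

print(syndrome(H, codeword))
print(syndrome(H, corrupted), syndrome(H, error))
```

The last line prints the same vector twice: the syndrome of the corrupted word depends only on the error, not on which codeword was sent.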

CS717 Error Correction in General: Clearly e is the correction vector: $\hat{x} - e$ corrects the error in the corrupted word $\hat{x}$. Sufficient to prove: weight(e) $\le (d_{\min}-1)/2$ $\Rightarrow$ H is an isomorphism between correction vectors and syndrome vectors, i.e. for each correction vector (which we want to know) there is a unique syndrome vector. Thus it is possible to correct any such error, though it may not be efficient.

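CS717 H is Onto: weight(e) $\le (d_{\min}-1)/2 < d_{\min}$. rank(H) = n-k $\ge (d_{\min}-1)/2$, since $d_{\min} \le n-k+1$ by Property 4. Thus rank(H) $\ge$ weight(e) and $He \ne 0$ for $e \ne 0$: there are not enough 1’s in e to sum H’s columns to 0. H maps onto its range. Thus, every correction vector e has a syndrome $s = He$.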

CS717 H is 1-1: Let $e_1$ and $e_2$ be correction vectors, $e_1 \ne e_2$. Suppose that weight($e_1$), weight($e_2$) $\le (d_{\min}-1)/2$ and $He_1 = He_2 = s$. Then $He_1 - He_2 = H(e_1 - e_2) = s - s = 0$, and so $(e_1 - e_2)$ is a nonzero codeword. Thus weight($e_1 - e_2$) $\ge d_{\min}$. But weight($e_1$), weight($e_2$) $\le (d_{\min}-1)/2$, and so weight($e_1 - e_2$) $\le d_{\min} - 1$. Contradiction! Hence $e_1 = e_2$.
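
Because the map from low-weight correction vectors to syndromes is one-to-one, it can be tabulated and inverted. The sketch below is a brute-force illustration under the assumption of a hypothetical (7,4) Hamming code with $d_{\min} = 3$, so errors of weight at most 1 are correctable; it is not an efficient decoder.

```python
from itertools import combinations

# Hypothetical parity check matrix of a (7,4) Hamming code.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def syndrome(H, word):
    """s = H * word, with arithmetic modulo 2."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def build_syndrome_table(H, t):
    """Map each syndrome to the unique error vector of weight <= t producing it."""
    n = len(H[0])
    table = {}
    for weight in range(t + 1):
        for positions in combinations(range(n), weight):
            e = [1 if i in positions else 0 for i in range(n)]
            s = tuple(syndrome(H, e))
            if s in table:
                raise ValueError("two correction vectors share a syndrome; t is too large")
            table[s] = e
    return table

def correct(H, table, word):
    """Look up the correction vector for the word's syndrome and apply it."""
    e = table.get(tuple(syndrome(H, word)))
    if e is None:
        return None  # syndrome not in the table: cannot correct
    return [(a + b) % 2 for a, b in zip(word, e)]

table = build_syndrome_table(H, 1)
print(correct(H, table, [1, 0, 1, 1, 0, 0, 0]))
```

Asking for t = 2 with this code raises the `ValueError`, which is the collision the 1-1 argument rules out only for weight $\le (d_{\min}-1)/2$.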

CS717 Other Encoding Schemes: Linear codes are preserved by matrix multiplication. Presumably, fancier codes might be preserved by fancier computations. Limit: S. Winograd showed in 1962 that any code such that $f(x \circ y) = f(x) \circ f(y)$ has rate (k/n) or minimum weight $\to 0$ as $k \to \infty$. How general can we get? Do good solutions exist for small k? k = 64 bits should be good enough.

CS717 Summary: For matrix multiplication, we can encode the input via linear codes. Solutions exist for more complex codes (e.g. Fourier transforms). On parallel systems we must ensure that no processor touches more than one element per row/column; else, if one processor fails, the encoding is overwhelmed with errors. To ensure this we must modify the algorithm: a separate theory of check placement.
