Published by Mervyn Lawson. Modified over 8 years ago.
1
CS717
Algorithm-Based Fault Tolerance: Matrix Multiplication
Greg Bronevetsky
2
Problem at Hand
Have matrices A and B
Want to compute their product: AB
Ask a matrix-matrix-multiply (MMM) implementation to compute the product
Answer: C
Question: Is C the correct answer?
How could we know for sure?
3
Algorithm-Based Fault Tolerance
Encode input matrices via an error-correcting code
Run the regular MMM algorithm on the encoded matrices
– Encoding is invariant under MMM
Naturally outputs encoded matrices
Encoding guarantees:
– If up to t errors in output, will detect the error
– If up to c < t errors in output, can decode the correct output matrix
4
Outline
Linear Error Correcting Codes
Algorithm-Based Fault Tolerance
ABFT = Linear Encoding of Matrices
5
Error Correcting Codes
Map $f: \Sigma^k \to \Sigma^n$
– k-long data words $\to$ n-long codewords
– We use $\Sigma = \{0, 1\}$
Code of length n is a “sparse” subset of $\Sigma^n$
– Very few possible words are valid codewords
Rate of code: $k/n$
– Amount of information communicated by each codeword
6
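Minimum Distance
Minimum distance: $d_{\min} = \min_{x, y \in C,\ x \neq y} d(x, y)$, where C is the set of codewords
– $d(\cdot, \cdot)$ = Hamming distance
Hamming distance: number of spots where two words differ
Measures the difficulty of decoding/correcting corrupted codewords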
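The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function names and the repetition-code example are assumptions made here.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions where two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# Hypothetical example: the length-3 repetition code.
repetition_code = ["000", "111"]
d_min = minimum_distance(repetition_code)
```

The brute-force search compares every pair of codewords, so it is only practical for small codes.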
7
Detection and Correction
Code may detect errors in up to $d_{\min} - 1$ spots
– No such error can morph one codeword into another
May correct errors in up to $\lfloor (d_{\min} - 1)/2 \rfloor$ spots
– Can still find the “closest” codeword
More details later…
Each codeword defines a circle around itself of radius $(d_{\min} - 1)/2$; these circles do not overlap
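A small sketch of detection versus correction by nearest-codeword search. The four-word code below is a hypothetical example with minimum distance 3, so one error is correctable and two errors are only detectable.

```python
def hamming_distance(x, y):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))

def detect(code, word):
    """True if the word is not a valid codeword, i.e. an error is detected."""
    return word not in code

def nearest_codeword(code, word):
    """Codeword closest to the received word in Hamming distance."""
    return min(code, key=lambda c: hamming_distance(c, word))

# Hypothetical code with minimum distance 3.
code = ["00000", "01011", "10101", "11110"]

one_error = "01111"    # "01011" with one flipped bit
two_errors = "11111"   # "01011" with two flipped bits

decoded = nearest_codeword(code, one_error)        # recovers "01011"
detected = detect(code, two_errors)                # the error is still noticed
miscorrected = nearest_codeword(code, two_errors)  # but lands on another codeword
```

With two flipped bits the received word falls inside the circle of a different codeword, which is why correction is limited to about half the detection radius.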
8
Linear Codes
Codewords form a linear subspace inside $\Sigma^n$
They lie in the row space of the generator matrix G
Example: an (n=7, k=3) code
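As an illustration of codewords living in the row space of G, the sketch below enumerates all codewords of a (7, 3) binary code. The generator matrix used here is an assumption chosen for the example, not the matrix shown on the slide.

```python
from itertools import product

# Illustrative generator matrix (an assumption, not the slide's matrix):
# k = 3 rows, n = 7 columns, in systematic form [I_3 | P].
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1],
]

def encode(message, G):
    """Return the codeword message * G, with arithmetic mod 2."""
    n = len(G[0])
    return tuple(sum(m * row[j] for m, row in zip(message, G)) % 2
                 for j in range(n))

def all_codewords(G):
    """Enumerate the 2^k codewords in the row space of G."""
    return [encode(m, G) for m in product((0, 1), repeat=len(G))]

codewords = all_codewords(G)
```

Only 8 of the 128 possible 7-bit words are codewords, which is the "sparse subset" idea in concrete form.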
9
Property 1
Linear combination of any codewords is also a codeword:
– For any $x, y \in C$, $(x + y) \in C$
Codeword * constant is a codeword:
– For any $z \in C$ and constant $a \in \Sigma$, $a \cdot z \in C$
Proof: basic properties of linear spaces
10
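Property 2
Minimum distance of linear code = $\min_{x \in C,\ x \neq 0} \mathrm{weight}(x)$
Where weight(x) = number of nonzero entries of x
Proof: $d(x, y) = \mathrm{weight}(x - y)$, and $x - y \in C$ by Property 1, so the distances between distinct codewords are exactly the weights of nonzero codewords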
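Properties 1 and 2 can be checked by brute force on a small code. The sketch below uses an illustrative (7, 3) generator matrix, which is an assumption made for the example rather than the one from the lecture.

```python
from itertools import combinations, product

# Illustrative (7, 3) generator matrix (an assumption for this example).
G = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1, 1],
]

def encode(message, G):
    """Return message * G over GF(2)."""
    return tuple(sum(m * row[j] for m, row in zip(message, G)) % 2
                 for j in range(len(G[0])))

def add(x, y):
    """Component-wise sum of two binary words (mod 2)."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

def weight(x):
    """Number of nonzero entries."""
    return sum(1 for a in x if a != 0)

code = {encode(m, G) for m in product((0, 1), repeat=len(G))}

# Property 1: the code is closed under addition.
closed = all(add(x, y) in code for x in code for y in code)

# Property 2: minimum distance equals minimum nonzero weight.
min_weight = min(weight(c) for c in code if any(c))
min_dist = min(weight(add(x, y)) for x, y in combinations(code, 2))
```

Over GF(2) subtraction and addition coincide, so `weight(add(x, y))` is the Hamming distance between x and y.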
11
Parity Check Matrix
H: dual matrix to G
– Contains a basis of the space orthogonal to G’s row space
– (n-k)-dimensional space, so H is (n-k) x n
Space defined as: $C = \{x \in \Sigma^n : Hx = 0\}$
Note: H also defines a linear code
12
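Property 3
$d_{\min}$ = min # of columns of H that can sum to 0
Proof: for a binary word x, $Hx$ is the sum of the columns of H at the positions where x is 1. So x is a codeword exactly when those columns sum to 0, and by Property 2 the smallest nonempty set of columns summing to 0 has $d_{\min}$ elements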
13
Property 4
Minimum distance of linear code $\le n-k+1$
Proof
– Total n dimensions (since codewords are n-vectors)
– G’s row space rank = k
– Thus, H’s column space rank = n-k
– Thus, any n-k+1 columns will be linearly dependent
– Some subset of them adds up to 0
– By Property 3, $d_{\min} \le n-k+1$
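Properties 3 and 4 can be illustrated on a concrete parity check matrix. The H below is an assumption: it is the systematic parity check matrix $[P^T \mid I_4]$ for an illustrative (7, 3) code with generator $[I_3 \mid P]$, not a matrix taken from the slides.

```python
from itertools import combinations

# Illustrative parity check matrix (an assumption): 4 rows, 7 columns.
H = [
    [1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def columns_sum_to_zero(H, cols):
    """True if the chosen columns of H add up to the zero vector mod 2."""
    return all(sum(row[j] for j in cols) % 2 == 0 for row in H)

def min_dependent_columns(H):
    """Smallest number of columns of H that sum to 0 (Property 3: d_min)."""
    n = len(H[0])
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            if columns_sum_to_zero(H, cols):
                return size
    return None

d_min = min_dependent_columns(H)

# Property 4: d_min <= n - k + 1. The rows of this H are independent
# (it contains an identity block), so n - k equals the number of rows.
bound = len(H) + 1
```

For this example the search finds 4, which sits below the bound of 5.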
14
Outline
Linear Error Correcting Codes
Algorithm-Based Fault Tolerance
ABFT = Linear Encoding of Matrices
15
Encoding a Matrix
Algorithm-Based Fault Tolerance introduced by Huang and Abraham in 1984
Encode each row of the matrix via an extra column
Column entries = sums of matrix rows: $A_r = [\,A \mid Ae\,]$, where $e = (1, \dots, 1)^T$
16
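Encoding a Matrix
Encode each column of the matrix via an extra row
Row entries = sums of matrix columns: $A_c = \begin{bmatrix} A \\ e^T A \end{bmatrix}$, where $e = (1, \dots, 1)^T$
Full encoding: $A_f = \begin{bmatrix} A & Ae \\ e^T A & e^T A e \end{bmatrix}$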
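The three encodings can be sketched with plain Python lists. The function names are chosen here for illustration and integer entries are assumed so that checksums compare exactly.

```python
def row_encode(A):
    """Append a checksum column: each row gains the sum of its entries."""
    return [row + [sum(row)] for row in A]

def column_encode(A):
    """Append a checksum row: each column gains the sum of its entries."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]

def full_encode(A):
    """Apply both encodings; the corner entry is the sum of all of A."""
    return column_encode(row_encode(A))

A = [[1, 2],
     [3, 4]]
A_full = full_encode(A)
```

Applying the row encoding first and the column encoding second gives the same result as the other order, because both corner values equal the total sum of A.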
17
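Detecting Errors
Suppose matrix A is corrupted to matrix $\hat{A}$
– entry $\hat{a}_{i,j}$ is wrong
Can detect the error’s exact position:
– Row i no longer matches its row checksum
– Column j no longer matches its column checksum
– The error sits at their intersection (i, j)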
18
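Correcting Errors
Can correct the error using the row or column checksum:
– $a_{i,j}$ = (checksum of row i) $- \sum_{l \neq j} \hat{a}_{i,l}$
– $a_{i,j}$ = (checksum of column j) $- \sum_{l \neq i} \hat{a}_{l,j}$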
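A minimal sketch of locating and repairing a single wrong data entry in a fully encoded matrix. It assumes integer entries, intact checksums, and at most one corrupted data entry; the function names are hypothetical. With floating-point data the equality tests would need a tolerance.

```python
def locate_error(F):
    """Return (i, j) of a single corrupted data entry in a fully encoded
    matrix F (last row and last column are checksums), or None if clean."""
    m, n = len(F) - 1, len(F[0]) - 1
    bad_rows = [i for i in range(m) if sum(F[i][:n]) != F[i][n]]
    bad_cols = [j for j in range(n)
                if sum(F[i][j] for i in range(m)) != F[m][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single corrupted data entry")

def correct_error(F):
    """Repair the located entry in place using its row checksum."""
    pos = locate_error(F)
    if pos is not None:
        i, j = pos
        n = len(F[0]) - 1
        F[i][j] = F[i][n] - sum(F[i][l] for l in range(n) if l != j)
    return pos

F = [[1, 2, 3],
     [3, 9, 7],     # data entry (1, 1) should be 4
     [4, 6, 10]]
position = correct_error(F)
```

The column checksum could be used for the repair just as well; the row checksum is picked here arbitrarily.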
19
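Big Trick: Preservation of Encoding
Column-encoded matrix * Row-encoded matrix = Fully-encoded matrix
– $\begin{bmatrix} A \\ e^T A \end{bmatrix} [\,B \mid Be\,] = \begin{bmatrix} AB & ABe \\ e^T AB & e^T AB e \end{bmatrix}$, where $e = (1, \dots, 1)^T$
Can check MMM computation by checking the encoding of the output
If the product matrix has an erroneous entry
– Can detect
– Can correct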
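The preservation property is easy to check numerically. The sketch below uses a naive list-based multiply and small integer matrices picked for illustration; none of the names come from the lecture.

```python
def matmul(X, Y):
    """Naive matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def row_encode(A):
    """Append a checksum column holding each row's sum."""
    return [row + [sum(row)] for row in A]

def column_encode(A):
    """Append a checksum row holding each column's sum."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Multiply the encoded inputs with the unmodified MMM routine ...
C_full = matmul(column_encode(A), row_encode(B))

# ... and compare against encoding the plain product afterwards.
expected = column_encode(row_encode(matmul(A, B)))
```

The multiply routine never looks at the checksums specially; they come out right because the encoding is linear.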
20
Applications
Matrix Multiplication
– Given encoded A and B,
– Check whether MMM result C (?= AB) has a valid encoding
Matrix Factorization
– Given a factorization A = WZ
– Verify correctness by verifying encodings of factors
– Factors are row- OR column-encoded
– Can only detect, not correct, errors
21
Weighted ABFT
Oftentimes need to check row- or column-encoded matrices
– Ex: factorization, data integrity check
Can only detect errors in such matrices
Can we also correct?
Yes, by generalizing to weighted checking rows/columns
22
Weighting
Suppose we have d n-vectors $w_1, \dots, w_d$
Can column-encode matrix A: $\begin{bmatrix} A \\ w_1^T A \\ \vdots \\ w_d^T A \end{bmatrix}$
Let’s try out:
23
Weighted Error Detection
24
Weighted Error Correction
Weighted encoding detects and corrects single errors
– Even for non-full encoding
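A sketch of single-error correction in a matrix that is only column-encoded. The weight vectors are an assumption made for this example, $w_1 = (1, \dots, 1)$ and $w_2 = (1, 2, \dots, m)$; the transcript does not preserve the weights used on the slides. With these weights, an error of size $\delta$ in row i shifts the first checksum by $\delta$ and the second by $(i+1)\delta$, so their ratio gives the row.

```python
def weighted_column_encode(A, weights):
    """Append one checksum row per weight vector w;
    entry j of that row is sum_i w[i] * A[i][j]."""
    m, n = len(A), len(A[0])
    checks = [[sum(w[i] * A[i][j] for i in range(m)) for j in range(n)]
              for w in weights]
    return [row[:] for row in A] + checks

def correct_single_error(E, m):
    """Repair one corrupted data entry in E (m data rows, 2 checksum rows).

    Assumes weights (1, ..., 1) and (1, 2, ..., m), integer entries,
    and that the checksum rows themselves are intact."""
    for j in range(len(E[0])):
        s1 = sum(E[i][j] for i in range(m)) - E[m][j]                # delta
        s2 = sum((i + 1) * E[i][j] for i in range(m)) - E[m + 1][j]  # (i+1)*delta
        if s1 != 0:
            if s2 % s1 != 0 or not 1 <= s2 // s1 <= m:
                raise ValueError("not a single data-entry error")
            i = s2 // s1 - 1
            E[i][j] -= s1
            return i, j
    return None

A = [[1, 2], [3, 4], [5, 6]]
weights = [[1, 1, 1], [1, 2, 3]]
E = weighted_column_encode(A, weights)
E[1][0] = 10                      # corrupt one data entry (3 -> 10)
position = correct_single_error(E, m=3)
```

No row checksums are involved, which is the point of the weighted scheme: the second checksum row supplies the row index that a plain column encoding cannot.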
25
Outline
Linear Error Correcting Codes
Algorithm-Based Fault Tolerance
ABFT = Linear Encoding of Matrices
26
“Surprise”
But this is all just a linear code!
Generator matrix for above scheme: $G = [\,I_n \mid w_1 \cdots w_d\,]$
27
Generating Encodings
Given $m = (m_1, \dots, m_n)$ as message word (or matrix row/column)
$mG = (m_1, \dots, m_n,\ m \cdot w_1, \dots, m \cdot w_d)$
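To make the link between checksums and generator matrices concrete, the sketch below builds a generator matrix of the form identity block followed by weight columns and multiplies a message by it. The weight vectors and helper names are assumptions for illustration.

```python
def generator_matrix(weights, n):
    """Build G = [I_n | w_1 ... w_d] as a list of n rows."""
    return [[1 if c == r else 0 for c in range(n)] + [w[r] for w in weights]
            for r in range(n)]

def encode(m, G):
    """Row vector m times G: the message followed by its weighted checksums."""
    return [sum(m[r] * G[r][c] for r in range(len(m)))
            for c in range(len(G[0]))]

# Assumed weights: plain sum and (1, 2, 3)-weighted sum.
weights = [[1, 1, 1], [1, 2, 3]]
G = generator_matrix(weights, 3)
codeword = encode([3, 4, 5], G)
```

The identity block copies the message unchanged, and each weight column appends one checksum, which is exactly what the matrix encodings do to every row or column.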
28
Surprise??
Not too surprising really
Why else would MMM preserve encoding?
Another possibility:
– Efficient: can be implemented via bit shifts
Room open for using any linear code!
29
Error Detection/Correction in General
To show for linear codes:
– Can detect up to $d_{\min} - 1$ errors
– Can correct up to $\lfloor (d_{\min} - 1)/2 \rfloor$ errors
Let $c$ be the original codeword
Let $\hat{c} = c + e$ be the corrupted codeword
– e: error vector
30
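Error Detection in General
$s = H\hat{c} = H(c + e) = Hc + He = He$
– s called the “syndrome vector”
– Independent of original codeword
Note: weight(e) < $d_{\min}$ since < $d_{\min}$ errors
Thus: $He \neq 0$ whenever $e \neq 0$ (Property 3)
Detection: if $s \neq 0$, then ERROR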
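A short sketch of syndrome-based detection over GF(2). The parity check matrix and codeword belong to an illustrative (7, 3) code assumed for this example, not to anything shown in the lecture.

```python
# Illustrative parity check matrix (an assumption): 4 rows, 7 columns.
H = [
    [1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def syndrome(H, word):
    """Compute H * word mod 2; all zeros means no error is detected."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

c = (1, 1, 0, 1, 0, 1, 0)          # a valid codeword of this code
e = (0, 0, 1, 0, 0, 0, 0)          # error vector: one flipped bit
received = tuple((a + b) % 2 for a, b in zip(c, e))

s_clean = syndrome(H, c)
s_received = syndrome(H, received)
error_detected = any(s_received)
```

The syndrome of the received word equals the syndrome of e alone, so it carries no information about which codeword was sent.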
31
Error Correction in General
Clearly e is the correction vector
– $\hat{c} - e$ corrects the error in $\hat{c}$
Sufficient to prove: for weight(e) $\le (d_{\min} - 1)/2$, H is an isomorphism: correction vectors $\leftrightarrow$ syndrome vectors
– i.e. for each correction vector (want to know) there is a unique syndrome vector
Thus, possible to correct any such error
– may not be efficient
32
H is Onto
weight(e) $\le (d_{\min} - 1)/2 < d_{\min}$
rank(H) = n-k $\ge d_{\min} - 1 \ge (d_{\min} - 1)/2$
Thus, rank(H) $\ge$ weight(e) and $He \neq 0$
– Not enough 1’s in e to sum H’s columns to 0
H maps onto its range
Thus, H is onto
33
H is 1-1
Let $e_1$ and $e_2$ be correction vectors, $e_1 \neq e_2$
Suppose that:
– weight($e_1$), weight($e_2$) $\le (d_{\min} - 1)/2$
– $He_1 = He_2 = s$
$He_1 - He_2 = H(e_1 - e_2) = s - s = 0$
And so, $(e_1 - e_2)$ is a nonzero codeword
Thus, weight($e_1 - e_2$) $\ge d_{\min}$
But weight($e_1$), weight($e_2$) $\le (d_{\min} - 1)/2$ and so weight($e_1 - e_2$) $\le d_{\min} - 1$
Contradiction! $e_1 = e_2$
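The one-to-one correspondence between low-weight correction vectors and syndromes can be turned into a lookup-table decoder. This is a brute-force sketch over an illustrative (7, 3) code with $d_{\min} = 4$ (an assumption), so it corrects $\lfloor (4-1)/2 \rfloor = 1$ error.

```python
from itertools import combinations

# Illustrative parity check matrix (an assumption): 4 rows, 7 columns.
H = [
    [1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]

def syndrome(H, word):
    """Compute H * word mod 2."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def build_syndrome_table(H, t):
    """Map each syndrome to the unique error vector of weight <= t."""
    n = len(H[0])
    table = {}
    for w in range(t + 1):
        for positions in combinations(range(n), w):
            e = tuple(1 if i in positions else 0 for i in range(n))
            s = syndrome(H, e)
            if s in table:
                raise ValueError("two correction vectors share a syndrome")
            table[s] = e
    return table

def correct(H, table, received):
    """Look up the error vector for the syndrome and remove it.
    Raises KeyError if the syndrome is not in the table."""
    e = table[syndrome(H, received)]
    return tuple((r + x) % 2 for r, x in zip(received, e))

table = build_syndrome_table(H, t=1)
received = (1, 1, 1, 1, 0, 1, 0)       # codeword 1101010 with bit 2 flipped
decoded = correct(H, table, received)
```

The table grows quickly with n and t, which matches the slide's caveat that this kind of correction may not be efficient.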
34
Other Encoding Schemes
Linear codes preserved by matrix multiplication
Presumably, fancier codes might be preserved by fancier computations
Limit:
– S. Winograd showed in 1962 that any code s.t. $f(x \circ y) = f(x) \circ f(y)$ has rate (k/n) $\to 0$ or minimum weight $\to 0$ as $k \to \infty$
How general can we get?
Do good solutions exist for small k?
– k=64 bits should be good enough
35
Summary
For Matrix Multiplication can encode input via linear codes
Solutions exist for more complex codes
– Ex: Fourier Transforms
On parallel systems must ensure:
– No processor touches >1 element per row/column
– Else, if one processor fails, encoding is overwhelmed with errors
– To ensure this, must modify the algorithm
– Separate check placement theory