Published by Elfreda Burns. Modified over 9 years ago.

Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan
HIPS 2010, April 19, 2010

Motivation
  Statically

Outline
  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Inversion of a Triangular Matrix
  Formal Linear Algebra Methods Environment (FLAME)
    High-level abstractions for expressing linear algebra algorithms
  Triangular Inversion (Trinv)
    R := U^{-1}

Inversion of a Triangular Matrix

Inversion of a Triangular Matrix
  LAPACK-style Implementation

      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO

Inversion of a Triangular Matrix
  FLASH
    Matrix of matrices

Inversion of a Triangular Matrix

    FLA_Part_2x2( A,    &ATL, &ATR,
                        &ABL, &ABR,     0, 0, FLA_TL );

    while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
    {
      FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                          /* ******** */        /* **************** */
                                                  &A10, /**/ &A11, &A12,
                             ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                             1, 1, FLA_BR );
      /*-------------------------------------------------------*/
      FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_MINUS_ONE, A11, A12 );
      FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
                  FLA_ONE, A01, A12, FLA_ONE, A02 );
      FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_ONE, A11, A01 );
      FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
      /*-------------------------------------------------------*/
      FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                       A10, A11, /**/ A12,
                              /* ********** */      /* ************* */
                                &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                                FLA_TL );
    }
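
The four updates in the loop body can be checked numerically with a scalar analogue. The following is a minimal sketch, assuming a block size of 1 and a dense row-major array; the function name `trinv_upper` is hypothetical and is not part of FLAME, FLASH, or LAPACK. Each inner loop mirrors one call in the listing above, in the same order.

```c
#include <assert.h>

/* Sketch only: in-place inverse of an n x n upper triangular matrix
 * stored row-major, using the same four updates as the blocked loop
 * with every block reduced to a single scalar. */
void trinv_upper(double *a, int n)
{
    for (int j = 0; j < n; j++) {
        double d = a[j * n + j];          /* A11 (1 x 1) */
        assert(d != 0.0);                 /* matrix must be nonsingular */

        /* A12 := -inv(A11) * A12   (first Trsm) */
        for (int c = j + 1; c < n; c++)
            a[j * n + c] = -a[j * n + c] / d;

        /* A02 := A02 + A01 * A12   (Gemm) */
        for (int r = 0; r < j; r++)
            for (int c = j + 1; c < n; c++)
                a[r * n + c] += a[r * n + j] * a[j * n + c];

        /* A01 := A01 * inv(A11)    (second Trsm) */
        for (int r = 0; r < j; r++)
            a[r * n + j] /= d;

        /* A11 := inv(A11)          (Trinv on the diagonal block) */
        a[j * n + j] = 1.0 / d;
    }
}
```

Calling `trinv_upper(u, 3)` on the rows (2, 2, 4), (0, 1, 2), (0, 0, 4) overwrites `u` with its inverse. The strictly lower triangle is never read or written.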

Inversion of a Triangular Matrix
  Extensible Markup Language (XML)
    FLA_UPPER_TRIANGULAR BR" inout="both">A A FLA_LEFT FLA_UPPER_TRIANGULAR FLA_NO_TRANSPOSE FLA_NONUNIT_DIAG FLA_MINUS_ONE A FLA_NO_TRANSPOSE FLA_ONE

Inversion of a Triangular Matrix
  Extensible Markup Language (XML) Cont.
    A FLA_ONE A FLA_RIGHT FLA_UPPER_TRIANGULAR FLA_NO_TRANSPOSE FLA_NONUNIT_DIAG FLA_ONE A FLA_UPPER_TRIANGULAR FLA_NONUNIT_DIAG A

Requisite Semantic Information
  Partitioning Scheme
    FLA_UPPER_TRIANGULAR BR" inout="both">A A

Requisite Semantic Information
  Problem Size*
    FLA_UPPER_TRIANGULAR BR" inout="both">A A

Requisite Semantic Information
  Updates
    FLA_UPPER_TRIANGULAR BR" inout="both">A A

Requisite Semantic Information
  Input and Output Parameters
    alpha A B
    alpha A B beta C
    A

Static Generation of a DAG
  Code Generation
    Convert XML representation to FLASH code generation intermediary
      Annotated with input and output information
    Create directed acyclic graph (DAG) by statically unrolling the loop
      Operations on submatrix blocks (tasks) are vertices
      Data dependencies between tasks are edges

Static Generation of a DAG
  Data Dependencies
    Flow (read-after-write)
      S1: A = B + C;
      S2: D = A + E;
    Anti (write-after-read)
      S3: F = A + G;
      S4: A = H + I;
    Output (write-after-write)
      S5: A = J + K;
      S6: A = L + M;
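
The three dependency kinds can be detected mechanically from the read and write sets of two statements. The sketch below is illustrative only: the bitmask encoding and the names `var_bit`, `Stmt`, and `classify` are assumptions made for this example and do not come from the talk or from FLAME.

```c
#include <assert.h>

/* Hypothetical encoding: each variable 'A'..'Z' is one bit in a mask. */
unsigned var_bit(char name)
{
    assert(name >= 'A' && name <= 'Z');
    return 1u << (name - 'A');
}

enum { DEP_NONE = 0, DEP_FLOW = 1, DEP_ANTI = 2, DEP_OUTPUT = 4 };

typedef struct {
    unsigned reads;   /* variables the statement reads  */
    unsigned writes;  /* variables the statement writes */
} Stmt;

/* Returns the set of dependencies that force `later` to run after
 * `earlier`, where `earlier` precedes `later` in program order. */
int classify(Stmt earlier, Stmt later)
{
    int dep = DEP_NONE;
    if (earlier.writes & later.reads)  dep |= DEP_FLOW;    /* read-after-write  */
    if (earlier.reads  & later.writes) dep |= DEP_ANTI;    /* write-after-read  */
    if (earlier.writes & later.writes) dep |= DEP_OUTPUT;  /* write-after-write */
    return dep;
}
```

With S1 encoded as `{ var_bit('B') | var_bit('C'), var_bit('A') }` and S2 as `{ var_bit('A') | var_bit('E'), var_bit('D') }`, `classify(S1, S2)` reports only a flow dependency. The pairs S3/S4 and S5/S6 report anti and output dependencies respectively.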

Static Generation of a DAG

Static Generation of a DAG
  Problem Size
    Problem size cannot be determined a priori
    Fix the block size or loop unrolling factor
      Balance between instruction footprint and data granularity of tasks
  Example
    Trinv on 3x3 matrix of blocks

Static Generation of a DAG
  Trinv
    Iteration 1
      Tasks: Trsm 0, Trsm 1, Trinv 2

Static Generation of a DAG
  Trinv
    Iteration 2
      Tasks: Trsm 3, Gemm 4, Trsm 5, Trinv 6

Static Generation of a DAG
  Trinv
    Iteration 3
      Tasks: Trsm 7, Trsm 8, Trinv 9

Static Generation of a DAG
  Tasks: Trsm 0, Trsm 1, Trinv 2, Trsm 3, Gemm 4, Trsm 5, Trinv 6, Trsm 7, Trsm 8, Trinv 9
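
The unrolling for the 3x3 example can be reproduced with a short program. This is a sketch under stated assumptions, not the generator described in the talk: it assumes that each blocked update is split into one task per written block, and the names `Kind`, `Operand`, `Task`, `TaskList`, `push`, `unroll_trinv`, and `depends` are hypothetical. Under that assumption the task numbering matches the slides (Trsm 0 through Trinv 9).

```c
#include <assert.h>

#define MAX_TASKS 256
#define MAX_OPERANDS 3

typedef enum { TRSM, GEMM, TRINV } Kind;

/* A block operand. Every operand is read; is_written marks in/out blocks. */
typedef struct { int row, col, is_written; } Operand;

typedef struct {
    Kind kind;
    int n_operands;
    Operand op[MAX_OPERANDS];
} Task;

typedef struct {
    int n_tasks;
    Task task[MAX_TASKS];
} TaskList;

Operand ro(int r, int c) { Operand o = { r, c, 0 }; return o; }
Operand rw(int r, int c) { Operand o = { r, c, 1 }; return o; }

void push(TaskList *list, Kind kind, int n, const Operand *ops)
{
    assert(list->n_tasks < MAX_TASKS && n <= MAX_OPERANDS);
    Task *t = &list->task[list->n_tasks++];
    t->kind = kind;
    t->n_operands = n;
    for (int i = 0; i < n; i++)
        t->op[i] = ops[i];
}

/* Statically unroll Trinv over an n x n matrix of blocks. */
void unroll_trinv(int n, TaskList *list)
{
    list->n_tasks = 0;
    for (int k = 0; k < n; k++) {
        for (int j = k + 1; j < n; j++)        /* A12 := -inv(A11) * A12 */
            push(list, TRSM, 2, (Operand[]){ ro(k, k), rw(k, j) });
        for (int i = 0; i < k; i++)            /* A02 := A02 + A01 * A12 */
            for (int j = k + 1; j < n; j++)
                push(list, GEMM, 3,
                     (Operand[]){ ro(i, k), ro(k, j), rw(i, j) });
        for (int i = 0; i < k; i++)            /* A01 := A01 * inv(A11) */
            push(list, TRSM, 2, (Operand[]){ ro(k, k), rw(i, k) });
        push(list, TRINV, 1, (Operand[]){ rw(k, k) });  /* A11 := inv(A11) */
    }
}

/* DAG edge test: `later` must wait for `earlier` when both touch the
 * same block and at least one of them writes it (flow, anti, or output). */
int depends(const Task *earlier, const Task *later)
{
    for (int x = 0; x < earlier->n_operands; x++)
        for (int y = 0; y < later->n_operands; y++) {
            const Operand *p = &earlier->op[x], *q = &later->op[y];
            if (p->row == q->row && p->col == q->col &&
                (p->is_written || q->is_written))
                return 1;
        }
    return 0;
}
```

After `unroll_trinv(3, &list)`, evaluating `depends(&list.task[a], &list.task[b])` for every pair a < b yields the edges of the graph. This pairwise test also reports transitively implied edges, which a real scheduler would likely prune.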

Performance
  LabVIEW
    Graphical, data flow programming language (G)
    Anti-dependencies cannot exist in G
      Copies are made when wire is split

Performance

Performance
  Target Architecture
    16-core AMD processor
      4-socket quad-core Opteron
      1.9 GHz
      4 GB of RAM per socket
    LabVIEW 8.6
    Windows XP
    Basic Linear Algebra Subprograms (BLAS)
      MKL 7.2

Performance

Performance
  Results
    Parallelism
      Exploit parallelism inherent within DAG
    Hierarchical matrix storage
      Spatial locality
    Overhead
      Copy matrix from flat row-major storage to hierarchical matrix and back
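
The copy overhead mentioned above can be made concrete with a small sketch. It assumes a square n x n matrix whose dimension is a multiple of the block size b, with blocks laid out one after another in row-major block order and each block stored row-major internally. The function names and this layout are assumptions for illustration; the talk does not specify the actual FLASH or LabVIEW storage format.

```c
#include <assert.h>
#include <string.h>

/* Copy a flat row-major n x n matrix into contiguous b x b blocks. */
void flat_to_blocked(const double *flat, double *blocked, int n, int b)
{
    assert(b > 0 && n % b == 0);
    int nb = n / b;                              /* blocks per dimension */
    for (int bi = 0; bi < nb; bi++)
        for (int bj = 0; bj < nb; bj++) {
            double *block = blocked + (size_t)(bi * nb + bj) * b * b;
            for (int r = 0; r < b; r++)          /* one block row at a time */
                memcpy(block + (size_t)r * b,
                       flat + (size_t)(bi * b + r) * n + (size_t)bj * b,
                       (size_t)b * sizeof(double));
        }
}

/* Inverse copy: contiguous b x b blocks back to flat row-major storage. */
void blocked_to_flat(const double *blocked, double *flat, int n, int b)
{
    assert(b > 0 && n % b == 0);
    int nb = n / b;
    for (int bi = 0; bi < nb; bi++)
        for (int bj = 0; bj < nb; bj++) {
            const double *block = blocked + (size_t)(bi * nb + bj) * b * b;
            for (int r = 0; r < b; r++)
                memcpy(flat + (size_t)(bi * b + r) * n + (size_t)bj * b,
                       block + (size_t)r * b,
                       (size_t)b * sizeof(double));
        }
}
```

Each block ends up contiguous in memory, which is the source of the spatial locality, while the two copies touch every element once in each direction.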

Performance

Conclusion
  Instantiate linear algebra algorithm using a code generation intermediary
  Statically produce a directed acyclic graph by fixing block size or loop unrolling factor
  XML → FLASH → DAG

Acknowledgments
  Jim Nagle, Robert van de Geijn
  We thank the other members of the FLAME team for their support
  Funding
    National Instruments
    NSF Grants
      CCF-0540926
      CCF-0702714

Conclusion
  More Information
    http://www.cs.utexas.edu/~flame
  Questions?
    echan@cs.utexas.edu