A Framework for Distributed Tensor Computations
Martin Schatz, Bryan Marker, Robert van de Geijn (The University of Texas at Austin); Tze Meng Low (Carnegie Mellon University); Tamara G. Kolda (Sandia National Labs: Livermore)
Envisioned workflow
1. A new architecture comes out
2. Scientists specify what they want computed on the new architecture to (computer) scientists
3. (Computer) scientists provide an efficient library for the computation on the new architecture
4. Scientists do science
Formality is key!
Goals
– Formally describe distribution of tensor data on processing grids
– Identify patterns in collective communications to utilize specialized implementations when possible
– Provide a systematic approach to creating algorithms and implementations for problems
– Achieve high performance
Outline
– Description of parallel matrix-matrix multiplication
– Quick overview of tensors and tensor contractions
– A notation for distributing/redistributing tensors
– A method for deriving algorithms
Data Distribution Approach
– “Cyclically wrap” each mode of the tensor on the grid
– Assign elements of the tensor to processes based on the assigned indices
– When restricted to 2-D objects on 2-D grids, these ideas correspond to the theory of the Elemental library [1]

[1] Martin D. Schatz, Jack Poulson, and Robert van de Geijn. Parallel matrix multiplication: 2D and 3D. FLAME Working Note #62, TR-12-13, The University of Texas at Austin, Department of Computer Sciences, June 2012.
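As a minimal sketch of this elemental-cyclic wrapping (the function name and the example grid shape are illustrative, not from the slides):

```python
# Elemental-cyclic distribution: element (i0, ..., i_{m-1}) of a tensor
# is assigned to process (i0 mod g0, ..., i_{m-1} mod g_{m-1}) on an
# order-m grid of shape (g0, ..., g_{m-1}).

def owner(index, grid_shape):
    """Grid coordinates of the process owning a tensor element."""
    return tuple(i % g for i, g in zip(index, grid_shape))

# A 4x4 matrix on a 2x2 grid: element (2, 3) lands on process (0, 1).
print(owner((2, 3), (2, 2)))  # -> (0, 1)
```

For a 2-D object on a 2-D grid this reduces to the cyclic wrapping used by Elemental.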
Assumptions
– The computing grid is arranged as an order-N object
– Elements of tensors are wrapped elemental-cyclically on the grid
For this example, we assume an order-2 tensor (a matrix) on an order-2 grid.
Data distribution notation: The Basics
– Assign a distribution scheme to each mode of the object: one symbol for how indices of columns (mode 0) are distributed, another for how indices of rows (mode 1) are distributed
– For example, columns can be distributed based on mode 0 of the grid and rows based on mode 1 of the grid
– The tuple assigned to each mode is referred to as the “mode distribution”
Example 1
Distribute indices of columns based on mode 0 of the grid, and distribute indices of rows based on mode 1 of the grid.
Notes
– Distributions wrap elements on a logical view of the grid, which allows multiple grid modes to be used in a single symbol
– The empty mode distribution represents replication
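One way to make “logical views” concrete: a symbol listing several grid modes treats them as a single logical mode whose size is the product of their dimensions. The column-major linearization below is an illustrative assumption, not notation from the slides.

```python
# Sketch: a mode distribution listing grid modes (0, 1) views those
# modes as one logical mode. The rank of a process within that logical
# mode is computed here column-major (mode 0 varies fastest).

def logical_rank(coords, grid_shape, modes):
    """Rank of a process within the logical mode formed by `modes`."""
    rank, stride = 0, 1
    for m in modes:
        rank += coords[m] * stride
        stride *= grid_shape[m]
    return rank

# On a 2x3 grid, the symbol (0, 1) views all 6 processes as one logical
# mode: process (1, 2) has logical rank 1 + 2*2 = 5.
print(logical_rank((1, 2), (2, 3), (0, 1)))  # -> 5
```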
Notes
– We use boldface lowercase Roman letters to refer to mode distributions
– Elements of mode distributions are denoted with subscripts
– Concatenation of mode distributions is denoted with its own symbol
Elemental Notation
The distributions of Elemental can be viewed in terms of the defined notation.
Parallel Matrix Multiplication
Heuristic: avoid communicating the “large” matrix. This leads to “stationary” A, B, and C algorithm variants.
Stationary C algorithm:
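A serial sketch of the stationary-C idea on a 2x2 grid of blocks: C blocks never move; at each step, A blocks are shared within grid rows and B blocks within grid columns, and each process updates its local C block. Communication is modeled here by simply indexing other processes' blocks, and all names are illustrative.

```python
import numpy as np

def stationary_c(A_blocks, B_blocks, p):
    """Stationary-C matrix multiply over a p x p grid of blocks."""
    C_blocks = [[np.zeros_like(A_blocks[0][0]) for _ in range(p)]
                for _ in range(p)]
    for k in range(p):
        for i in range(p):          # "broadcast" A[i][k] within grid row i
            for j in range(p):      # "broadcast" B[k][j] within grid column j
                C_blocks[i][j] += A_blocks[i][k] @ B_blocks[k][j]
    return C_blocks

p, b = 2, 3
A = np.arange(36.0).reshape(6, 6)
B = np.ones((6, 6))
Ab = [[A[i*b:(i+1)*b, k*b:(k+1)*b] for k in range(p)] for i in range(p)]
Bb = [[B[k*b:(k+1)*b, j*b:(j+1)*b] for j in range(p)] for k in range(p)]
Cb = stationary_c(Ab, Bb, p)
assert np.allclose(np.block(Cb), A @ B)
```

In a distributed setting the two inner loops become collective communications within grid rows and columns, while C stays in place.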
Outline
– Description of parallel matrix-matrix multiplication
– Quick overview of tensors and tensor contractions
– A notation for distributing/redistributing tensors
– A method for deriving algorithms
Tensors and tensor contraction
– A tensor is an order-m (m-mode) operator
– Each mode is associated with a feature of the application
– Modes have a fixed length (dimension)
Notation
– Tensors are written with capital script letters
– Elements of tensors are written with lowercase Greek letters
– An element’s location in the tensor is given by subscripts
Tensor contractions
– Einstein notation [1] implicitly sums over modes shared by the inputs
– Transposition corresponds to an interchange of modes
– An arbitrary number of modes may be involved, any of which can be summed

[1] A. Einstein. Die Grundlage der allgemeinen Relativitätstheorie. Annalen der Physik, 354:769–822, 1916.
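These properties are easy to demonstrate with `numpy.einsum`, which implements Einstein summation (the example contractions are illustrative):

```python
import numpy as np

# Shared mode k is summed implicitly: c_{ij} = a_{ik} b_{kj}.
a = np.random.rand(3, 4)
b = np.random.rand(4, 5)
c = np.einsum("ik,kj->ij", a, b)          # ordinary matrix multiply
assert np.allclose(c, a @ b)

# Transposition is an interchange of modes: b_{jk} instead of b_{kj}.
assert np.allclose(np.einsum("ik,jk->ij", a, b.T), c)

# An arbitrary number of modes may be involved: c_{ijm} = a_{ikm} b_{kj}.
a3 = np.random.rand(3, 4, 2)
c3 = np.einsum("ikm,kj->ijm", a3, b)
print(c3.shape)  # -> (3, 5, 2)
```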
Tensor contractions
Example: the third-order Møller-Plesset method [1] from computational chemistry.

[1] R. J. Bartlett. Many-body perturbation theory and coupled cluster theory for electron correlation in molecules. Annual Review of Physical Chemistry, 32(1):359–401, 1981.
Tensor contraction as matrix-matrix multiplication
– Through permutation of the data, a contraction can be arranged so that a matrix-matrix multiplication can be performed
– This results in an algorithm of the form permute, multiply, permute
– It requires a large rearrangement of data, and the cost of this operation is magnified in distributed-memory environments
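A small sketch of the permute-multiply-permute pattern; the contraction c_{ijm} = a_{imk} b_{kj} is an illustrative example, not one from the slides:

```python
import numpy as np

# Permute the data so the summed modes are adjacent and last, reshape to
# matrices, multiply, then reshape/permute back. Here mode k of `a` is
# already last, so no initial permutation is needed; in general a
# transpose is, and that data rearrangement is the costly step.
a = np.random.rand(3, 2, 4)   # modes i, m, k
b = np.random.rand(4, 5)      # modes k, j

a_mat = a.reshape(3 * 2, 4)                    # (i*m, k)
c_mat = a_mat @ b                              # (i*m, j)
c = c_mat.reshape(3, 2, 5).transpose(0, 2, 1)  # modes (i, j, m)

assert np.allclose(c, np.einsum("imk,kj->ijm", a, b))
```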
Outline
– Description of parallel matrix-matrix multiplication
– Quick overview of tensors and tensor contractions
– A notation for distributing/redistributing tensors
– A method for deriving algorithms
Tensor distribution notation
– We have already seen the notation for order-2 tensors on order-2 grids
– A higher-order tensor has more modes to assign distribution symbols to (e.g., an order-4 tensor)
– A higher-order grid provides more grid modes to choose from when creating distribution symbols (e.g., mode distributions may only contain elements from {0, 1, 2} when computing on an order-3 grid)
Redistributions: Allgather [1]

[1] Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn. Collective communication: theory, practice, and experience. Concurrency and Computation: Practice and Experience, 19(13):1749–1783, 2007.
Allgather in action
Before and after views of the redistribution.
Redistributions: Allgather
An allgather within a grid mode performs a redistribution that removes that grid mode from a mode distribution: the indices previously wrapped over it become stored redundantly by every process in that mode.
Redistribution rules
Communication within the grid modes specified by the symbols can perform the corresponding redistributions.
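A sketch of what the allgather rule means in terms of which indices a process stores, using a 2x3 grid viewed column-major; the names and the linearization are illustrative assumptions:

```python
# Indices 0..n-1 wrapped over a logical mode of size s are held by the
# process with rank r in that mode iff i % s == r. An allgather within
# the last grid mode of the symbol shrinks the wrapping: a distribution
# over grid modes (0, 1) becomes one over (0,) alone, replicated
# across grid mode 1.

def local_indices(n, my_rank, mode_size):
    return [i for i in range(n) if i % mode_size == my_rank]

n, g0, g1 = 12, 2, 3
# Before: distribution (0, 1) wraps indices over all g0*g1 = 6 processes;
# the process with logical rank 5 holds [5, 11].
before = local_indices(n, my_rank=5, mode_size=g0 * g1)
# After an allgather within grid mode 1: distribution (0,), so indices
# are wrapped over g0 alone; that process (mode-0 coordinate 5 % 2 = 1)
# now holds [1, 3, 5, 7, 9, 11] -- a superset of what it had before.
after = local_indices(n, my_rank=5 % g0, mode_size=g0)
print(before, after)
```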
Outline
– Description of parallel matrix-matrix multiplication
– Quick overview of tensors and tensor contractions
– A notation for distributing/redistributing tensors
– A method for deriving algorithms
Algorithm choices
– For matrix operations, “stationary” variants are useful; is extending these ideas to tensors also useful?
– There are potentially other “families” of algorithms to choose from; for now, we focus only on those we know how to encode
Deriving Algorithms: Stationary
Assume an order-4 grid. The derivation applies the following rules:
– Avoid communicating the stationary tensor
– Distribute modes similarly during local computation
– Do not reuse modes of the grid
– The output does not have duplication (a reasonable choice)
– Apply the rules of reduction redistribution
Quick Note
Blocking the described algorithms should be straightforward (this has been done for matrix operations).
Analyzing algorithms
Communication costs are obtained from: Ernie Chan, Marcel Heimlich, Avi Purkayastha, and Robert van de Geijn. Collective communication: theory, practice, and experience. Concurrency and Computation: Practice and Experience, 19(13):1749–1783, 2007.
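As a reminder of that model, here is a sketch of representative long-vector collective costs, with α the per-message latency, β the per-item transfer time, and γ the per-item reduction time; these are standard expressions under that model, not the slides' own formulas:

```latex
T_{\mathrm{allgather}}(p, n) \approx \log_2(p)\,\alpha + \tfrac{p-1}{p}\,n\,\beta
\qquad
T_{\mathrm{reduce\text{-}scatter}}(p, n) \approx \log_2(p)\,\alpha + \tfrac{p-1}{p}\,n\,(\beta + \gamma)
```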
Analyzing the Stationary algorithm (on an order-4 grid)
– Redistribute: all-to-all in modes (2, 3); allgather in modes (1, 2)
– Redistribute: all-to-all in modes (0, 1); allgather in modes (3, 0)
– Local tensor contraction
Analyzing the Matrix-mapping approach
– Permute
– Local tensor contraction
– Permute
Picking the “best” algorithm
Compare the stationary algorithm against the matrix-multiply based algorithm in terms of which processes the collectives involve.
How this all fits together
– We formalized aspects of distributed tensor computation: rules defining valid data distributions, and rules specifying how collectives affect distributions
– This gives a mechanical way to go from a problem specification to an implementation
– If other knowledge can be formalized, the search space is reduced
Acknowledgements
– Tamara G. Kolda (Sandia National Laboratories: Livermore)
– Robert van de Geijn
– Bryan Marker
– Devin Matthews
– Tze Meng Low
– The FLAME team
Thank you
This work has been funded by the following:
– Sandia National Laboratories: Sandia Graduate Fellowship
– NSF CCF-1320112: SHF: Small: From Matrix Computations to Tensor Computations
– NSF ACI-1148125/1340293 (supplement): Collaborative Research: SI2-SSI: A Linear Algebra Software Infrastructure for Sustained Innovation in Computational Chemistry and other Sciences
– Argonne National Laboratories for access to computing resources