Presentation is loading. Please wait.

Presentation is loading. Please wait.

Photos placed in horizontal position with even amount of white space between photos and header Sandia National Laboratories is a multi-program laboratory.

Similar presentations


Presentation on theme: "Photos placed in horizontal position with even amount of white space between photos and header Sandia National Laboratories is a multi-program laboratory."— Presentation transcript:

1 Photos placed in horizontal position with even amount of white space between photos and header Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2015-7696C miniTri and a Task-Based Linear Algebra Building Blocks Approach for Scalable Graph Analytics Michael Wolf, Jon Berry, Dylan Stark IEEE HPEC 16 September, 2015

2 Miniapps  Small, self contained applications  Proxy for important characteristics of full applications  Important co-design tool: performance analysis  Target 1: existing applications on new and future architectures  Target 2: new applications on existing architectures  Strong partnership with industry  Mantevo MiniApp Suite (Sandia): mantevo.org 2 phdMesh (Mantevo)

3 Graph BLAS  Effort to standardize building blocks for graph algorithms in language of linear algebra  Overloaded linear algebra kernels express most graph computations  Matrix-graph duality – highly impactful in CS&E (partitioning, solvers,…)  Promising for data sciences but many challenges  Graph BLAS BoF – Buluc, 18:00 Eden Vale (C1) 3 CS&E = Computational Science and Engineering Eds. Kepner, Gilbert

4 Task Parallelism 4 Dependency between tasks T4 T2 T5T6T7T8T1T2 T3 T4 T1 T3 T1 T2 T3 T4T5 T6 T4 T7T8 T3 T2T3T4T1 T3 T4T5T6 T7T8T1 T2 Core 0 Core 1Core 2 Core 3 Kernel boundaries  Dataflow of application expressed through tasks/dependencies (Avoid explicit barriers between kernels)  Overdecomposition of problem into tasks (# tasks > #cores)  Tasks scheduled and moved to appropriate compute resources

5 Task Parallelism 5 Dependency between tasks T4 T2 T5T6T7T8T1T2 T3 T4 T1 T3 T1 T2 T3 T4T5 T6 T4 T7T8 T3 T2T3T4T1 T3 T4T5T6 T7T8T1 T2 Core 0 Core 1Core 2 Core 3 Kernel boundaries  Natural for data analytics (model intrinsically data-centric)  Several task parallel models/libraries: HPX, Kokkos/Qthreads, Uintah, Legion, …

6  Background  miniTri  Linear Algebra-Based miniTri (miniTriLA)  Task Parallel Approach to miniTriLA  Summary Outline 6

7 miniTri: Data Analytics Miniapp 7  Triangle-based miniapp  Proxy (Mantevo) for triangle based data analytics (e.g., graph noise- filtering)  miniTri Steps:  For each triangle, calculate triangle degrees for vertices and edges  For each triangle, calculate integer k given triangle degree info 3 3 miniTri poses different challenges than traditional data analytics benchmarks such as Graph 500. miniTri poses different challenges than traditional data analytics benchmarks such as Graph 500. k: Max clique size of given triangle

8 miniTri: Overview 8  Developed 12+ variants  Different methods: buckets data structure, set intersection, linear algebra  Different programming models: OpenMP, MPI, HPX, Kokkos/Qthreads  Focus of talk: linear algebra-based miniTri  Related triangle enumeration work  Azad, Buluç, Gilbert. “Parallel triangle counting and enumeration using matrix algebra”, Proc. of the IPDPSW, GABB Workshop, 2015. 3 3 Challenge: Can miniTri be implemented efficiently using Graph BLAS-like building blocks? Challenge: Can miniTri be implemented efficiently using Graph BLAS-like building blocks? k:

9  Background  miniTri  Linear Algebra-Based miniTri (miniTriLA)  Task Parallel Approach to miniTriLA  Summary Outline 9

10 Linear Algebra Based miniTri (miniTriLA)  Developed Graph BLAS-like formulation of miniTri  Important stressor of Graph BLAS 10 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t v = C * 1 3 t e = C T * 1 4 kcount(C, t v, t e ) Adjacency matrix of G Incidence matrix of G G

11 miniTriLA: Triangle Enumeration 11 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t v = C * 1 3 t e = C T * 1 4 kcount(C, t v, t e )  Wedges implicitly stored in rows of A  Adj matrix: wedges = pairwise combinations of nonzero column ids + row #  E.g., row 5 wedges: {1,5,2}, {1,5,3}, {1,5,4}, {2,5,3}, {2,5,4}, {3,5,4} Enumerates each triangle 3 times (once: C=L*B, where L is lower triangle part of A) SPGEMM = Sparse Matrix Multiplication G Adjacency matrix of G Incidence matrix of G Matrix of Triangles

12 miniTriLA: Triangle Enumeration 12 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t v = C * 1 3 t e = C T * 1 4 kcount(C, t v, t e )  Columns of B tell us whether wedge is “closed” to form triangle  Overloaded SpGEMM yields triangle enumeration:  If A i,x =A i,y = 1 and B x,j =B y,j = 1 (or equivalently A i,* B *,j =2), C i,j = triangle {i,x,y}  Else, no triangle Enumerates each triangle 3 times (once: C=L*B, where L is lower triangle part of A) SPGEMM = Sparse Matrix Multiplication G Adjacency matrix of G Incidence matrix of G Matrix of Triangles

13 miniTriLA: Triangle Degree Calculation 13 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t v = C * 1 3 t e = C T * 1 4 kcount(C, t v, t e )  Triangle degree calculation  Each triangle represented once in C for each of its edges and vertices  Triangle vertex and edge degrees are number of nz in rows and columns Triangles with v 4 Triangles with e 4,5 C= Overloaded SpMV to count triangles in rows

14 miniTriLA: Kcounts 14 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t e = C * 1 3 t v = C T * 1 4 kcount(C, t e, t v ) K123 triangle count 002  Compute k for each triangle (summarize in table)  k-count table gives us upper bound on largest clique in graph  Largest c such that Comb(c,3) triangles have k-counts at least c G k:

15 miniTriLA: Challenges 15 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t e = C * 1 3 t v = C T * 1 4 kcount(C, t e, t v )  Challenge 1: Computation difficult to load-balance  Challenge 2: Typical graph BLAS approach forms C, which means storing all triangles in graph  Worst case: O(|E| 3/2 ) triangles in graph, typical: 100-1000 triangles/edge  Severely limits size of graph

16  Background  miniTri  Linear Algebra-Based miniTri (miniTriLA)  Task Parallel Approach to miniTriLA  Summary Outline 16

17 Task Parallel Approach  Each block of linear algebra operations assigned to a task  Block of rows, 2D block of elements  Global barriers between kernels removed  Replaced by dependencies between tasks  Tasks in subsequent kernels can start/finish before first kernel finishes 17 Illustrative Example of GraphBLAS Approaches Traditional data parallel Task parallel

18 Task Parallel Implementation  Implemented Task Parallel miniTriLA  HPX (LSU), Kokkos/Qthreads (Sandia)  Addressing Challenge #1: Load imbalance  Task parallelism can be used to improve load balance 18 Initial results show slight improvement of task parallel approach over data parallel approach

19 Addressing Challenge #2: Storing all Triangles  Key insight: Task parallelism can be used in resource constrained environments to reduce memory footprint  Prioritize k-count tasks to free blocks of triangles from memory  Need runtime system to support advanced resource management/priorities (ongoing effort: HPX and Kokkos/Qthreads) 19 Prioritize Task parallel approach allows Graph BLAS implementation of miniTri to solve much larger problems

20  Background  miniTri  Linear Algebra-Based miniTri (miniTriLA)  Task Parallel Approach to miniTriLA  Summary Outline 20

21 Summary  Overview of new data analytics miniapp miniTri  Will be released soon as part of Mantevo  Presented linear algebra-based formulation of miniTri  miniTri in 4 compact linear algebra-based operations  Graph Algorithm Building Block (GABB) for triangle enumeration  GABBs for calculating triangle vertex and edge degree  miniTri poses challenges for Graph BLAS-like implementations  Load balancing, Memory usage  Presented task parallel approach that addresses these challenges  Asynchrony is key  Can use task priorities to constrain memory usage 21

22 Additional Info/Resources  miniTri will be released soon under Mantevo  Waiting for copyright approval  mantevo.org has several other miniapps  Graph BLAS  http://istc-bigdata.org/GraphBlas/  HPX  LSU – http://stellar.cct.lsu.edu/projects/hpx/  HPX-5 (IU) – https://hpx.crest.iu.edu/  Kokkos  https://github.com/kokkos  Qthreads  https://github.com/Qthreads/qthreads 22

23 Extra 23

24 Traditional Data Parallelism 24 Data distributed across cores Global barriers between kernels Data distributed across cores Global barriers between kernels C1 Core 0 Core 1Core 2 Core 3 Kernel boundaries C2 C3C4 C1C2 C3C4 C1C2 C3 C4 C1 C2 C3 C4 C1C2 C3C4

25 kcount 25 Upper bound on largest clique in graph = largest c such that Comb(c,3) triangles have k-counts at least c Any v of k-clique, incident on Comb(k-1,2) triangles of that clique Any e of k-clique, incident on k-2 triangles of that clique argmax selects the largest k satisfying these condition (largest clique containing triangle) 3 3 k:

26 miniTriLA: Challenges 26 A = adjacency matrix for graph, B = incidence matrix for graph, 1 = vector of ones miniTri 1 C = A * B 2 t e = C * 1 3 t v = C T * 1 4 kcount(C, t e, t v )  Challenge 1: Computation difficult to load-balance  Challenge 2: Graph BLAS approach forms C, which means storing all triangles in graph  Worst case: O(|E| 3/2 ) triangles in graph, typical: 100-1000 triangles/edge  Severely limits size of graph


Download ppt "Photos placed in horizontal position with even amount of white space between photos and header Sandia National Laboratories is a multi-program laboratory."

Similar presentations


Ads by Google