
Sparse Matrices and Combinatorial Algorithms in Star-P: Status and Plans (April 8, 2005)


1 Sparse Matrices and Combinatorial Algorithms in Star-P: Status and Plans. April 8, 2005

2-3 Matlab's sparse matrix design principles
- All operations should give the same results for sparse and full matrices (almost all)
- Sparse matrices are never created automatically, but once created they propagate
- Performance is important, but usability, simplicity, completeness, and robustness are more important
- Storage for a sparse matrix should be O(nonzeros)
- Time for a sparse operation should be O(flops), as nearly as possible
- Star-P dsparse matrices: same principles, some different tradeoffs

4 Sparse matrix operations
- dsparse layout, same semantics as ddense
- For now, only row distribution
- Matrix operators: +, -, max, etc.
- Matrix indexing and concatenation: A(1:3, [4 5 2]) = [ B(:, 7) C ];
- A \ b by direct methods
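
A small serial Matlab sketch of the operations listed above (sizes and values are made up; in Star-P the same code would apply to dsparse matrices):

    n = 8;
    A = sprand(n, n, 0.3) + speye(n);  % random sparse matrix
    B = sprand(3, 8, 0.5);
    C = sprand(3, 2, 0.5);
    A(1:3, [4 5 2]) = [B(:, 7) C];     % indexed assignment with concatenation
    S = A + A';                        % elementwise operator
    m = max(A, [], 2);                 % rowwise max reduction
    x = A \ rand(n, 1);                % direct solve (sketch; assumes A is nonsingular)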

5 Star-P sparse data structure
- Full: 2-dimensional array of real or complex numbers; (nrows * ncols) memory
- Sparse: compressed row storage; about (2*nzs + nrows) memory
[Figure: a small dense array next to its compressed-row-storage form (value array, column-index array, row pointers)]
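
For concreteness, the CSR arrays for a small example (values made up; Star-P's actual internal layout lives on the server):

    A = [3 0 1 0
         0 0 2 0
         4 5 0 6];
    % CSR keeps three arrays: nonzero values (row by row), their column
    % indices, and pointers to where each row starts.
    vals   = [3 1 2 4 5 6];
    colidx = [1 3 3 1 2 4];
    rowptr = [1 3 4 7];      % row i occupies vals(rowptr(i) : rowptr(i+1)-1)
    % memory: 2*nnz (vals + colidx) plus about nrows (rowptr), i.e. 2*nzs + nrows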

6 Star-P distributed sparse data structure
Each processor P0 ... Pn stores:
- # of local nonzeros
- range of local rows
- nonzeros in CSR form
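
A hypothetical sketch of what one processor's piece might look like, reusing the small CSR example above (field names are invented for illustration):

    piece.nzlocal  = 6;                % # of local nonzeros
    piece.rowrange = [101 103];        % this processor owns rows 101..103
    piece.rowptr   = [1 3 4 7];        % local CSR row pointers
    piece.colidx   = [1 3 3 1 2 4];    % column indices (global)
    piece.vals     = [3 1 2 4 5 6];    % nonzero values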

7 Sorting in Star-P
[V, perm] = sort (V)
- Useful primitive for some sparse array algorithms, e.g. the sparse constructor
- Star-P uses a parallel sample sort
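
A serial illustration of the sample-sort idea (bucket count and oversampling factor are made up; Star-P sorts the buckets on separate processors and its splitter selection differs in detail):

    V = rand(1000, 1);
    p = 4;                                       % number of buckets ("processors")
    os = 8;                                      % oversampling factor
    sample = sort(V(randperm(numel(V), p * os)));
    splitters = sample(os : os : end - 1);       % p - 1 splitters
    bucket = 1 + sum(bsxfun(@gt, V, splitters'), 2);   % route each element
    sorted = [];
    for b = 1:p                                  % "local" sorts, then concatenate
        sorted = [sorted; sort(V(bucket == b))];
    end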

8 The sparse( ) constructor
A = sparse (I, J, V, nr, nc)
- Input: ddense vectors I, J, V, and dimensions nr, nc
- A(I(k), J(k)) = V(k) for each k
Algorithm:
1. Sort triples (i, j, v) by (i, j)
2. Adjust for desired row distribution
3. Locally convert to compressed row indices
4. Sum values with duplicate indices
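
The duplicate-summing step matches serial Matlab's sparse() semantics; a tiny example:

    I = [1 1 2];  J = [1 1 2];  V = [10 5 7];
    A = sparse(I, J, V, 2, 2);
    full(A)          % [15 0; 0 7]: the two (1,1) entries were summed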

9 Sparse matrix times dense vector
y = A * x
- The first call to matvec caches a communication schedule for matrix A
- Later calls to multiply any vector by A use the cached schedule
- Communication and computation overlap
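
A sketch of the information such a schedule records ('myrows' is a stand-in for this processor's row range; the actual schedule format is internal to Star-P):

    [iloc, jloc] = find(A(myrows, :));
    needed = unique(jloc);   % columns with local nonzeros = entries of x to fetch
    % The send/receive pattern derived from 'needed' is computed on the
    % first y = A * x and reused by every later matvec with A.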

10 Sparse linear systems
x = A \ b
Matrix division uses MPI-based direct solvers:
- SuperLU_dist: nonsymmetric, static pivoting
- MUMPS: nonsymmetric multifrontal
- PSPASES: Cholesky
ppsetoption('SparseDirectSolver', 'SUPERLU')
- Iterative solvers implemented in Star-P
- Ongoing work on preconditioners
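
Putting the two calls together (solver string as on the slide; which solvers are available depends on how the Star-P server was built):

    ppsetoption('SparseDirectSolver', 'SUPERLU');  % choose SuperLU_dist
    x = A \ b;                                     % MPI-based direct solve for dsparse A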

11 Sparse eigenvalue solvers
[V, D] = eigs(A); etc.
- Implemented via MPI-based PARPACK [Lehoucq, Maschhoff, Sorensen, Yang]
- Work in progress

12 Application: Fluid dynamics
- Modeling density-driven instabilities in miscible fluids (Goyal, Meiburg)
- Groundwater modeling, oil recovery, etc.
- Mixed finite difference & spectral method
- Large sparse generalized eigenvalue problem

    function lambda = peigs (A, B, sigma, iter, tol)
    % Inverse iteration with shift sigma for A*x = lambda*B*x
    m = size (A, 1);
    C = A - sigma * B;                 % shifted operator
    y = rand (m, 1);
    for k = 1:iter
        q = y ./ norm (y);             % normalize the iterate
        v = B * q;
        y = C \ v;                     % sparse direct solve each step
        theta = dot (q, y);            % converges to 1 / (lambda - sigma)
        res = norm (y - theta * q);
        if res <= tol, break; end
    end
    lambda = sigma + 1 / theta;        % the slide's "1/theta" assumes sigma = 0

13 Page ranking example
- Uses power method, i.e. matvec and vector arithmetic
- Page ranking demo from SC'03 on a web crawl of mit.edu (170,000 pages)
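
A minimal power-method PageRank sketch (damping factor, iteration count, and the dangling-page fix are illustrative; G(i,j) = 1 when page j links to page i):

    n = size(G, 1);
    outdeg = full(sum(G, 1))';
    outdeg(outdeg == 0) = 1;          % crude handling of dangling pages
    alpha = 0.85;
    x = ones(n, 1) / n;
    for k = 1:50
        % each step is one sparse matvec plus dense vector arithmetic
        x = alpha * (G * (x ./ outdeg)) + (1 - alpha) / n;
    end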

14 Combinatorial Scientific Computing
Sparse matrix methods; machine learning; search and information retrieval; adaptive multilevel modeling; geometric meshing; computational biology; ...
How will combinatorial methods be used by people who don't understand them in complete detail?

15 Analogy: Matrix division in Matlab
x = A \ b;
Works for either full or sparse A:
- Is A square? no => use QR to solve the least squares problem
- Is A triangular or permuted triangular? yes => sparse triangular solve
- Is A symmetric with positive diagonal elements? yes => attempt Cholesky after symmetric minimum degree
- Otherwise => use LU on A(:, colamd(A))
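
The dispatch can be sketched as follows (the *_solve helpers and the structure test are hypothetical names standing in for the real kernels, not Matlab functions):

    function x = backslash_poly(A, b)
    if size(A, 1) ~= size(A, 2)
        x = qr_least_squares(A, b);            % rectangular: QR least squares
    elseif is_permuted_triangular(A)           % hypothetical structure test
        x = permuted_triangular_solve(A, b);
    elseif issymmetric(A) && all(diag(A) > 0)
        p = symamd(A);                         % symmetric minimum degree
        [R, flag] = chol(A(p, p));
        if flag == 0                           % Cholesky succeeded
            x = zeros(size(b));
            x(p, :) = R \ (R' \ b(p, :));
            return
        end
        x = lu_colamd_solve(A, b);             % not positive definite: fall back
    else
        x = lu_colamd_solve(A, b);             % LU on A(:, colamd(A))
    end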

16 Combinatorics in Star-P
A sparse matrix language is a good start on primitives for combinatorial computing:
- Random-access indexing: A(i,j)
- Neighbor sequencing: find(A(i,:))
- Sparse table construction: sparse(I, J, V)
Other primitives we are building: sorting, searching, pointer-jumping, connectivity and paths, ...

17 Matching and depth-first search in Matlab
dmperm: Dulmage-Mendelsohn decomposition
Square, full-rank A:
- [p, q, r] = dmperm(A);
- A(p,q) is block upper triangular with nonzero diagonal
- also gives strongly connected components of a directed graph
- also gives connected components of an undirected graph
Arbitrary A:
- [p, q, r, s] = dmperm(A);
- maximum-size matching in a bipartite graph
- minimum-size vertex cover in a bipartite graph
- decomposition into strong Hall blocks
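
Usage for the square, full-rank case (output semantics as described on the slide):

    [p, q, r] = dmperm(A);
    B = A(p, q);     % block upper triangular with nonzero diagonal
    % r marks the block boundaries: the k-th diagonal block is
    % B(r(k):r(k+1)-1, r(k):r(k+1)-1), one per strongly connected component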

18 Connected components (work in progress)
Sequential Matlab uses depth-first search (dmperm), which doesn't parallelize well
Shiloach-Vishkin algorithm:
  repeat
    link every (super)vertex to a random neighbor
    shrink each tree to a supervertex by pointer jumping
  until no further change
Hybrid SV / search method under construction
Other potential graph kernels:
- Breadth-first search (after Husbands, LBNL)
- Bipartite matching (after Riedy, UCB)
- Strongly connected components (after Pinar, LBNL)
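
A serial Matlab sketch of the hook-and-jump idea (a deterministic smallest-label variant for clarity, not the randomized hooking of Shiloach-Vishkin or the Star-P hybrid):

    % A is the symmetric adjacency matrix of an undirected graph
    n = size(A, 1);
    [i, j] = find(A);
    D = (1:n)';                                        % initial labels
    while true
        Dold = D;
        Dnbr = accumarray(i, D(j), [n 1], @min, inf);  % smallest neighbor label
        D = min(D, Dnbr);                              % hook to the smaller label
        D = D(D);                                      % pointer jumping
        if isequal(D, Dold), break; end
    end
    % vertices with equal D belong to the same connected component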

19 SSCA #2 Overview (MIT Lincoln Laboratory)
SSCA #2: directed weighted multigraph with no self-loops
Kernels:
- Scalable data generator
- Kernel 1: Graph Construction
- Kernel 2: Classify Large Sets (Search)
- Kernel 3: Subgraph Extraction
- Kernel 4: Graph Clustering / Partitioning
Graph properties:
- Randomly sized "cliques"
- Most intra-clique edges present
- Random inter-clique edges, gradually thinning with distance
- Integer and character string edge labels
- Randomized vertex numbers

20 HPCS Benchmark Spectrum
[Figure: the HPCS benchmark spectrum, with SSCA #2 placed on it]

21 Scalable Data Generator
Produce edge tuples containing the start vertex, end vertex, and weight for each directed edge:
- Hierarchical nature, with random-sized cliques connected by edges between some of the cliques
- Inter-clique edges are assigned using a random distribution that represents a hierarchical thinning as vertices get further apart
- Weights are either random integer values or randomly selected character strings
Random permutations to avoid induced locality:
- vertex numbers (local processor memory locality)
- parallel (global memory locality)
Notes:
- Data sets must be developed in their entirety before graph construction
- May be parallelized; data tuple vertex naming schemes must be globally consistent
- The data generator will not be timed
- Statistics will be saved for use in later verification stages
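
A toy serial sketch of the generator's overall shape (clique counts, sizes, and weights are made up, and the hierarchical inter-clique thinning is omitted):

    ncliques = 10;  maxclq = 8;  maxwt = 100;
    sizes = randi(maxclq, ncliques, 1);
    last = cumsum(sizes);  first = last - sizes + 1;
    n = last(end);
    E = zeros(0, 3);                           % [start end weight] triples
    for c = 1:ncliques                         % intra-clique edges
        v = (first(c):last(c))';
        [si, ti] = find(~eye(numel(v)));       % every ordered pair in the clique
        E = [E; v(si) v(ti) randi(maxwt, numel(si), 1)];
    end
    perm = randperm(n);                        % randomize vertex numbers
    E(:, 1:2) = perm(E(:, 1:2));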

22 Kernel 1: Graph Construction
- Construct the multigraph in a format usable by all subsequent kernels
- No subsequent modifications will be permitted to benefit specific kernels
- Graph construction will be timed

23 Kernel 2: Sort on Selected Edge Weights
Sort on edge weights to determine the vertex pairs with:
- the largest integer weights
- a desired string weight
The two vertex-pair lists will be saved for use in the subsequent kernel:
- start set SI for the integer-weight pairs
- start set SC for the character-string pairs
Sorting weights will be timed

24 Kernel 3: Extract Subgraphs
- Extract a series of subgraphs formed by following paths of length k from a start set of initial vertices
- The set of initial vertices is determined by the previous kernel
- Produce a series of subgraphs consisting of vertices and edges on paths of length k from the start sets SI and SC
- A possible computational kernel for graph extraction is breadth-first search
- Extracting subgraphs will be timed

25 Kernel 4: Partition Graph Using a Clustering Algorithm
Apply a graph clustering algorithm to partition the vertices into subgraphs no larger than a maximum size, minimizing the number of edges that must be "cut", e.g.:
- Kernighan and Lin
- Sangiovanni-Vincentelli, Chen, and Chua
Requirements:
- The output of this kernel is a description of the clusters and of the edges that form the interconnecting "network"
- This step should identify the structure in the graph specified in the data generation phase
- The implementation must not use a priori knowledge of the data generator's details or of the statistics collected during graph generation
- Identifying the clusters and the interconnecting "network" will be timed

26 Concise SSCA#2
- A serial Matlab implementation of SSCA#2
- Coding style is simple and "data parallel"
- Our starting point for the Star-P implementation
- Edge labels are only integers, not strings yet

27 Data generation
- Using Lincoln Labs' serial Matlab generator at present
- Data-parallel Star-P generator under construction
- Generates "edge triples" as distributed dense vectors

28 Kernel 1: Graph construction
- cSSCA#2 K1: ~30 lines of executable serial Matlab
- Multigraph is a cell array of simple directed graphs
- Graphs are dsparse matrices, created by sparse( )
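
A sketch of the cell-of-sparse-matrices layout (the layer-assignment code is an assumption about how parallel edges are separated into simple graphs; s, t, w are the edge triples from the generator and n the vertex count):

    % split parallel edges across layers so each layer is a simple graph
    m = numel(s);
    [st, ord] = sortrows([s(:) t(:)]);
    newgrp = [true; any(diff(st) ~= 0, 2)];    % true at first copy of each (s,t)
    g = cumsum(newgrp);                        % group number of each sorted edge
    f = find(newgrp);                          % start position of each group
    occ = (1:m)' - f(g) + 1;                   % occurrence index within its group
    layer = zeros(m, 1);  layer(ord) = occ;
    ng = max(layer);
    G.edgeWeights = cell(ng, 1);
    for k = 1:ng
        sel = (layer == k);
        G.edgeWeights{k} = sparse(s(sel), t(sel), w(sel), n, n);
    end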

29 Kernel 2: Search in large sets
- cSSCA#2 K2: ~12 lines of executable serial Matlab
- Uses max( ) and find( ) to locate edges of maximum weight

    ng = length(G.edgeWeights);       % number of graphs
    maxwt = 0;
    for g = 1:ng                      % pass 1: find the maximum weight
        maxwt = max(maxwt, max(max(G.edgeWeights{g})));
    end
    intedges = zeros(0, 2);
    for g = 1:ng                      % pass 2: collect edges that attain it
        [intstart, intend] = find(G.edgeWeights{g} == maxwt);
        intedges = [intedges; intstart intend];
    end

30 Kernel 3: Subgraph extraction
- cSSCA#2 K3: ~25 lines of executable serial Matlab
- Returns subgraphs consisting of vertices and edges within a specified distance of specified starting vertices
- Sparse matrix-vector product for breadth-first search
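
The matvec-based BFS can be sketched as follows (A is the adjacency matrix with A(i,j) ~= 0 for edge i -> j, 'start' the seed vertices, k the path length from the slide):

    n = size(A, 1);
    frontier = zeros(n, 1);  frontier(start) = 1;
    reached = frontier > 0;
    for level = 1:k                        % paths of length up to k
        frontier = double((A' * frontier) > 0 & ~reached);   % next BFS level
        reached = reached | frontier > 0;  % accumulate the subgraph's vertices
    end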

31 Kernel 4: Partitioning / clustering
- cSSCA#2 K4: serial prototype exists, not yet parallel
- Different algorithm from the Lincoln Labs executable spec, which uses seed-growing breadth-first search
- cSSCA#2 uses spectral methods for clustering, based on eigenvectors (Fiedler vectors) of the graph
- One version seeks small balanced edge cuts (graph partitioning); another seeks clusters with low isoperimetric number (clustering)
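
An illustrative Fiedler-vector bisection (standard spectral partitioning, not necessarily the exact cSSCA#2 code):

    n = size(A, 1);
    W = double(A | A');                    % symmetrize, drop weights
    d = full(sum(W, 2));
    L = spdiags(d, 0, n, n) - W;           % graph Laplacian
    [V, D] = eigs(L, 2, 'sa');             % two smallest eigenpairs
    fiedler = V(:, 2);                     % eigenvector of 2nd-smallest eigenvalue
    part = fiedler > median(fiedler);      % balanced two-way split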

