Matrix and Graph • Matrix • Binary Matrix • Sparse Matrix

Presentation on theme: "Matrix and Graph • Matrix • Binary Matrix • Sparse Matrix"— Presentation transcript:

1 Matrix and Graph • Matrix • Binary Matrix • Sparse Matrix
• Operations for Vectors/Matrices • Graph and Adjacency Matrix • Adjacency List

2 Matrix and Graph • A matrix is a 2-dimensional structure
• Used in a wide range of areas, from physical simulation to customer management • Graphs are also used in many areas, to represent relations and flows between data • Several data structures have been devised to handle matrices and graphs: to update, store, search, and operate on them

3 2-Dimensional Structure of Matrix
• An n×m matrix has n×m numbers → it can be stored in an array of size n×m → element [i,j] corresponds to cell i*m+j of the array (indices start from 0) • This naïve design works, but there is more to consider
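
The index rule above can be sketched in C as follows. This is only a minimal sketch: the names FLAT_alloc, FLAT_get and FLAT_set are hypothetical and do not come from the slides.

```c
#include <stdlib.h>

/* Hypothetical helpers: an n x m matrix stored row by row in one array. */
int *FLAT_alloc ( int n, int m ){
  return calloc ( (size_t)n * m, sizeof(int) );  /* all cells start at 0 */
}

/* element [i,j] lives in cell i*m+j */
int FLAT_get ( const int *a, int m, int i, int j ){ return a[i*m + j]; }
void FLAT_set ( int *a, int m, int i, int j, int v ){ a[i*m + j] = v; }
```

Only the number of columns m is needed to locate a cell; the number of rows n matters only for allocation.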

4 2-Dimensional Array • Instead of the usual 1-dimensional array, we can build a 2-dimensional array • Prepare an array of n pointers • Prepare n arrays of size m, and write the address of the first cell of the i-th array into the i-th cell of the pointer array • Element [i,j] of matrix a is accessed by a[i][j] (in C) • Simple structure, O(nm) memory space

5 Allocate a 2-Dimensional Array
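#include <stdlib.h>

/* free the n row arrays, then the pointer array itself */
void MATRIX_free ( int **a, int n ){
  int i;
  for ( i=0 ; i<n ; i++ ) free ( a[i] );
  free ( a );
}

/* returns an n x m matrix, or NULL if memory runs out */
int **MATRIX_alloc ( int n, int m ){
  int i, **a, flag = 0;
  a = malloc ( sizeof(int *)*n );
  if ( a == NULL ) return (NULL);
  for ( i=0 ; i<n ; i++ ){
    a[i] = malloc ( sizeof(int)*m );
    if ( a[i] == NULL ) flag = 1;
  }
  if ( flag == 1 ){ MATRIX_free ( a, n ); return (NULL); }  /* release partial allocation */
  return (a);
}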

6 Binary Matrix • A binary matrix is a matrix whose cells are all either 0 or 1 + each cell is either ○ or × + the adjacency matrix of a graph, shown later • Using one integer for one 0/1 value (1 bit of information) wastes space → we are motivated to compress the matrix • Example: 01010 10001 11110 11000

7 Representation by Bits
• A row composed of 0/1 values can be regarded as one big integer → by chopping it into integers of 32 bits (or 64 bits), it becomes tractable → ⌈m/32⌉ integers are sufficient to store a row (space efficiency increases, and so does cache efficiency) • Element [i,j] is accessed by looking at the (j%32)-th bit of the (j/32)-th integer in the i-th row

8 Handling Bit Access • Element [i,j] is accessed by looking at the (j%32)-th bit of the (j/32)-th integer in the i-th row … writing this code every time is tedious • Prepare the arrays BIT_MASK[] = {1,2,4,8,16,…} and BIT_MASK_[] = {0xfffffffe, 0xfffffffd, 0xfffffffb, …} (the bitwise complements of BIT_MASK[]) + read value: a[i][j/32] & BIT_MASK[j%32] (non-zero if the bit is 1) + set to 1: a[i][j/32] = a[i][j/32] | BIT_MASK[j%32] + set to 0: a[i][j/32] = a[i][j/32] & BIT_MASK_[j%32]
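
The three bit operations above can be sketched in C as below. Note the swap: instead of the lookup tables of the slide, this sketch computes each mask with a shift, which gives the same values. The names BITROW_WORDS, BITROW_get, BITROW_set1 and BITROW_set0 are hypothetical, and a row is assumed to be an array of 32-bit unsigned integers.

```c
#include <stdint.h>

/* number of 32-bit words needed for a row of m bits: ceil(m/32) */
#define BITROW_WORDS(m) (((m) + 31) / 32)

/* read bit j of a row: returns 0 or 1 */
int BITROW_get ( const uint32_t *row, int j ){
  return (int)((row[j/32] >> (j%32)) & 1u);
}

/* set bit j to 1: OR with the mask 1 << (j%32) */
void BITROW_set1 ( uint32_t *row, int j ){
  row[j/32] |= (uint32_t)1 << (j%32);
}

/* set bit j to 0: AND with the complemented mask */
void BITROW_set0 ( uint32_t *row, int j ){
  row[j/32] &= ~((uint32_t)1 << (j%32));
}
```

For a full matrix, these helpers would be applied to the i-th row, for example BITROW_get ( a[i], j ).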

9 Sparse Matrix • That is all for the structures for simple matrices
• Their space efficiency is, in a sense, optimal • But in applications they are often not sufficient/efficient → for example, if the matrix is sparse, many parts are redundant • A sparse matrix has the same value (usually 0) in most of its cells • A sparse matrix should be stored by recording only the positions of the non-zero values

10 Storing Sparse Matrix • Let's begin with binary matrices, for simplicity
→ almost all cells are 0, with few 1's • A simple idea is to make a list of the positions of the cells that are 1 • That is, record (x1,y1),(x2,y2),(x3,y3),…, storing the row ID and column ID of each cell that is 1 • The memory requirement is "twice the number of 1's" → this is very efficient if there are few 1's (sparse) • But accessibility is bad: to read a cell, we have to scan the whole list (a binary tree or hash can be used to speed this up)

11 Store Row-wise • Let's have a structure that improves the accessibility
• Classify the positions of the 1's by their row ID → prepare n arrays, and store the column IDs of the 1's of the i-th row in the i-th array • We need n pointers to the n arrays, but we don't have to store the row IDs, so memory efficiency increases • The memory requirement is "#1's + #rows×2" (a pointer and a size per row; this can be reduced to "#1's + #rows") • Accessibility is good: if the IDs in each row array are sorted, binary search works (a linear scan is enough if there are few column IDs)
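
As a minimal sketch of the row-wise structure above, a row of a sparse binary matrix can be held as a sorted array of column IDs and queried by binary search. The type SROW and the function SROW_get are hypothetical names, not taken from the slides.

```c
/* Hypothetical row structure: sorted column IDs of the 1's in one row. */
typedef struct {
  int *col;   /* column IDs, sorted in increasing order */
  int  size;  /* number of 1's in the row */
} SROW;

/* returns 1 if cell j of the row is 1, else 0 (binary search) */
int SROW_get ( const SROW *r, int j ){
  int lo = 0, hi = r->size - 1;
  while ( lo <= hi ){
    int mid = lo + (hi - lo) / 2;
    if ( r->col[mid] == j ) return 1;
    if ( r->col[mid] < j ) lo = mid + 1;
    else hi = mid - 1;
  }
  return 0;
}
```

A matrix is then an array of n such rows; element [i,j] is read by SROW_get on the i-th row.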

12 Structure in Each Row • In sparse cases the efficiency is increased,
however, updates involving insertions/deletions are not efficient • This is the same situation as with stacks and queues • So, according to the purpose, we use a list, bucket, hash, or binary tree as the structure of a row (having n arrays is equivalent to having buckets)

13 Real World Data • Characteristics of sparse matrices in practice: + Matrices representing mesh networks (structural calculation): in a geometric sense, only a few meshes are adjacent to any one mesh, so there are not many non-zeros per row → an array is sufficient as the structure of a row + Road network data (adjacency of crossing points + distances) → almost the same, but updates come occasionally (it would be sufficient to (re-)allocate slightly more memory than needed)

14 Real World Data (2) Ex) A matrix in which rows correspond to texts and columns to words, and a cell is 1 if the word is included in the text, is usually sparse (similarly POS data, Web links, Web surfing, etc.) + on average, the number of 1's in a row/column is a constant, but some have very many (texts containing many words, words included in many texts) + the distribution of 1's follows the so-called power law (Zipf's law), scale-free: the number of items of size D is proportional to 1/D^Δ → this is often seen in real-world data (≠ geometric distribution) + Such data needs algorithms designed so that the dense part does not become the bottleneck of the computation

15 Non-binary Sparse Matrix
• Usual matrices are of course non-binary, so it is not sufficient to remember the positions of the non-zero values → remember (position, value) pairs • When using an array: (position, value), (position, value), (position, value),…, or position1, position2,…, value1, value2,… • When using lists or binary trees: assign (position, value) to each cell/node, or simply prepare two of them

16 Exercise • Make data representing the following matrix in a sparse way
0,0,1,4,0
0,1,0,0,5
2,0,0,0,0
1,2,5,0,2
0,0,0,0,0

17 Column: Memory Saving for Matrix
• A bucket, or a row of a sparse matrix, needs two data (a pointer to the first cell, and the size k_i) • We decrease these from two to one • First, prepare one array of size equal to the number of non-zero cells. Then, + the 0th row uses the cells of the array from 0 to k_0 − 1 + the 1st row uses from k_0 to k_0 + k_1 − 1 … + the i-th row uses from k_0 + … + k_{i-1} to k_0 + … + k_i − 1, and we remember only the start positions of the rows • The size of the i-th row is obtained as (start position of row i+1) − (start position of row i)
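
The packed layout above can be sketched as follows. The type PACKED and its functions are hypothetical names. One assumption is added here: the start array has n+1 cells, so that the last row also has a "next" start position for computing its size.

```c
/* Hypothetical packed layout: all rows share one array `col`;
   row i occupies col[start[i]] .. col[start[i+1]-1].
   Assumption: `start` has n+1 cells, start[n] = number of non-zero cells. */
typedef struct {
  int n;        /* number of rows */
  int *start;   /* n+1 start positions */
  int *col;     /* column IDs of the non-zero cells, row by row */
} PACKED;

/* size of row i = (start position of row i+1) - (start position of row i) */
int PACKED_rowsize ( const PACKED *p, int i ){
  return p->start[i+1] - p->start[i];
}

/* pointer to the first cell of row i */
const int *PACKED_row ( const PACKED *p, int i ){
  return p->col + p->start[i];
}
```

For the binary matrix 01010 10001 11110 11000, the arrays would be start = {0,2,4,8,10} and col = {1,3, 0,4, 0,1,2,3, 0,1}.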

18 Matrix Operation • The basic matrix operations are addition and multiplication (the inner product of vectors is a special case); further, AND and OR for binary matrices • Algorithms for these operations are trivial if the matrices are stored as 2-dimensional arrays; however, they are not obvious for sparse forms • Further, there are several structures that have advantages for matrix operations

19 Addition of Matrix • For addition, it is sufficient to have an algorithm that adds the matrices row by row (so operations on vectors are sufficient) • First, we look at the inner product of sparse vectors

20 Inner Product • When computing the inner product of two sparse vectors, the difficulty is finding, for each cell of one vector, the corresponding cell of the other • Sort the cells of each vector by column ID • Scan the two vectors simultaneously, from the smaller indices → "simultaneously" means that we iteratively pick up the smallest column ID among the two vectors • When we find a column ID at which both vectors have non-zero values, accumulate the product of the cells • Figure (example vectors): 1 5 5 1 7 3 and 1 1 3 3 5 4

21 A Code for Sparse Inner Product
/* va, vb: (column ID, value) pairs sorted by column ID; ta, tb: number of pairs */
int SVECTOR_innerpro ( int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, c=0;
  while ( ia<ta && ib<tb ){
    if ( va[ia*2] < vb[ib*2] ) ia++;
    else if ( va[ia*2] > vb[ib*2] ) ib++;
    else { c = c + va[ia*2+1]*vb[ib*2+1]; ia++; ib++; }
  }
  return ( c );
}
Figure (example vectors): 1 5 5 1 7 3 and 2 1 3 3 5 4

22 Addition of Two Vectors
• Addition can be done in a similar way • Sort the cells of each vector by column ID • Scan the two vectors simultaneously, from the smaller indices • The positions of the non-zero values in the resulting vector are those having a non-zero value in at least one of the two vectors, so they are easily identified by the scan • Figure (example vectors): 1 5 5 1 7 3 and 2 1 3 3 5 4

23 A Code for Addition
/* vc must have room for ta+tb pairs; returns the number of pairs written */
int SVECTOR_add ( int *vc, int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, ic=0, c, cc;
  while ( ia<ta || ib<tb ){
    if ( ia == ta ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }        /* va is exhausted */
    else if ( ib == tb ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }   /* vb is exhausted */
    else if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  return ( ic );
}
Figure (example vectors): 1 5 5 1 7 3 and 2 1 3 3 5 4

24 Column: Endmarks do a Good Job!
• Compared to the inner product, the code for addition is relatively long → we have exceptions at the ends of the arrays • So, we are motivated to simplify the code by using an "endmark" (an endmark is a symbol that represents the end of the array, or something else representing the end) • 0, -1 or a very large value is used as an endmark • We prepare an additional cell next to the end of each array, and put an endmark in that cell

25 Column: Endmarks do a Good Job! (2)
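#include <limits.h>
#define ENDMARK INT_MAX  /* assumed here: a value larger than every column ID */

/* addition with endmarks; vc needs room for ta+tb pairs plus the endmark */
int SVECTOR_add ( int *vc, int *va, int ta, int *vb, int tb ){
  int ia=0, ib=0, ic=0, c, cc;
  /* an exhausted vector shows ENDMARK, which compares larger than any ID */
  while ( va[ia*2] != ENDMARK || vb[ib*2] != ENDMARK ){
    if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  vc[ic*2] = ENDMARK;
  return ( ic );
}
Figure (example vectors): 1 5 5 1 7 3 and 2 1 3 3 5 4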

26 Matrix Multiplication
• For sparse matrix multiplication, compute the inner products of all pairs of a row and a column • However, a sparse matrix has a row representation but no column representation, so getting the column vectors is hard • A simple solution is to use the transposing algorithm explained in the section on buckets; this gives us a column representation • On the other hand, some data structures are designed so that columns can also be traced
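
The bucket section referred to above is not part of this transcript, so the following is only a sketch of one common counting-based way to transpose a sparse binary matrix, not necessarily the algorithm of that section. It assumes the packed row layout (a start array of n+1 cells plus one array of column IDs); the function name SMAT_transpose is hypothetical.

```c
#include <stdlib.h>

/* Transpose a sparse binary matrix stored in packed row form.
   Input : n rows, m columns, start[0..n], col[0..start[n]-1].
   Output: tstart[0..m] and tcol[0..start[n]-1], allocated by the caller.
   Returns 0 on success, -1 if temporary memory cannot be allocated. */
int SMAT_transpose ( int n, int m, const int *start, const int *col,
                     int *tstart, int *tcol ){
  int i, j, k, *pos;
  pos = malloc ( sizeof(int) * (m > 0 ? m : 1) );
  if ( pos == NULL ) return (-1);
  /* count the 1's in each column */
  for ( j=0 ; j<=m ; j++ ) tstart[j] = 0;
  for ( k=0 ; k<start[n] ; k++ ) tstart[col[k]+1]++;
  /* prefix sums turn the counts into start positions */
  for ( j=0 ; j<m ; j++ ) tstart[j+1] += tstart[j];
  for ( j=0 ; j<m ; j++ ) pos[j] = tstart[j];
  /* scan the rows in order, so the row IDs in each column come out sorted */
  for ( i=0 ; i<n ; i++ )
    for ( k=start[i] ; k<start[i+1] ; k++ ) tcol[pos[col[k]]++] = i;
  free ( pos );
  return (0);
}
```

The running time is proportional to n + m + (number of 1's), and the rows of the transposed matrix are exactly the columns of the original.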

27 Four-Direction List • Lists are good for storing and tracing sparse vectors • However, a collection of row lists isn't good at tracing column vectors, because the cells are not connected vertically • …so, let's have a list connected in both the row direction and the column direction • Each cell has four arms that point to the neighboring cells in the directions (←, →, ↑, ↓)

28 Pointing the Neighbors
• Links in four directions seem to form a mesh network, but they do not • …since the links can cross • In other words, this structure can be seen as a superimposition of two kinds of lists, horizontal and vertical, in which identical cells are unified into one

29 Having Lists of 2-Directions
• If we keep lists for both the row vectors and the column vectors, we get the same accessibility, but insertions/deletions are not the same • For example, when we delete a cell from a row list, it takes a long time to find the corresponding cell in the column lists • In four-direction lists, they are already unified
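
A cell of a four-direction list might look like the sketch below. The type CELL, its field names and the function CELL_unlink are hypothetical. The sketch shows why deletion is easy: the same cell sits in both its row list and its column list, so no search in the column lists is needed.

```c
#include <stddef.h>

/* Hypothetical cell of a four-direction list. */
typedef struct CELL {
  int row, col, value;
  struct CELL *left, *right;  /* neighbors in the same row */
  struct CELL *up, *down;     /* neighbors in the same column */
} CELL;

/* unlink c from both its row list and its column list in O(1);
   updating row/column head pointers is assumed to be done by the caller */
void CELL_unlink ( CELL *c ){
  if ( c->left )  c->left->right = c->right;
  if ( c->right ) c->right->left = c->left;
  if ( c->up )    c->up->down = c->down;
  if ( c->down )  c->down->up = c->up;
  c->left = c->right = c->up = c->down = NULL;
}
```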

30 Graph Structure (a very popular structure)
• A graph is a structure composed of a set of vertices and a set of edges (an edge is a pair of vertices) • It is formed by sets, so information such as positions, shapes, and crossings of edges does not matter when it is drawn as a picture (a graph with shape/position information is called a "graph visualization" or an "embedded graph") • When the edges have directions (from one vertex to another), it is called a directed graph

31 Examples of Graph Data • Adjacency relation
• Hierarchy in an organization • Similarity relation • Web network, human network, SNS friend network,…

32 Graph Terminology • If e = (u,v), edge e is said to be incident to u and v, and vice versa; u and v are then said to be adjacent • The number of edges incident to v is the degree of v • A graph having an edge between every pair of vertices is a complete graph • When two or more edges connect the same two vertices, the edges are called multiple edges • If the vertices can be partitioned into two groups so that every edge connects a vertex in one group and a vertex in the other, the graph is called a bipartite graph

33 Storing a Graph • n vertices can be seen as numbers 0,…,n-1
• Then, an edge is a pair of numbers → it can be stored by writing the pairs in an array, lists, etc. • Further, we need something for accessibility → for example, we often visit a vertex and then go to a neighboring vertex, so we need to scan all edges incident to the vertex

34 Using Matrix • The set of edges can be represented by a matrix as follows ① The j-th row/column corresponds to vertex j, and cell ij is 1 if there is an edge (i, j) (called the adjacency matrix) + efficient for dense graphs having many edges + the multiplicity of edges can be represented by the value of a cell ② The i-th row corresponds to vertex i, and the j-th column corresponds to edge j; cell ij is 1 when edge j is incident to vertex i (called the incidence matrix) + multiple edges are represented easily • A sparse matrix representation has advantages for incidence matrices and for sparse graphs
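
As a minimal sketch of representation ①, the function below builds an adjacency matrix from an edge list. The name GRAPH_adjmat and the edge-list layout (edge e joins vertices edge[2*e] and edge[2*e+1]) are assumptions of this sketch; the graph is taken to be undirected, and the matrix is stored as a flat n×n array.

```c
#include <stdlib.h>

/* Build an adjacency matrix (flat n*n array) from an edge list.
   Edge e joins vertices edge[2*e] and edge[2*e+1]; the graph is undirected.
   Cell values count edges, so multiple edges give values above 1.
   Returns NULL if memory cannot be allocated. */
int *GRAPH_adjmat ( int n, int m, const int *edge ){
  int e, *a = calloc ( (size_t)n * n, sizeof(int) );
  if ( a == NULL ) return (NULL);
  for ( e=0 ; e<m ; e++ ){
    int u = edge[2*e], v = edge[2*e+1];
    a[u*n + v]++;
    if ( u != v ) a[v*n + u]++;   /* a self-loop is counted once */
  }
  return (a);
}
```

Checking whether an edge (u,v) exists is then a single array access, a[u*n + v] != 0.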

35 In Practice • A 2-dimensional array is sufficient when the matrix is small → the cost is small and the redundancy is small • For a sparse matrix, such as 100 by 100 with 10 non-zero elements per row, a sparse representation will be efficient (approximately, when the density is less than 10%) + When we often scan the non-zero elements, such as tracing all vertices adjacent to a vertex, the sparse representation is useful + If we want to check whether there is an edge between two specified vertices, the 2-dimensional array has the advantage

36 Incidence Matrix • An incidence matrix represents the incidence relation between vertices and edges • Assign indices 0,…,n-1 to the vertices and 0,…,m-1 to the edges + storing the edges incident to each vertex in the corresponding row = storing the vertices incident to each edge in the corresponding column
Example (from the figure):
edge: end vertices → 0: 0,1 1: 1,5 2: 0,3 3: 1,4 4: 1,2 5: 4,5 6: 2,5 7: 2,3 8: 2,4
vertex: adjacent vertices → 0: 1,3 1: 0,2,4,5 2: 1,3,4,5 3: 0,2 4: 1,2,5 5: 1,2,4
vertex: incident edges → 0: 0,2 1: 0,1,3,4 2: 4,6,7,8 3: 2,7 4: 3,5,8 5: 1,5,6
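
The rows of a sparse incidence matrix can be built from the list of edge end vertices, as in the sketch below. The function name GRAPH_incidence and the packed output layout (a start array of n+1 cells plus one array of edge IDs) are assumptions of this sketch.

```c
/* Build the rows of a sparse incidence matrix from an edge list.
   Edge e joins vertices edge[2*e] and edge[2*e+1].
   On return, the edges incident to vertex v are
   inc[start[v]] .. inc[start[v+1]-1], in increasing order of edge ID.
   start must have n+1 cells and inc must have 2*m cells. */
void GRAPH_incidence ( int n, int m, const int *edge, int *start, int *inc ){
  int v, e, k;
  for ( v=0 ; v<=n ; v++ ) start[v] = 0;
  for ( k=0 ; k<2*m ; k++ ) start[edge[k]+1]++;     /* degree of each vertex */
  for ( v=0 ; v<n ; v++ ) start[v+1] += start[v];   /* prefix sums */
  /* place the edges; start[v] is advanced while filling, then restored */
  for ( e=0 ; e<m ; e++ )
    for ( k=0 ; k<2 ; k++ ) inc[start[edge[2*e+k]]++] = e;
  for ( v=n ; v>0 ; v-- ) start[v] = start[v-1];
  start[0] = 0;
}
```

Applied to the nine edges of the example on this slide (0: 0,1 1: 1,5 … 8: 2,4), it yields the incident-edge rows listed there, for instance 0,1,3,4 for vertex 1.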

37 Advantage of Incidence Matrix
• In the case of the incidence matrix, each edge has an ID → so it is easy to handle the information attached to each edge → just allocating an array of size m is sufficient • In the case of the adjacency matrix, an edge doesn't have an ID, so it is not easy to manage the correspondence between an edge and its data • Multiple edges are also easy to handle (figure: the same example graph as on slide 36)

38 Allocate Memory for Cells
• An incidence matrix can be realized by list cells having four links, as for sparse matrices (two for the vertices of the edge, and two for the edges at each vertex) → the disadvantages of arrays are eliminated • It can also be realized by two array lists • Or, prepare an array in which edge i corresponds to cells 2i and 2i+1, to represent the four links (figure: the same example graph as on slide 36)

39 Exercise • Make an adjacency matrix of the following graph, and also its sparse incidence matrix (figure: a graph on vertices 1 to 6; the edges are not reproduced in this transcript)

40 Bipartite Graph • A bipartite graph is often seen as a representation of a (binary) (sparse) matrix → associate the nodes of one group with the rows and those of the other with the columns, and connect by an edge the two vertices corresponding to each cell with a non-zero value • A representation in a different style (from the figure): 0: 4,6 1: 4,5 2: 5,6 3: 5,6

41 Column: Store Huge Graph
• A graph needs two pointers (or integers) per edge; weights etc. need more • 64 bits per edge are required on a 32-bit CPU • However, Web graphs have billions of vertices and 20 billion edges → 160GB is necessary in this way • This is too much. Can we reduce the storage size?

42 Column: Store Huge Graph (2)
① Only a few vertices have large degrees, and vertices are mainly adjacent to these few vertices → assign indices so that large-degree vertices have small indices, and represent small indices by a small number of bits and large indices by many bits Ex.) • If the bit sequence representing a number begins with "0", the following 7 bits represent [0-127] • If "10", the following 14 bits represent 128+[ ] • If "11", the following 30 bits represent ,…
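
The prefix code above can be sketched as follows. The ranges of the second and third cases are elided in the transcript, so the offsets used here (128 for the "10" case and 128+16384 for the "11" case) are assumptions of this sketch, chosen so that each case continues where the previous one ends. The function name VLC_encode is hypothetical.

```c
#include <stdint.h>

/* Sketch of the prefix code: writes the code word (prefix bits followed
   by the payload) to *code and returns its length in bits.
   Assumed ranges: "0"  +  7 bits for 0..127,
                   "10" + 14 bits for the next 16384 values,
                   "11" + 30 bits for the next 2^30 values.
   Returns 0 if x is too large to be encoded. */
int VLC_encode ( uint32_t x, uint32_t *code ){
  if ( x < 128u ){ *code = x; return 8; }                      /* 0xxxxxxx */
  x -= 128u;
  if ( x < 16384u ){ *code = (2u << 14) | x; return 16; }      /* 10 + 14 bits */
  x -= 16384u;
  if ( x < (1u << 30) ){ *code = (3u << 30) | x; return 32; }  /* 11 + 30 bits */
  return 0;
}
```

Under these assumptions, an index below 128 costs 8 bits instead of 32, which is where the saving for the frequently referenced large-degree vertices comes from.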

43 Column: Store Huge Graph (3)
② Sort the sites in dictionary order of their URLs → links usually go to nearby sites, so the differences between IDs become small • The differences can be recorded in the same way, to reduce the space • Using these, one edge needs just 10 bits; further, we can reduce it to 5 bits → the storage will be 20GB, which fits in recent computers

44 Summary • Data structures for matrix
• Structures for sparse matrices, and four-direction lists • Structures for graphs: adjacency matrix, incidence matrix, and adjacency list

