Matrix and Graph • Matrix • Binary Matrix • Sparse Matrix

Presentation transcript:

Matrix and Graph • Matrix • Binary Matrix • Sparse Matrix • Operations for Vectors/Matrices • Graph and Adjacency Matrix • Adjacency List

Matrix and Graph
• A matrix is a 2-dimensional structure
• Matrices are used in a wide range of areas, from physical simulations to customer management
• Graphs are also used in many areas, to represent relations and flows between data
• Several data structures have been designed for handling matrices and graphs: updating, storing, searching, and operating on them

2-Dimensional Structure of Matrix
• An n×m matrix has n×m numbers
  → it can be stored in an array of size n×m
  → the [i,j] element corresponds to the (i*m+j)-th cell of the array
• This naïve design already works, but there is more to consider
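The i*m+j rule can be sketched in C as follows. This is a minimal illustration, not code from the slides: the names flat_index and flat_matrix_alloc are hypothetical, and indices are assumed to be 0-based.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: maps (i, j) of an n-by-m matrix to its cell in a
   flat array of n*m ints, using the i*m+j rule. */
size_t flat_index(size_t i, size_t j, size_t n, size_t m) {
    assert(i < n && j < m);   /* indices are 0-based */
    return i * m + j;
}

/* Allocates a zero-filled n-by-m matrix as one contiguous block;
   returns NULL if the allocation fails. */
int *flat_matrix_alloc(size_t n, size_t m) {
    return calloc(n * m, sizeof(int));
}
```

For a 3×4 matrix, the [1,2] element lands in cell 1*4+2 = 6 of the array.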

2-Dimensional Array
• A 2-dimensional array can be built from ordinary 1-dimensional arrays
• Prepare an array of n pointers
• Prepare n arrays of size m, and write the address of the first cell of the i-th array into the i-th cell of the pointer array
• The [i,j] element of matrix a is accessed by a[i][j] (in C)
• Simple structure, O(nm) memory space

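Allocate a 2-Dimensional Array

#include <stdlib.h>

/* Frees the first n rows of a, then the pointer array itself. */
void MATRIX_free ( int **a, int n ){
  int i;
  if ( a == NULL ) return;
  for ( i=0 ; i<n ; i++ ) free ( a[i] );
  free ( a );
}

/* Allocates an n-by-m matrix; returns NULL if any allocation fails. */
int **MATRIX_alloc ( int n, int m ){
  int i, **a;
  a = malloc ( sizeof(int *)*n );
  if ( a == NULL ) return (NULL);
  for ( i=0 ; i<n ; i++ ){
    a[i] = malloc ( sizeof(int)*m );
    if ( a[i] == NULL ){   /* release the rows allocated so far */
      MATRIX_free ( a, i );
      return (NULL);
    }
  }
  return (a);
}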

Binary Matrix
• A binary matrix is a matrix whose cells are all either 0 or 1
  + each cell is either ○ or ×
  + the adjacency matrix of a graph (shown later)
• Using one integer per 0/1 value (1 bit of information) wastes space
  → this motivates compressing the matrix
• Example:
  01010
  10001
  11110
  11000

Representation by Bits
• A row of 0/1 values can be regarded as one big integer
  → chopping it into integers of 32 bits (or 64 bits) makes it tractable
  → ⌈m/32⌉ integers are sufficient to store a row of m cells
  (space efficiency improves, and so does cache efficiency)
• The [i,j] element is the (j%32)-th bit of the (j/32)-th integer in the i-th row

Handling Bit Access
• The [i,j] element is the (j%32)-th bit of the (j/32)-th integer in the i-th row
  … writing this code every time is tedious
• Prepare two mask arrays:
  BIT_MASK[] = {1, 2, 4, 8, 16, …}
  BIT_MASK_[] = {0xfffffffe, 0xfffffffd, 0xfffffffb, …}
  + read value: a[i][j/32] & BIT_MASK[j%32] (non-zero if the bit is 1)
  + set to 1: a[i][j/32] = a[i][j/32] | BIT_MASK[j%32]
  + set to 0: a[i][j/32] = a[i][j/32] & BIT_MASK_[j%32]
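The bit access above can be sketched as a small set of C helpers. This is an illustration under stated assumptions, not the slides' code: the function names are hypothetical, the matrix is stored as one flat block of 32-bit words rather than as an array of row pointers, and the masks are computed by shifting instead of being read from the BIT_MASK tables.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of 32-bit words needed for a row of m bits: ceil(m / 32). */
size_t bitrow_words(size_t m) { return (m + 31) / 32; }

/* Allocates an n-by-m binary matrix as one zeroed block of words;
   returns NULL on failure. Row i starts at word i * bitrow_words(m). */
uint32_t *bitmatrix_alloc(size_t n, size_t m) {
    return calloc(n * bitrow_words(m), sizeof(uint32_t));
}

/* Reads cell [i,j]: bit (j % 32) of word (j / 32) of row i. */
int bitmatrix_get(const uint32_t *a, size_t m, size_t i, size_t j) {
    assert(j < m);
    return (int)((a[i * bitrow_words(m) + j / 32] >> (j % 32)) & 1u);
}

/* Writes cell [i,j]: OR with the mask sets it, AND with the inverted
   mask clears it. */
void bitmatrix_set(uint32_t *a, size_t m, size_t i, size_t j, int value) {
    assert(j < m);
    uint32_t mask = (uint32_t)1 << (j % 32);
    uint32_t *word = &a[i * bitrow_words(m) + j / 32];
    if (value) *word |= mask;
    else       *word &= ~mask;
}
```

A row of 40 cells needs two words here; a lookup table of masks, as on the slide, would give the same results.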

Sparse Matrix
• That is all for the structures of simple (dense) matrices
• Their space efficiency is, in a sense, optimal
• But in applications they are often not sufficient or efficient
  → for example, if the matrix is sparse, many parts are redundant
• A sparse matrix has the same value (usually 0) in many cells
• A sparse matrix should be stored by recording only the places with non-zero values

Storing Sparse Matrix
• Let's begin with a binary matrix, for simplicity
  → almost all cells are 0, and there are few 1's
• A simple idea is to make a list of the places of the cells that are 1
• That is, record (x1,y1), (x2,y2), (x3,y3), …: the row ID and column ID of each cell that is 1
• The memory requirement is "twice the number of 1's", which is very efficient if there are few 1's (sparse)
• But accessibility is bad: to read a cell, we have to scan the whole list (a binary tree or hash can be used to speed this up)

Store Row-wise
• Let's use a structure that improves accessibility
• Classify the places of the 1's by their row ID
  → prepare n arrays, and store the column IDs of the 1's of the i-th row in the i-th array
• We need n pointers to the n arrays, but we no longer have to store the row IDs, so memory efficiency increases
• The memory requirement is "#1's + #rows×2" (a pointer and a size per row; can be reduced to "#1's + #rows")
• Accessibility is good: if the IDs in each row array are sorted, binary search works (a linear scan is enough if a row has few column IDs)
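A minimal sketch of the row-wise structure with binary search, assuming each row keeps a pointer to its sorted column IDs plus its size. The type SparseRow and both function names are hypothetical.

```c
#include <assert.h>

/* One row of a sparse binary matrix: the sorted column IDs of its 1's. */
typedef struct {
    const int *cols;  /* sorted in ascending order */
    int size;         /* number of 1's in this row */
} SparseRow;

/* Returns 1 if column j holds a 1 in the row, else 0 (binary search). */
int sparse_row_get(const SparseRow *row, int j) {
    int lo = 0, hi = row->size - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (row->cols[mid] == j) return 1;
        if (row->cols[mid] < j) lo = mid + 1;
        else hi = mid - 1;
    }
    return 0;
}

/* Reads cell [i,j] of a matrix stored as an array of n rows. */
int sparse_get(const SparseRow *rows, int n, int i, int j) {
    assert(0 <= i && i < n);
    return sparse_row_get(&rows[i], j);
}
```

For the binary matrix 01010 / 10001 / 11110 / 11000 used earlier, the four rows would hold the column IDs {1,3}, {0,4}, {0,1,2,3} and {0,1}.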

Structure in Each Row
• In sparse cases the space efficiency increases; however, updates involving insertions/deletions are not efficient with arrays
• The situation is the same as for stacks and queues
• So, depending on the purpose, we use lists, buckets, hashes, or binary trees as the structure of each row
  (having n arrays is equivalent to having buckets)

Real World Data
• Characteristics of sparse matrices in practice:
  + Matrix representing a mesh network (structural calculation): each mesh element is adjacent to only a few others, in the geometrical sense, so there are not many non-zeros per row
    → an array is sufficient as the structure of each row
  + Road network data (adjacency of crossing points + distances): almost the same, but updates occur sometimes
    (allocating or re-allocating slightly larger memory would be sufficient)

Real World Data (2)
• Ex) A matrix in which rows are texts, columns are words, and a cell is 1 if the word is included in the text is usually sparse
  (similarly for POS data, Web links, Web surfing, etc.)
  + on average, the number of 1's in a row/column is a constant, but some rows/columns have very many (texts with many words, words included in many texts)
  + the distribution of 1's follows a so-called power law (Zipf's law, scale-free): the number of items of size D is proportional to 1/D^α for some constant α
    this is often seen in real world data (≠ geometric distribution)
  + such data needs algorithms designed so that the dense parts do not become the bottleneck of the computation

Non-binary Sparse Matrix
• Usual matrices are of course non-binary, so it is not sufficient to remember only the places having non-zero values
  → remember (place, value) pairs
• When using arrays: store (place, value), (place, value), (place, value), …, or store place1, place2, …, value1, value2, …
• When using lists or binary trees: assign (place, value) to each cell/node, or simply prepare two structures

Exercise
• Make data representing the following matrix in a sparse way
  0,0,1,4,0
  0,1,0,0,5
  2,0,0,0,0
  1,2,5,0,2
  0,0,0,0,0

Column: Memory Saving for Matrix
• A bucket, or a row of a sparse matrix, needs two data items (a pointer to its first cell, and its size k_i)
• We reduce these from two to one
• First, prepare an array whose size equals the number of non-zero cells. Then,
  + the 0th row uses the cells of the array from 0 to k_0−1
  + the 1st row uses the cells from k_0 to k_0+k_1−1
  …
  + the i-th row uses the cells from k_0+…+k_{i−1} to k_0+…+k_i−1,
  and we remember only the start positions of the rows
• The size of the i-th row is obtained as (start position of row i+1) − (start position of row i)
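The start-position idea can be sketched in C as follows. The struct CompactMatrix and its functions are hypothetical names. One assumption goes beyond the slide: the start array has n+1 entries, so that the last row's size can also be computed by subtraction.

```c
#include <assert.h>

/* Compact row storage: all non-zero cells sit in shared arrays, and
   start[i] is the position of the first cell of row i. start has n+1
   entries (an assumption), so start[n] is the number of non-zero cells. */
typedef struct {
    int n;             /* number of rows */
    const int *start;  /* n+1 start positions */
    const int *col;    /* column ID of each non-zero cell */
    const int *val;    /* value of each non-zero cell */
} CompactMatrix;

/* Size of row i = (start position of i+1) - (start position of i). */
int compact_row_size(const CompactMatrix *a, int i) {
    assert(0 <= i && i < a->n);
    return a->start[i + 1] - a->start[i];
}

/* Reads cell [i,j] by a linear scan of row i; absent cells are 0. */
int compact_get(const CompactMatrix *a, int i, int j) {
    assert(0 <= i && i < a->n);
    for (int p = a->start[i]; p < a->start[i + 1]; p++)
        if (a->col[p] == j) return a->val[p];
    return 0;
}
```

For the 3×3 matrix with rows (0,3,0), (0,0,0), (4,0,5), the arrays would be start = {0,1,1,3}, col = {1,0,2} and val = {3,4,5}; the empty middle row costs only one start position.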

Matrix Operation
• Basic matrix operations are addition and multiplication (the inner product of vectors is a special case); for binary matrices, also AND and OR
• Algorithms for these operations are trivial if the matrices are stored as 2-dimensional arrays; however, they are less obvious for sparse forms
• Further, some structures have advantages for matrix operations

Addition of Matrices
• For addition, it is sufficient to have an algorithm for adding each pair of rows (so operations on vectors are sufficient)
• First, we look at the inner product of sparse vectors

Inner Product
• When computing the inner product of two sparse vectors, the difficulty is finding, for each cell of one vector, the corresponding cell of the other
• Sort the cells in each vector by their column ID
• Scan the two vectors simultaneously, from the smaller indices
  → "simultaneously" means iteratively picking the smallest remaining column ID among the two vectors
• When we find a column ID at which both vectors have non-zero values, accumulate the product of the two cells
• Example vectors, written as column ID, value, column ID, value, …: 1 5 5 1 7 3 and 1 1 3 3 5 4

A Code for Sparse Inner Product

/* va and vb hold (column ID, value) pairs sorted by column ID;
   ta and tb are the numbers of pairs. */
int SVECTOR_innerpro (int *va, int ta, int *vb, int tb){
  int ia=0, ib=0, c=0;
  while ( ia<ta && ib<tb ){
    if ( va[ia*2] < vb[ib*2] ) ia++;
    else if ( va[ia*2] > vb[ib*2] ) ib++;
    else { c = c + va[ia*2+1]*vb[ib*2+1]; ia++; ib++; }
  }
  return ( c );
}

Example vectors: 1 5 5 1 7 3 and 2 1 3 3 5 4

Addition of Two Vectors
• Addition can be done in a similar way
• Sort the cells in each vector by their column ID
• Scan the two vectors simultaneously, from the smaller indices
• The positions of non-zero values in the resulting vector are those having a non-zero value in at least one of the two vectors, so they are easily identified by the scan

A Code for Addition

/* Writes va + vb into vc as (column ID, value) pairs and returns the
   number of pairs; vc needs room for ta+tb pairs. */
int SVECTOR_add (int *vc, int *va, int ta, int *vb, int tb){
  int ia=0, ib=0, ic=0, c, cc;
  while ( ia<ta || ib<tb ){
    if ( ia == ta ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( ib == tb ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  return ( ic );
}

Column: Endmarks do a Good Job!
• Compared to the inner product, the code for addition is relatively long
  → it has to handle exceptions at the ends of the arrays
• This motivates simplifying the code with an "endmark" (a symbol, or some other value, that represents the end of the array)
• 0, -1 or a very large value is used as an endmark
• We prepare one additional cell after the end of each array, and put the endmark in that cell

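Column: Endmarks do a Good Job! (2)

#include <limits.h>
#define ENDMARK INT_MAX   /* a very large value, above any column ID */

/* Addition with endmarks: each input ends with an ENDMARK column ID, so
   lengths are not passed. An exhausted vector always compares as larger,
   which removes the end-of-array cases. vc also gets an endmark. */
int SVECTOR_add (int *vc, int *va, int *vb){
  int ia=0, ib=0, ic=0, c, cc;
  while ( va[ia*2] != ENDMARK || vb[ib*2] != ENDMARK ){
    if ( va[ia*2] > vb[ib*2] ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
    else if ( va[ia*2] < vb[ib*2] ){ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
    else { c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2]; ia++; ib++; }
    vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
  }
  vc[ic*2] = ENDMARK;
  return ( ic );
}

Example vectors with endmarks (■): 1 5 5 1 7 3 ■ and 2 1 3 3 5 4 ■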

Matrix Multiplication
• For sparse matrix multiplication, compute the inner products of all pairs of a row and a column
• However, a sparse matrix stored row-wise has row representations but no column representations, so getting column vectors is hard
• A simple solution is to use the transposing algorithm explained in the section on buckets, which gives us a column representation
• Alternatively, some data structures are designed so that columns can also be traced
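The bucket section with the transposing algorithm is not part of this transcript, so the following is only an assumed counting-based version, not necessarily the one the slides refer to. It takes a sparse binary matrix in the start-position row form (start has n+1 entries, which is also an assumption) and produces the same form for the columns. The function name is hypothetical.

```c
#include <stdlib.h>

/* Transposes an n-by-m sparse binary matrix given in compact row form
   (start has n+1 entries, col holds the column IDs row by row).
   Writes the column-wise form into tstart (m+1 entries) and trow
   (start[n] entries). Returns 0 on success, -1 if malloc fails. */
int sparse_transpose(int n, int m, const int *start, const int *col,
                     int *tstart, int *trow) {
    int nnz = start[n];
    int *next = malloc(sizeof(int) * (size_t)(m > 0 ? m : 1));
    if (next == NULL) return -1;

    /* 1. Count the 1's in each column. */
    for (int j = 0; j <= m; j++) tstart[j] = 0;
    for (int p = 0; p < nnz; p++) tstart[col[p] + 1]++;

    /* 2. Prefix sums turn the counts into start positions. */
    for (int j = 0; j < m; j++) tstart[j + 1] += tstart[j];

    /* 3. Drop each row ID into the bucket of its column. Rows are
          visited in increasing order, so each bucket ends up sorted. */
    for (int j = 0; j < m; j++) next[j] = tstart[j];
    for (int i = 0; i < n; i++)
        for (int p = start[i]; p < start[i + 1]; p++)
            trow[next[col[p]]++] = i;

    free(next);
    return 0;
}
```

The running time is linear in n + m + (number of 1's), and the sorted buckets can be fed directly to a sorted-scan inner product.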

Four-Direction List
• Lists are good for storing sparse vectors and for tracing them
• However, a collection of row lists is not good for tracing column vectors, because the cells are not connected vertically
• …so, let's have a list connected in both the row direction and the column direction
• Each cell has four arms that point to the neighboring cells in the directions (←, →, ↑, ↓)

Pointing the Neighbors
• Links in four directions seem to form a mesh network, but they do not
• …since the links can cross
• In other words, this structure can be seen as the superimposition of two kinds of lists, horizontal and vertical, in which identical cells are unified into one

Having Lists of 2-Directions
• If we keep both lists of row vectors and lists of column vectors, we get the same accessibility, but insertions/deletions are not equally easy
• For example, when we want to delete a cell in a row list, it takes a long time to find the corresponding cell in the column lists
• In four-direction lists, the two cells are already unified
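A minimal sketch of a four-direction cell and its deletion. The struct layout, the function name and the two head arrays (row_head, col_head) are assumptions made for illustration; the slides do not fix these details.

```c
#include <assert.h>
#include <stddef.h>

/* A cell of a four-direction list: one non-zero matrix entry linked to
   its neighbours in the same row (left/right) and column (up/down).
   NULL marks the end of a row or column list. */
typedef struct Cell {
    int row, col, value;
    struct Cell *left, *right, *up, *down;
} Cell;

/* Unlinks c from both its row list and its column list in O(1).
   row_head[i] / col_head[j] point to the first cell of row i / column j
   (hypothetical head arrays kept by the caller). */
void cell_unlink(Cell *c, Cell **row_head, Cell **col_head) {
    assert(c != NULL);
    if (c->left) c->left->right = c->right;
    else row_head[c->row] = c->right;
    if (c->right) c->right->left = c->left;

    if (c->up) c->up->down = c->down;
    else col_head[c->col] = c->down;
    if (c->down) c->down->up = c->up;

    c->left = c->right = c->up = c->down = NULL;
}
```

Because the same cell sits in both lists, no search of the column list is needed when a cell is deleted from a row.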

Graph Structure (a very popular structure)
• A graph is a structure composed of a set of vertices and a set of edges (an edge is a pair of vertices)
• It is formed by sets, so information such as positions, shapes, and crossings of edges does not matter when it is drawn as a picture
  (a graph with shape/position information is called a "graph visualization" or an "embedded graph")
• When edges have directions (from one vertex to another), the graph is called directed

Examples of Graph Data
• Adjacency relations
• Hierarchy in an organization
• Similarity relations
• Web network, human network, SNS friend network, …

Graph Terminology
• If e = (u,v), edge e is said to be incident to u and v, and vice versa; u and v are said to be adjacent
• The number of edges incident to v is the degree of v
• A graph having an edge between every two vertices is a complete graph
• When two or more edges connect the same two vertices, the edges are called multiple edges
• If the vertices can be partitioned into two groups so that every edge connects a vertex in one group to a vertex in the other, the graph is called a bipartite graph

Storing a Graph
• n vertices can be seen as the numbers 0,…,n-1
• Then an edge is a pair of numbers
  → edges can be stored by writing the pairs in an array, lists, etc.
• Further, we need something for accessibility
  → for example, we often visit a vertex and then go to a neighboring vertex, so we need to scan all edges incident to a vertex

Using Matrix
• The set of edges can be represented by a matrix as follows
  ① The j-th row/column corresponds to vertex j, and the (i,j) cell is 1 if there is an edge (i, j) (called the adjacency matrix)
    + efficient for dense graphs having many edges
    + the multiplicity of edges can be represented by the value of a cell
  ② The i-th row corresponds to vertex i, and each column corresponds to an edge; when edge j is incident to vertex i, the (i,j) cell is 1 (called the incidence matrix)
    + multiple edges are represented easily
• A sparse matrix representation is advantageous for incidence matrices and for sparse graphs
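A minimal sketch of representation ① for an undirected graph, built from a list of edge pairs. The function names and the flat n×n layout are assumptions made for illustration. Cell values count edges, so multiple edges appear as values above 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds the adjacency matrix of an undirected graph with n vertices
   from m edges; edge e joins ends[2*e] and ends[2*e+1]. The matrix is a
   flat n*n array, and cell (i, j) counts the edges between i and j.
   Returns NULL if the allocation fails. */
int *adjacency_matrix(int n, int m, const int *ends) {
    int *a = calloc((size_t)n * (size_t)n, sizeof(int));
    if (a == NULL) return NULL;
    for (int e = 0; e < m; e++) {
        int u = ends[2 * e], v = ends[2 * e + 1];
        assert(0 <= u && u < n && 0 <= v && v < n);
        a[u * n + v]++;
        if (u != v) a[v * n + u]++;   /* undirected: mirror the cell */
    }
    return a;
}

/* Degree of v = sum of row v (valid for graphs without self-loops). */
int degree(const int *a, int n, int v) {
    int d = 0;
    for (int j = 0; j < n; j++) d += a[v * n + j];
    return d;
}
```

Checking whether an edge (i, j) exists is a single array read, while listing the neighbours of a vertex always costs a scan of n cells, however few neighbours it has.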

In Practice
• A 2-dimensional array is sufficient when the matrix is small: the cost is small, and so is the redundancy
• For a sparse matrix, such as a 100 by 100 matrix with 10 non-zero elements per row, a sparse representation is efficient (roughly, when the density is less than 10%)
  + When we often scan the non-zero elements, such as tracing all vertices adjacent to a vertex, a sparse representation is useful
  + If we want to check whether there is an edge between two specified vertices, a 2-dimensional array has the advantage

Incidence Matrix
• An incidence matrix represents the incidence relation between vertices and edges
• Assign indices 0,…,n-1 to the vertices, and 0,…,m-1 to the edges
  + storing the edges incident to a vertex in the corresponding row = storing the vertices incident to an edge in the corresponding column
• Example with 6 vertices and 9 edges (graph figure not reproduced):
  edge → its two end vertices:
    0: 0,1   1: 1,5   2: 0,3   3: 1,4   4: 1,2   5: 4,5   6: 2,5   7: 2,3   8: 2,4
  vertex → adjacent vertices:
    0: 1,3   1: 0,2,4,5   2: 1,3,4,5   3: 0,2   4: 1,2,5   5: 1,2,4
  vertex → incident edges:
    0: 0,2   1: 0,1,3,4   2: 4,6,7,8   3: 2,7   4: 3,5,8   5: 1,5,6
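The relation between the column view (end vertices of each edge) and the row view (edges at each vertex) can be sketched as follows. The function name is hypothetical, the output uses a start-position layout with n+1 entries, and the graph is assumed to have no self-loops.

```c
#include <assert.h>

/* Sparse incidence matrix, column view: edge e joins ends[2*e] and
   ends[2*e+1]. This builds the row view: for each vertex, the sorted
   IDs of its incident edges, in compact form (start has n+1 entries,
   inc has 2*m entries). Assumes no self-loops. */
void incidence_rows(int n, int m, const int *ends, int *start, int *inc) {
    /* Count the degree of each vertex. */
    for (int v = 0; v < n; v++) start[v] = 0;
    for (int k = 0; k < 2 * m; k++) {
        assert(0 <= ends[k] && ends[k] < n);
        start[ends[k]]++;
    }
    /* Running sums: start[v] becomes the end of vertex v's range. */
    for (int v = 1; v < n; v++) start[v] += start[v - 1];
    start[n] = 2 * m;

    /* Fill each range from its end, visiting edges in decreasing order,
       so every list is ascending and start[v] ends at the range start. */
    for (int e = m - 1; e >= 0; e--) {
        inc[--start[ends[2 * e]]] = e;
        inc[--start[ends[2 * e + 1]]] = e;
    }
}
```

Applied to the nine edge pairs listed on the slide, this yields the same vertex-to-edge lists as the slide's third table (0: 0,2 and 1: 0,1,3,4 and so on).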

Advantage of Incidence Matrix
• With an incidence matrix, each edge has an ID
  → so it is easy to handle information attached to each edge
  → just allocate an array of size m, and that is sufficient
• With an adjacency matrix, edges do not have IDs, so it is not easy to manage the correspondence between an edge and its data
• Multiple edges are also easy to handle

Allocate Memory for Cells
• An incidence matrix can be realized by list cells having four links, as for sparse matrices (two for the vertices of the edge, and two for the edges at the vertex)
  → the disadvantages of arrays are eliminated
• It can also be realized as two array lists
• Or, prepare an array in which edge i corresponds to cells 2i and 2i+1, to represent the four links

Exercise
• Make an adjacency matrix of the following graph, and also a sparse incidence matrix of it
  [Figure: graph with vertex labels 1 to 6; not reproduced in this transcript]

Bipartite Graph
• A bipartite graph is often seen as a representation of a (binary) (sparse) matrix
  → associate the vertices of one group with rows, and those of the other group with columns
  → connect by an edge the two vertices corresponding to each cell with a non-zero value
• A representation in a different style
• Example (row vertex: adjacent column vertices): 0: 4,6   1: 4,5   2: 5,6   3: 5,6
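The correspondence between a sparse binary matrix and a bipartite graph can be sketched as follows. The numbering is an assumption that matches the slide's example lists: row i becomes vertex i, and column j becomes vertex n + j. The function name and the start-position input layout are also assumptions.

```c
/* Lists the edges of the bipartite graph of an n-row sparse binary
   matrix in compact row form (start has n+1 entries, col holds the
   column IDs of the 1's). Row i is vertex i and column j is vertex
   n + j. Writes vertex pairs to edges (2 * start[n] ints) and returns
   the number of edges. */
int bipartite_edges(int n, const int *start, const int *col, int *edges) {
    int e = 0;
    for (int i = 0; i < n; i++)
        for (int p = start[i]; p < start[i + 1]; p++) {
            edges[2 * e] = i;
            edges[2 * e + 1] = n + col[p];
            e++;
        }
    return e;
}
```

Under this numbering, a 4×3 matrix whose rows have 1's in columns {0,2}, {0,1}, {1,2} and {1,2} gives the lists 0: 4,6, 1: 4,5, 2: 5,6 and 3: 5,6.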

Column: Store Huge Graph
• A graph needs two pointers (or integers) per edge; edge weights, etc. need more
• That is 64 bits per edge with 32-bit integers
• However, Web graphs have billions of vertices and 20 billion edges
  → 160GB is necessary in this way
• This is too much. Can we reduce the storage size?
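As a quick check of the 160GB figure, assuming two 32-bit integers per edge and the 20 billion edges quoted above:

```latex
\[
2 \times 32\ \text{bits} = 64\ \text{bits} = 8\ \text{bytes per edge},
\qquad
20 \times 10^{9}\ \text{edges} \times 8\ \text{bytes}
  = 1.6 \times 10^{11}\ \text{bytes} = 160\ \text{GB}.
\]
```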

Column: Store Huge Graph (2)
① Only a few vertices have large degrees, and vertices are mainly adjacent to these few vertices
  → assign indices so that large-degree vertices get small indices, and represent small indices by a small number of bits and large indices by many bits
Ex.)
• If the bit sequence representing a number begins with "0", the following 7 bits represent [0-127]
• If it begins with "10", the following 14 bits represent 128+[0-16383]
• If it begins with "11", the following 30 bits represent 16384+128,…
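The example code above happens to be byte-aligned (1+7, 2+14 and 2+30 bits), so it can be sketched with whole bytes. The function names are hypothetical, and placing the prefix in the most significant bits of the first byte is an assumption; the slides only give the bit pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v at out in 1, 2 or 4 bytes and returns the byte count,
   or 0 if v is too large for the 30-bit form. */
size_t vl_encode(uint32_t v, uint8_t *out) {
    assert(out != NULL);
    if (v < 128u) {                        /* "0" + 7 bits */
        out[0] = (uint8_t)v;
        return 1;
    }
    v -= 128u;
    if (v < 16384u) {                      /* "10" + 14 bits */
        out[0] = (uint8_t)(0x80u | (v >> 8));
        out[1] = (uint8_t)(v & 0xFFu);
        return 2;
    }
    v -= 16384u;
    if (v >= (1u << 30)) return 0;
    out[0] = (uint8_t)(0xC0u | (v >> 24)); /* "11" + 30 bits */
    out[1] = (uint8_t)((v >> 16) & 0xFFu);
    out[2] = (uint8_t)((v >> 8) & 0xFFu);
    out[3] = (uint8_t)(v & 0xFFu);
    return 4;
}

/* Reads one value from in, stores it in *v, returns bytes consumed. */
size_t vl_decode(const uint8_t *in, uint32_t *v) {
    if ((in[0] & 0x80u) == 0) {
        *v = in[0];
        return 1;
    }
    if ((in[0] & 0x40u) == 0) {
        *v = 128u + ((((uint32_t)in[0] & 0x3Fu) << 8) | in[1]);
        return 2;
    }
    *v = 128u + 16384u + ((((uint32_t)in[0] & 0x3Fu) << 24) |
                          ((uint32_t)in[1] << 16) |
                          ((uint32_t)in[2] << 8) | in[3]);
    return 4;
}
```

Indices up to 127 cost one byte, indices up to 16511 cost two, and everything else costs four, so the scheme pays off when most references go to the small indices given to large-degree vertices.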

Column: Store Huge Graph (3)
② Sort the sites in dictionary order of their URLs
  → links usually point to nearby sites, so the differences of IDs become small
• The differences can be recorded in the same way, to reduce the space
• Using these, one edge needs just 10 bits; further, we can reduce it to 5 bits
  → the storage will be about 20GB, which fits in recent computers
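Recording differences of IDs requires some way to handle negative differences, which the slides do not specify. The sketch below assumes a folding (zigzag) mapping that sends small differences of either sign to small non-negative codes, which a variable-length code can then store compactly. All three function names are hypothetical.

```c
#include <stdint.h>

/* Folds a signed difference into a non-negative code (an assumed
   mapping): 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ... */
uint32_t gap_fold(int32_t d) {
    return d >= 0 ? (uint32_t)d * 2u
                  : (uint32_t)(-(d + 1)) * 2u + 1u;
}

/* Inverse of gap_fold. */
int32_t gap_unfold(uint32_t c) {
    return (c & 1u) ? -(int32_t)(c / 2u) - 1 : (int32_t)(c / 2u);
}

/* Code for a link between two non-negative site IDs. */
uint32_t link_code(int32_t from, int32_t to) {
    return gap_fold(to - from);
}
```

A link from site 1000 to site 1003 is stored as the code 6 instead of the full ID 1003, and a link back to site 998 as the code 3.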

Summary
• Data structures for matrices
• Structures for sparse matrices, and four-direction lists
• Structures for graphs: adjacency matrix, incidence matrix, and adjacency list