School of Information University of Michigan SI 614 Basic network concepts and intro to Pajek Lecture 2 Instructor: Lada Adamic.

Slides:



Advertisements
Similar presentations
CSE 211 Discrete Mathematics
Advertisements

Great Theoretical Ideas in Computer Science
Chapter 9 Graphs.
Lecture 5 Graph Theory. Graphs Graphs are the most useful model with computer science such as logical design, formal languages, communication network,
22C:19 Discrete Math Graphs Fall 2010 Sukumar Ghosh.
22C:19 Discrete Math Graphs Fall 2014 Sukumar Ghosh.
Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 – CHAPTER 4 GRAPHS 1.
Introduction This chapter explores graphs and their applications in computer science This chapter explores graphs and their applications in computer science.
Data Structures Using C++
Information Retrieval Lecture 8 Introduction to Information Retrieval (Manning et al. 2007) Chapter 19 For the MSc Computer Science Programme Dell Zhang.
C++ Programming: Program Design Including Data Structures, Third Edition Chapter 21: Graphs.
Mining and Searching Massive Graphs (Networks)
Prof. Amr Goneid, AUC1 Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid Department of Computer Science, AUC Part R5. Graphs.
Graph & BFS.
Centrality and Prestige HCC Spring 2005 Wednesday, April 13, 2005 Aliseya Wright.
CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks It is a ‘small world’ after all –Billions of people on Earth, yet every pair separated.
CSE 321 Discrete Structures Winter 2008 Lecture 25 Graph Theory.
W. D. Grover TRLabs & University of Alberta © Wayne D. Grover 2002, 2003 Graph theory and routing (initial background) E E Lecture 4.
CS8803-NS Network Science Fall 2013
22C:19 Discrete Math Graphs Spring 2014 Sukumar Ghosh.
Graph Essentials Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Graph Essentials Networks A network is a graph. – Elements of the network.
Social Media Mining Graph Essentials.
GRAPH Learning Outcomes Students should be able to:
Data Structures Using C++ 2E
Let G be a pseudograph with vertex set V, edge set E, and incidence mapping f. Let n be a positive integer. A path of length n between vertex v and vertex.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 6-1 Chapter 6 Graphs Introduction to Data Structure CHAPTER 6 GRAPHS 6.1 The Graph Abstract Data Type.
Graph Theoretic Concepts. What is a graph? A set of vertices (or nodes) linked by edges Mathematically, we often write G = (V,E)  V: set of vertices,
Chapter 2 Graph Algorithms.
Class 2: Graph theory and basic terminology Learning the language Network Science: Graph Theory 2012 Prof. Albert-László Barabási Dr. Baruch Barzel, Dr.
Lecture 12: Network Visualization Slides are modified from Lada Adamic, Adam Perer, Ben Shneiderman, and Aleks Aris.
Network properties Slides are modified from Networks: Theory and Application by Lada Adamic.
Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License.
GRAPHS CSE, POSTECH. Chapter 16 covers the following topics Graph terminology: vertex, edge, adjacent, incident, degree, cycle, path, connected component,
2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:
Foundations of Discrete Mathematics
Can you connect the dots as shown without taking your pen off the page or drawing the same line twice.
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
1 CS104 : Discrete Structures Chapter V Graph Theory.
Lecture 5: Mathematics of Networks (Cont) CS 790g: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Mathematics of Networks (Cont)
Week 11 - Monday.  What did we talk about last time?  Binomial theorem and Pascal's triangle  Conditional probability  Bayes’ theorem.
Data Structures & Algorithms Graphs
Agenda Review: –Planar Graphs Lecture Content:  Concepts of Trees  Spanning Trees  Binary Trees Exercise.
Lecture 4: Mathematics of Networks CS 790g: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Network Questions: Structural 1. How many connections does the average node have? 2. Are some nodes more connected than others? 3. Is the entire network.
School of Information University of Michigan Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution.
Discrete Mathematical Structures: Theory and Applications
Graphs A ‘Graph’ is a diagram that shows how things are connected together. It makes no attempt to draw actual paths or routes and scale is generally inconsequential.
Graph Theory and Applications
Graph theory and networks. Basic definitions  A graph consists of points called vertices (or nodes) and lines called edges (or arcs). Each edge joins.
Data Structures & Algorithms Graphs Richard Newman based on book by R. Sedgewick and slides by S. Sahni.
GRAPHS. Graph Graph terminology: vertex, edge, adjacent, incident, degree, cycle, path, connected component, spanning tree Types of graphs: undirected,
Discrete Structures CISC 2315 FALL 2010 Graphs & Trees.
Graph Theory Unit: 4.
Class 2: Graph Theory IST402. Can one walk across the seven bridges and never cross the same bridge twice? Network Science: Graph Theory THE BRIDGES OF.
Chapter 9: Graphs.
Introduction to Graph Theory By: Arun Kumar (Asst. Professor) (Asst. Professor)
Class 2: Graph Theory IST402.
Chapter 20: Graphs. Objectives In this chapter, you will: – Learn about graphs – Become familiar with the basic terminology of graph theory – Discover.
Graph Theory Graph Theory - History Leonhard Euler's paper on “Seven Bridges of Königsberg”, published in 1736.
Week 11 - Wednesday.  What did we talk about last time?  Graphs  Paths and circuits.
1 GRAPH Learning Outcomes Students should be able to: Explain basic terminology of a graph Identify Euler and Hamiltonian cycle Represent graphs using.
Week 10 - Wednesday.  What did we talk about last time?  Counting practice  Pigeonhole principle.
Lecture 20. Graphs and network models 1. Recap Binary search tree is a special binary tree which is designed to make the search of elements or keys in.
1 Data Structures and Algorithms Graphs. 2 Graphs Basic Definitions Paths and Cycles Connectivity Other Properties Representation Examples of Graph Algorithms:
Graph Theory An Introduction.
Graph theory Definitions Trees, cycles, directed graphs.
Network Science: A Short Introduction i3 Workshop
Graphs Chapter 13.
Presentation transcript:

School of Information University of Michigan SI 614 Basic network concepts and intro to Pajek Lecture 2 Instructor: Lada Adamic

Outline Basic network metrics Bipartite graphs Graph theory in math Pajek

Network elements: edges Directed (also called arcs) A -> B A likes B, A gave a gift to B, A is B’s child Undirected A B or A – B A and B like each other A and B are siblings A and B are co-authors Edge attributes weight (e.g. frequency of communication) ranking (best friend, second best friend…) type (friend, relative, co-worker) properties depending on the structure of the rest of the graph: e.g. betweenness

Directed networks Ada Cora Louise Jean Helen Martha Alice Robin Marion Maxine Lena Hazel Hilda Frances Eva Ruth Edna Adele Jane Anna Mary Betty Ella Ellen Laura Irene girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960) first and second choices shown

Edge weights can have positive or negative values One gene activates/inhibits another One person trusting/distrusting another Research challenge: How does one ‘propagate’ negative feelings in a social network? Is my enemy’s enemy my friend? Transcription regulatory network in baker’s yeast

Adjacency matrices Representing edges (who is adjacent to whom) as a matrix A ij = 1 if node i has an edge to node j = 0 if node i does not have an edge to j A ii = 0 unless the network has self-loops A ij = A ji if the network is undirected, or if i and j share a reciprocated edge i j i i j Example: A =

Adjacency lists Edge list Adjacency list is easier to work with if network is large sparse quickly retrieve all neighbors for a node 1: 2: 3 4 3: 2 4 4: 5 5:

Nodes Node network properties from immediate connections indegree how many directed edges (arcs) are incident on a node outdegree how many directed edges (arcs) originate at a node degree (in or out) number of edges incident on a node from the entire graph centrality (betweenness, closeness) outdegree=2 indegree=3 degree=5

Node degree from matrix values Outdegree = A = example: outdegree for node 3 is 2, which we obtain by summing the number of non- zero entries in the 3 rd row Indegree = A = example: the indegree for node 3 is 1, which we obtain by summing the number of non-zero entries in the 3 rd column

Other node attributes Homophily: tendency of like individuals to associate with one another take your pick… geographical location function musical tastes…

Network metrics: degree sequence and degree distribution Degree sequence: An ordered list of the (in,out) degree of each node In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0] Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0] (undirected) degree sequence: [3, 3, 3, 2, 2, 1, 1, 1] Degree distribution: A frequency count of the occurrence of each degree In-degree distribution: [(2,3) (1,4) (0,1)] Out-degree distribution: [(2,4) (1,3) (0,1)] (undirected) distribution: [(3,3) (2,2) (1,3)]

Network metrics: connected components Strongly connected components Each node within the component can be reached from every other node in the component by following directed links Strongly connected components B C D E A G H F Weakly connected components: every node can be reached from every other node by following links in either direction A B C D E F G H A B C D E F G H Weakly connected components A B C D E G H F In undirected networks one talks simply about ‘connected components’

Network metrics: shortest paths Shortest path (also called a geodesic path) The shortest sequence of links connecting two nodes Not always unique A and C are connected by 2 shortest paths A – E – B - C A – E – D - C Diameter: the largest geodesic distance in the graph A B C D E The distance between A and C is the maximum for the graph: 3 Caution: some people use the term ‘diameter’ to be the average shortest path distance, in this class we will use it only to refer to the maximal distance

Giant components and the web graph if the largest component encompasses a significant fraction of the graph, it is called the giant component

The bowtie model of the web The Web is a directed graph: webpages link to other webpages The connected components tell us what set of pages can be reached from any other just by surfing (no ‘jumping’ around by typing in a URL or using a search engine) Broder et al – crawl of over 200 million pages and 1.5 billion links. SCC – 27.5% IN and OUT – 21.5% Tendrils and tubes – 21.5% Disconnected – 8% image: Mark Levene

bipartite (two-mode) networks edges occur only between two groups of nodes, not within those groups for example, we may have individuals and events directors and boards of directors customers and the items they purchase metabolites and the reactions they participate in

going from a bipartite to a one-mode graph One mode projection two nodes from the first group are connected if they link to the same node in the second group some loss of information naturally high occurrence of cliques Two-mode network group 1 group 2

Now in matrix notation B ij = 1 if node i from the first group links to node j from the second group = 0 otherwise B is usually not a square matrix! for example: we have n customers and m products i j B =

Collapsing to a one-mode network i and k are linked if they both link to j A ik =  j B ij B kj A= B B T the transpose of a matrix swaps B xy and B yx if B is an nxm matrix, B T is an mxn matrix i j=1 k j=2 B =B T =

Matrix multiplication general formula for matrix multiplication Z ij =  k X ik Y kj let Z = A, X = B, Y = B T A = = = 1*1+1*1 + 1*0 + 1*0 = 2

Collapsing a two-mode network to a one mode-network Assume the nodes in group 1 are people and the nodes in group 2 are movies The diagonal entries of A give the number of movies each person has seen The off-diagonal elements of A give the number of movies that both people have seen A is symmetric A =

Networks of actors

History: Graph theory Euler’s Seven Bridges of Königsberg – one of the first problems in graph theory Is there a route that crosses each bridge only once and returns to the starting point?

Eulerian paths If starting point and end point are the same: only possible if no nodes have an odd degree each path must visit and leave each shore If don’t need to return to starting point can have 0 or 2 nodes with an odd degree Eulerian path: traverse each edge exactly once Hamiltonian path: visit each vertex exactly once

Bi-cliques (cliques in bipartite graphs) K m,n is the complete bipartite graph with m and n vertices of the two different types K 3,3 maps to the utility graph Is there a way to connect three utilities, e.g. gas, water, electricity to three houses without having any of the pipes cross? K 3,3 Utility graph

Planar graphs A graph is planar if it can be drawn on a plane without any edges crossing

When graphs are not planar Two graphs are homeomorphic if you can make one into the other by adding a vertex of degree 2

Cliques and complete graphs K n is the complete graph (clique) with K vertices each vertex is connected to every other vertex there are n*(n-1)/2 undirected edges K5K5 K8K8 K3K3

Peterson graph Example of using edge contractions to show a graph is not planar

Edge contractions defined A finite graph G is planar if and only if it has no subgraph that is homeomorphic or edge-contractible to the complete graph in five vertices (K 5 ) or the complete bipartite graph K 3, 3. (Kuratowski's Theorem)

graph density Of the connections that may exist between n nodes directed graph e max = n*(n-1) each of the n nodes can connect to (n-1) other nodes undirected graph e max = n*(n-1)/2 since edges are undirected, count each one only once What fraction are present? density = e/ e max For example, out of 12 possible connections, this graph has 7, giving it a density of 7/12 = But it is more difficult for a larger network to achieve the same density measure not useful for comparing networks of different densities

#s of planar graphs of different sizes 1:1 2:2 3:4 4:11 Every planar graph has a straight line embedding (homework exercise)

Trees Trees are undirected graphs that contain no cycles

examples of trees In nature trees river networks arteries (or veins, but not both) Man made sewer system Computer science binary search trees decision trees (AI) Network analysis minimum spanning trees from one node – how to reach all other nodes most quickly may not be unique, because shortest paths are not always unique depends on weight of edges

Using Pajek for exploratory social network analysis Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’ website: vlado.fmf.uni-lj.si/pub/networks/pajek/ download application (free) tutorials lectures data sets Windows only (works on Linux via Wine) can be installed via NAL in the student lab (DIAD) helpful book: ‘Exploratory Social Network Analysis with Pajek’ by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj first 2 chapters are required reading and on cTools

Pajek interface Drop down list of networks opened or created with pajek. Active is displayed Drop down list of network partitions by discrete variables, e.g. degree, mode, label Drop down list of continuous node attributes, e.g. centrality, clustering coefficients things we’ll use right away things we’ll use later for clustering

opening a network file click on folder icon to open a file Save changes to your network, network partitions, etc., if you’d like to keep them

Working with network files in Pajek The active network, partition, etc is shown on top of the drop down list Draw the network

Pajek data format *Vertices 26 1 "Ada" "Cora" "Louise" *Arcs c Black.. *Edges c Black Ada Cora Louise number of vertices vertex x,y,z coordinates (optional) directed edges undirected edges from Ada(1) to Louise(3) as choice “2” and color Black between Ada(1) to Cora(2) as choice “1” and color Black

Live demo of Pajek Opening a network Visualization Essential measurements

Final project guidelines Work individually or in groups (up to 4 people) Important dates Feb. 13 th Project proposals due (5%) 1 page abstract & 5 minute class presentation March 20 th Project status report due (5%) 3-6 pages of result summaries (including figures and tables) plan of remaining work April 17 th in class student presentations of results (5%) April 24 th final project reports due (25%) 6-12 pages of related work main results ‘future’ work/extensions

Final Project Option 1: Analyze a network What it should be More than just a measurement of the average shortest path, clustering coefficient, and degree distribution An interpretation of measurement results If applicable: discovery of community or other structure assortativity motifs weights, thresholds longitudinal data (how the network changes over time) Visualizations of all or part of the network that point out a particular feature Qualitative comparison with other networks What it should not be a literature review The data can be artificially generated or a real-world dataset If you intend to work on data concerning human subjects, you may need to start an IRB application ASAP

Final Project Option 2: New network model What it should be Method for generating a network e.g. preferential attachment optimization wrt. different criteria Analysis of resulting network comparison with random graphs how do attributes change depending on model parameters What it should not be an already thoroughly explored model

Final Project Option 3: Novel algorithm What it should be An algorithm to analyze the network e.g. clustering or community detection algorithm webpage ranking algorithm OR a process that is influenced by the network gossip spreading games such as the prisoner’s dilemma Analysis of algorithm on several different networks What it should not be an exact replica of an existing algorithm applied to a network where it has already been studied