CompSci 100e 10.1 Is there a Science of Networks? l What kinds of networks are there? l From Bacon numbers to random graphs to Internet From FOAF to Selfish Routing: apparent similarities between many human and technological systems & organization Modeling, simulation, and hypotheses Compelling concepts Metaphor of viral spread Properties of connectivity has qualitative and quantitative effects Computer Science? l From the facebook to tomogravity How do we model networks, measure them, and reason about them? What mathematics is necessary? Will the real-world intrude?
CompSci 100e 10.2 From subsets to graphs with bits l We’ll consider SequenceSync APT What is a “vertex” in the graph? Where are arcs? For state-0, we have {1,5,4,2} for transitions l We’ll consider a graph in which vertices are sets of states Start with every possible state in our initial vertex
CompSci 100e 10.3 Review: Vocabulary l Graphs are collections of vertices and edges (vertex also called node) Edge connects two vertices Direction can be important, directed edge, directed graph Edge may have associated weight/cost l A vertex sequence v 0, v 1, …, v n-1 is a path where v k and v k+1 are connected by an edge. If some vertex is repeated, the path is a cycle A graph is connected if there is a path between any pair of vertices l What vertices are reachable from a given vertex? Traverse the graph… NYC Phil Boston Wash DC LGALAX ORD DCA $186 $412 $1701 $441
CompSci 100e 10.4 Word ladder
CompSci 100e 10.5 Vocabulary/Traversals l Connected? Connected components? Weakly connected (directionless) Degree: # edges incident a vertex indegree (enter), outdegree (exit) l Starting at 7 where can we get? Depth-first search, envision each vertex as a room, with doors leading out Go into a room, mark the room, choose an unused door, exit –Don’t go into a room you’ve already been in (see mark) Backtrack if all doors used (to room with unused door) Rooms are stacked up, backtracking is really recursion One alternative uses a queue: breadth-first search
CompSci 100e 10.6 Six Degrees of Bacon l Background Stanley Milgram’s Six Degrees of Separation? Craig Fass, Mike Ginelli, and Brian Turtle invented it as a drinking game at Albright College Brett Tjaden, Glenn Wasson, Patrick Reynolds have run t online website from UVa and beyond Instance of Small-World phenomenon l handles 2 kinds of requests 1. Find the links from Actor A to Actor B. 2. How good a center is a given actor? How does it answer these requests?
CompSci 100e 10.7 How does the Oracle work? l Not using Oracle™ l Queries require traversal of the graph BN = 0 Mystic River Apollo 13 Footloose John Lithgow Sarah Jessica Parker Bill Paxton Tom Hanks Sean Penn Tim Robbins BN = 1 Kevin Bacon
CompSci 100e 10.8 How does the Oracle Work? Kevin Bacon Mystic River Apollo 13 Footloose John Lithgow Sarah Jessica Parker Bill Paxton Tom Hanks Sean Penn Tim Robbins BN = 0 BN = 1 Sweet and Lowdown Fast Times at Ridgemont High War of the Worlds The Shawshank Redemption Cast Away Forrest Gump Tombstone A Simple Plan Morgan Freeman Sally Field Helen Hunt Val Kilmer Miranda Otto Judge Reinhold Woody Allen Billy Bob Thornton BN = 2 l BN = Bacon Number l Queries require traversal of the graph
CompSci 100e 10.9 How does the Oracle work? Mystic River Footloose John Lithgow Sarah Jessica Parker Tom Hanks Sean Penn Tim Robbins BN = 0 BN = 1 Sweet and Lowdown Fast Times at Ridgemont High War of the Worlds The Shawshank Redemption Cast Away Forrest Gump A Simple Plan Morgan Freeman Sally Field Helen Hunt Miranda Otto Judge Reinhold Woody Allen Billy Bob Thornton BN = 2 Bill Paxton Tombstone Val Kilmer Apollo 13 Kevin Bacon l How do we choose which movie or actor to explore next? l Queries require traversal of the graph
CompSci 100e Actor-Actor Graph
CompSci 100e Movie-Movie Graph
CompSci 100e Actor-Movie Graph
CompSci 100e Traversals l Connected? Connected components? Degree: # edges incident a vertex l Starting at Bacon where can we get? Random search, choose a neighboring vertex at random Can we move in circles? Depth-first search, envision each vertex as a room, with doors leading out Go into a room, mark the room, choose an unused door, exit –Don’t go into a room you’ve already been in (see mark) Backtrack if all doors used (to room with unused door) Rooms are stacked up, backtracking is really recursion One alternative uses a queue: breadth-first search
CompSci 100e Breadth first search l In an unweighted graph this finds the shortest path between a start vertex and every vertex Visit every node one away from start Visit every node two away from start This is every node one away from a node one away Visit every node three away from start, … l Put vertex on queue to start (initially just one) Repeat: take vertex off queue, put all adjacent vertices on Don’t put a vertex on that’s already been visited (why?) When are 1-away vertices enqueued? 2-away? 3-away? How many vertices on queue?
CompSci 100e General graph traversal COLLECTION_OF_VERTICES fringe; fringe = INITIAL_COLLECTION; while (!fringe.isEmpty()) { Vertex v = fringe.removeItem(QUEUE_FN); if (! MARKED(v)) { MARK(v); VISIT(v); for each edge (v,w) { if (NEEDS_PROCESSING(w)) Add w to fringe according to QUEUE_FN; }
CompSci 100e Breadth-first search l Visit each vertex reachable from some source in breadth-first order l Like level-order traversal Queue fringe; fringe = {v}; while (!fringe.isEmpty()) { Vertex v = fringe.dequeue(); if (! getMark(v)) { setMark(v); VISIT(v); for each edge (v,w) { if (MARKED(w)) fringe.enqueue(w); } l How do we change to make depth-first search? How does the order visited change?
CompSci 100e Center of the Hollywood Universe? l 1,018,678 people can be connected to Bacon l Is he the center of the Hollywood Universe? Who is? Who are other good centers? What makes them good centers? l Centrality Closeness: the inverse average distance of a node to all other nodes Geodesic: shortest path between two vertices Closeness centrality: number of other vertices divided by the sum of all distances between the vertex and all others. Degree: the degree of a node Betweenness: a measure of how much a vertex is between other nodes
CompSci 100e Closeness Centrality An actor is important if she is relatively close to all other actors Jordan Centrality uses the inverse of the length of the longest path to any other vertex where d(x,y) is the length of shortest path between x and y
CompSci 100e Degree Centrality Actor with the most ties is the most important A purely local measure where e ij is 1 if an v i and v j are adjacent, 0 otherwise
CompSci 100e Betweenness Centrality An actor who lies on communication paths can control communication flow, and is thus important Information Centrality uses all paths in the network, and weights them based on their length. where g jk = the number of geodesics connecting jk, and g jk (v i ) = the number that actor i is on.