Week 3 - Complex Networks and their Properties

Presentation transcript:

Week 3 - Complex Networks and their Properties. Miniconference on the Mathematics of Computation. AM8002 Fall 2014. Dr. Anthony Bonato, Ryerson University.

Complex Networks: web graph, social networks, biological networks, internet networks, … Networks - Bonato

What is a complex network? There is no precise definition; however, there is general consensus on the following observed properties: large scale; evolving over time; power law degree distributions; small world properties. Other properties depend on the kind of network being discussed.

Examples of complex networks. Technological/informational: web graph, router graph, AS graph, call graph, e-mail graph. Social: on-line social networks (Facebook, Twitter, LinkedIn, …), collaboration graphs, co-actor graph. Biological: protein interaction networks, gene regulatory networks, food networks.

Example: the web graph. Nodes: web pages; edges: links. One of the first complex networks to be analyzed; viewed as directed or undirected.

Example: On-line Social Networks (OSNs). Nodes: users on some OSN; edges: friendship (or following) links. May be directed or undirected.

Example: Co-author graph. Nodes: mathematicians and scientists; edges: co-authorship. Undirected.

Example: Co-actor graph (Hollywood graph). Nodes: actors; edges: co-stars. Undirected.

Hierarchical social networks: social networks which are oriented from top to bottom; information flows one way. Examples: Twitter, executives in a company, terrorist networks.

Example: protein interaction networks. Nodes: proteins in a living cell; edges: biochemical interactions. Undirected.

Properties of complex networks. Large scale: relative to order and size. Web graph: order > 1 trillion; in some sense infinite (number of strings entered into Google). Facebook: > 1 billion nodes; Twitter: > 500 million nodes; much denser (i.e. higher average degree) than the web graph. Protein interaction networks: order in the thousands.

Properties of complex networks. Evolving: networks change over time. Web graph: billions of nodes and links appear and disappear each day. Facebook: grew to 1 billion users; denser than the web graph. Protein interaction networks: order in the thousands; evolve much more slowly.

Properties of Complex Networks. Power law degree distribution: for a graph G of order n and i a positive integer, let $N_{i,n}$ denote the number of nodes of degree i in G. We say that G follows a power law degree distribution if, for some range of i and some b > 2, $N_{i,n} \approx i^{-b}\, n$. Here b is called the exponent of the power law.

Properties of Complex Networks. Power law degree distribution in the web graph: (Broder et al., 01) reported an exponent b = 2.1 for the in-degree distribution (in a 200 million vertex crawl).

Interpreting a power law: many low-degree nodes, few high-degree nodes.

[Figure: binomial degree distribution (highway network) compared with power law degree distribution (air traffic network).]

Notes on power laws. b is the exponent of the power law. Note that the law is: approximate (constants do not affect it); asymptotic (holds only for large n); and may not hold for all degrees, but for most degrees (for example, sufficiently large or sufficiently small degrees).

[Figure: degree distribution (log-log plot) of a power law graph.]
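As a rough illustration of reading an exponent off a log-log plot, the sketch below counts the number of nodes of each degree and fits a least-squares line to the points (log i, log N_i); the negated slope is an estimate of b. The function names and the synthetic degree sequence are assumptions made for this example, and a least-squares fit on log-log data is only a quick heuristic, not a rigorous estimator.

```python
import math
from collections import Counter

def degree_counts(degrees):
    """Return {i: N_i}, the number of nodes of each positive degree i."""
    return Counter(d for d in degrees if d > 0)

def estimate_exponent(degrees):
    """Least-squares slope of log N_i against log i, negated to give b."""
    counts = degree_counts(degrees)
    xs = [math.log(i) for i in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic degree sequence with N_i proportional to i^(-2.1) (illustrative only).
synthetic = []
for i in range(1, 51):
    synthetic.extend([i] * round(100000 * i ** -2.1))

b_hat = estimate_exponent(synthetic)
print(f"estimated exponent: {b_hat:.2f}")
```

On real network data the fit is usually restricted to a range of degrees, in line with the note that the law may hold only for some range of i.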

[Figure: power laws in OSNs.]

Discussion. Which of the following are power law graphs? High school/secondary school graph. Nodes: students in a high school; edges: friendship links. Power grids. Nodes: generators, power plants, large consumers of power; edges: electrical cables. Banking networks. Nodes: banks; edges: financial transactions.

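Graph parameters. Wiener index: $W(G) = \sum_{\{u,v\} \subseteq V(G)} d(u,v)$, the sum of distances over all unordered pairs of nodes. Average distance: $L(G) = W(G) / \binom{n}{2}$, where n is the order of G. Clustering coefficient: $C(G) = \frac{1}{n} \sum_{x \in V(G)} c(x)$, where $c(x) = |E(N(x))| / \binom{\deg(x)}{2}$ is the fraction of pairs of neighbours of x that are adjacent.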

Examples. Cliques have average distance 1 and clustering coefficient 1. Triangle-free graphs have clustering coefficient 0. The clustering coefficient of the following graph is 0.75. Note: average distance is bounded above by the diameter.
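The clique and triangle-free examples can be checked with a short sketch. It assumes the standard definitions: the Wiener index is the sum of distances over unordered pairs, the average distance divides it by the number of pairs, and the clustering coefficient averages, over all nodes, the fraction of adjacent pairs among each node's neighbours. It also assumes a connected undirected graph given as a dictionary of neighbour sets, and the convention that a node of degree below 2 contributes 0. All function names are illustrative.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Distances from source to every reachable node, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wiener_index(adj):
    """Sum of distances over all unordered pairs of nodes."""
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total // 2  # each unordered pair was counted twice

def average_distance(adj):
    n = len(adj)
    return wiener_index(adj) / (n * (n - 1) / 2)

def clustering_coefficient(adj):
    total = 0.0
    for x, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # convention assumed here: c(x) = 0 when deg(x) < 2
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

clique = {u: {v for v in range(4) if v != u} for u in range(4)}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}  # triangle-free

print(average_distance(clique), clustering_coefficient(clique))
print(average_distance(cycle), clustering_coefficient(cycle))
```

Running one breadth-first search per node is fine for small examples; for networks at the scale discussed earlier, distances are usually estimated by sampling.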

Properties of Complex Networks. Small world property: small world networks were introduced by social scientists Watts & Strogatz in 1998. Low distances: diam(G) = O(log n), L(G) = O(log log n). Higher clustering coefficient than a random graph with the same expected degree.

[Figure labels: Nuit Blanche, Ryerson, City of Toronto, Four Seasons Hotel, Frommer’s, Greenland Tourism.]

Sample data: Flickr, YouTube, LiveJournal, Orkut (Mislove et al., 07): short average distances and high clustering coefficients.

Other properties of complex networks. Many complex networks (including on-line social networks) obey two additional laws. Densification Power Law (Leskovec, Kleinberg, Faloutsos, 05): networks are becoming more dense over time, i.e. average degree is increasing: $|E(G_t)| \approx |V(G_t)|^{a}$, where 1 < a ≤ 2 is the densification exponent.
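To see why an exponent a > 1 corresponds to increasing average degree, a short derivation may help. It uses only the fact that the degrees of a graph sum to twice the number of edges; the shorthand $n_t$ for the order of $G_t$ is introduced here for convenience.

```latex
\[
\text{average degree of } G_t
  = \frac{2\,|E(G_t)|}{|V(G_t)|}
  \approx \frac{2\,n_t^{a}}{n_t}
  = 2\,n_t^{\,a-1},
\qquad n_t = |V(G_t)| .
\]
```

For a > 1 this quantity grows as the network grows. At a = 1 the average degree would stay constant, and a = 2 corresponds to a number of edges quadratic in the number of nodes, the largest order possible.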

[Figure: Densification, physics citations; densification exponent 1.69.]

[Figure: Densification, autonomous systems; e(t) plotted against n(t), densification exponent 1.18.]

Decreasing distances (Leskovec, Kleinberg, Faloutsos, 05): distances (diameter and/or average distances) decrease with time (Kumar et al., 06). Diameter first, DPL second; check diameter formulas. By contrast, the earlier expectation was that as the network grows, the distances between nodes slowly grow.

[Figure: Diameter of the ArXiv citation graph against time [years].]

Other properties. Connected component structure: emergence of components; giant components. Spectral properties: adjacency and Laplacian matrices, spectral gap, eigenvalue distribution. Small community phenomenon: most nodes belong to small communities (i.e. subgraphs with more internal than external links). …

Discussion. Compute the average distance of each of the following graphs. A star with n nodes (i.e. a tree of order n with one vertex of degree n-1, the rest of degree 1). A path with n nodes. A wheel with n+1 nodes, n > 2.

Web Search. The web contains large amounts of information (≈ 4 zettabytes, where 1 zettabyte = $10^{21}$ bytes). We rely on web search engines, such as Google, Yahoo! Search, Bing, …

Search Engines. Search engines are tools designed to hunt for information on the web. They do this by first crawling the web, making copies of pages and their links.

Indexing. The search engine then indexes the information crawled from the web, storing and sorting it.

User interface. Users type in queries and get back a sorted list of web pages and links.

Key questions. How do search engines choose their rankings? What makes modern search engines more accurate than the first search engines? What does math have to do with it?

Challenges of web search: massive size, multimedia, authorities.

Text based search. The first search engines ranked pages using word frequency; e.g. if “baseball” appears many times on page X, then X is ranked higher on a search for “baseball”. Easily spammed: insert “baseball” hundreds of times on a page!
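A toy sketch of word-frequency ranking shows how easily it is spammed. The page names, page texts, and the function `word_frequency_rank` are made up for illustration; real early engines used more elaborate scoring, so this is only a minimal model of the idea.

```python
def word_frequency_rank(pages, query):
    """Rank page names by how often the query word occurs in their text."""
    query = query.lower()
    scores = {
        name: text.lower().split().count(query)
        for name, text in pages.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical pages: one honest page, one keyword-stuffed page, one unrelated.
pages = {
    "honest": "baseball scores and baseball news from the league",
    "spam": "buy watches " + "baseball " * 100,
    "other": "cooking recipes for the weekend",
}

ranking = word_frequency_rank(pages, "baseball")
print(ranking)
```

The keyword-stuffed page outranks the page that is actually about baseball, which is the weakness that link-based ranking was designed to address.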

Analogy: evil librarian. You are looking for a book on baseball in a library; the evil librarian spends her time moving books to fool you.

Then came Google.

Google uses graph theory! Google founders: Larry Page, Sergey Brin.

PageRank is the probability that a random surfer visits a page. PageRank models web surfing via a random walk: the surfer usually moves via out-links; on occasion, the surfer teleports to a random page.

How PageRank addresses the challenges of web search. PageRank can be computed quickly, even for large matrices. PageRank relies only on the link structure. Popular pages are those with many in-links, or linked to by other popular pages. “Authorities” have higher PageRank.

Google random walk. This modification of the usual random walk is called the Google random walk. Note that it takes place on a directed graph.
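The random surfer can be simulated directly. The sketch below is a minimal Monte Carlo version under stated assumptions: with probability c the surfer follows a uniformly random out-link, otherwise (or whenever the current page has no out-links) the surfer teleports to a uniformly random page. The four-page graph, the default c = 0.85, and the function name are illustrative choices.

```python
import random

def simulate_surfer(out_links, steps, c=0.85, seed=0):
    """Estimate visit frequencies of the random surfer by simulation."""
    rng = random.Random(seed)
    nodes = sorted(out_links)
    visits = {v: 0 for v in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        targets = out_links[current]
        if targets and rng.random() < c:
            current = rng.choice(targets)   # follow an out-link
        else:
            current = rng.choice(nodes)     # teleport to a random page
    return {v: count / steps for v, count in visits.items()}

# Hypothetical four-page web: page 3 has no in-links, page 2 has the most.
web = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
freq = simulate_surfer(web, steps=50000)
print(freq)
```

The visit frequencies approximate the probabilities that the walk assigns to each page; the page with no in-links is reached only by teleporting, so it is visited least often.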

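The Google Matrix. Given a digraph G with nodes {1,…,n}, define the matrix $P_1$ by $(P_1)_{i,j} = 1/\deg^{+}(i)$ if (i,j) is a directed edge of G, and 0 otherwise. Form $P_2$ by replacing any zero rows of $P_1$ by $\frac{1}{n} J_{1,n}$, where $J_{m,n}$ denotes the m × n all-ones matrix. Define the Google matrix $P = P(G)$ as $P = c P_2 + (1-c)\frac{1}{n} J_{n,n}$, where c in (0,1) is the teleportation constant.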
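A minimal sketch of the three-step construction follows. It assumes the usual formulas: row i of P1 puts weight 1/outdeg(i) on each out-neighbour of i, zero rows become uniform rows in P2, and P = c·P2 + (1 − c)·(1/n)·J with J the all-ones matrix. Nodes are numbered from 0 rather than 1 for convenience, and the function name, the example edges, and the default c = 0.85 are illustrative.

```python
def google_matrix(n, edges, c=0.85):
    """Build P = c*P2 + (1 - c)*(1/n)*J for a digraph on nodes 0..n-1."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    # P1: row i spreads probability evenly over the out-neighbours of i.
    p1 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in out[i]:
            p1[i][j] += 1.0 / len(out[i])
    # P2: replace every zero row of P1 by the uniform row (1/n) J_{1,n}.
    p2 = [row if out[i] else [1.0 / n] * n for i, row in enumerate(p1)]
    # P: mix P2 with uniform teleportation.
    return [[c * p2[i][j] + (1.0 - c) / n for j in range(n)] for i in range(n)]

P = google_matrix(3, [(0, 1), (0, 2), (1, 2)])  # node 2 has no out-links
for row in P:
    print([round(x, 3) for x in row])
```

Every row of the result sums to 1, so P is a stochastic matrix with all entries positive, which is what makes the random walk on it well behaved.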

Example

Example, continued

Motivation. P1 corresponds to the random walk using out-links. P2 takes care of spider traps: nodes with zero out-degree. P(G) adds in the teleportation: 85% of the time follow out-links, 15% of the time jump to a new node chosen at random from all nodes.

PageRank defined. Theorem (Brin, Page, 2000): The Google random walk converges to a stationary distribution s, which is the dominant left eigenvector of P(G). That is, the PageRank vector s solves the linear system $s^T P(G) = s^T$.

Power method. For a fixed integer n > 0, let $z_0$ be the stochastic vector whose every entry is 1/n. Define $z_{t+1}^T = z_t^T P = \dots = z_0^T P^{t+1}$. Lemma 6 (Power Method): The limit of the sequence $(z_t : t \ge 0)$ is the dominant eigenvector. This gives a simple method of computing PageRank: multiply by powers of P(G).
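The iteration can be sketched in a few lines. This version starts from the uniform vector and repeatedly multiplies a row vector by P until successive iterates differ by less than a tolerance. The 3-node matrix is a hypothetical Google matrix with c = 0.85, and the stopping rule (`tol`, `max_iter`) is an implementation choice made here, not part of the lemma.

```python
def power_method(P, tol=1e-12, max_iter=1000):
    """Iterate z_{t+1}^T = z_t^T P from the uniform vector until it settles."""
    n = len(P)
    z = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(z[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, z)) < tol:
            return nxt
        z = nxt
    return z

# Hypothetical 3-node Google matrix with c = 0.85 (each row sums to 1).
P = [
    [0.05, 0.475, 0.475],
    [0.05, 0.05, 0.90],
    [1 / 3, 1 / 3, 1 / 3],
]

s = power_method(P)
print([round(x, 4) for x in s])
```

Each step costs one vector-matrix product, which is why the method remains practical for very large, sparse link matrices.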

Example, continued PageRank vector: