Advanced Topics in Data Mining Special focus: Social Networks.

Slides:



Advertisements
Similar presentations
The Theory of Zeta Graphs with an Application to Random Networks Christopher Ré Stanford.
Advertisements

Analysis and Modeling of Social Networks Foudalis Ilias.
Week 5 - Models of Complex Networks I Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
Week 4 – Random Graphs Dr. Anthony Bonato Ryerson University AM8002 Fall 2014.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Information Networks Generative processes for Power Laws and Scale-Free networks Lecture 4.
Models of Network Formation Networked Life NETS 112 Fall 2013 Prof. Michael Kearns.
SILVIO LATTANZI, D. SIVAKUMAR Affiliation Networks Presented By: Aditi Bhatnagar Under the guidance of: Augustin Chaintreau.
Information Networks Small World Networks Lecture 5.
Advanced Topics in Data Mining Special focus: Social Networks.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
The influence of search engines on preferential attachment Dan Li CS3150 Spring 2006.
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
1 Evolution of Networks Notes from Lectures of J.Mendes CNR, Pisa, Italy, December 2007 Eva Jaho Advanced Networking Research Group National and Kapodistrian.
Complex Networks Third Lecture TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA TexPoint fonts used in EMF. Read the.
CS728 Lecture 5 Generative Graph Models and the Web.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
Network Models Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Models Why should I use network models? In may 2011, Facebook.
Eurecom, Sophia-Antipolis Thrasyvoulos Spyropoulos / Random Graph Models: Create/Explain Complex Network Properties.
Scale-free networks Péter Kómár Statistical physics seminar 07/10/2008.
Small Worlds Presented by Geetha Akula For the Faculty of Department of Computer Science, CALSTATE LA. On 8 th June 07.
Mining and Searching Massive Graphs (Networks)
Network Statistics Gesine Reinert. Yeast protein interactions.
Peer-to-Peer and Grid Computing Exercise Session 3 (TUD Student Use Only) ‏
CS Lecture 6 Generative Graph Models Part II.
Global topological properties of biological networks.
Graphs and Topology Yao Zhao. Background of Graph A graph is a pair G =(V,E) –Undirected graph and directed graph –Weighted graph and unweighted graph.
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
On Distinguishing between Internet Power Law B Bu and Towsley Infocom 2002 Presented by.
Statistical Properties of Massive Graphs (Networks) Networks and Measurements.
Computer Science 1 Web as a graph Anna Karpovsky.
Online Social Networks and Media Network models. What is a network model? Informally, a network model is a process (radomized or deterministic) for generating.
Random Graph Models of Social Networks Paper Authors: M.E. Newman, D.J. Watts, S.H. Strogatz Presentation presented by Jessie Riposo.
The Erdös-Rényi models
Information Networks Power Laws and Network Models Lecture 3.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Models and Algorithms for Complex Networks Networks and Measurements Lecture 3.
Author: M.E.J. Newman Presenter: Guoliang Liu Date:5/4/2012.
Network properties Slides are modified from Networks: Theory and Application by Lada Adamic.
Small-world networks. What is it? Everyone talks about the small world phenomenon, but truly what is it? There are three landmark papers: Stanley Milgram.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
“Adversarial Deletion in Scale Free Random Graph Process” by A.D. Flaxman et al. Hammad Iqbal CS April 2006.
Random-Graph Theory The Erdos-Renyi model. G={P,E}, PNP 1,P 2,...,P N E In mathematical terms a network is represented by a graph. A graph is a pair of.
Gennaro Cordasco - How Much Independent Should Individual Contacts be to Form a Small-World? - 19/12/2006 How Much Independent Should Individual Contacts.
Social Network Analysis Prof. Dr. Daning Hu Department of Informatics University of Zurich Mar 5th, 2013.
Class 9: Barabasi-Albert Model-Part I
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Most of contents are provided by the website Network Models TJTSD66: Advanced Topics in Social Media (Social.
Clusters Recognition from Large Small World Graph Igor Kanovsky, Lilach Prego Emek Yezreel College, Israel University of Haifa, Israel.
Performance Evaluation Lecture 1: Complex Networks Giovanni Neglia INRIA – EPI Maestro 10 December 2012.
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2013 Figures are taken.
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
Network (graph) Models
Lecture 1: Complex Networks
Topics In Social Computing (67810)
Models of Network Formation
Models of Network Formation
Models of Network Formation
Models of Network Formation
Modelling and Searching Networks Lecture 5 – Random graphs
Modelling and Searching Networks Lecture 6 – PA models
Network Science: A Short Introduction i3 Workshop
Discrete Mathematics and its Applications Lecture 5 – Random graphs
Network Models Michael Goodrich Some slides adapted from:
Advanced Topics in Data Mining Special focus: Social Networks
Discrete Mathematics and its Applications Lecture 6 – PA models
Advanced Topics in Data Mining Special focus: Social Networks
Presentation transcript:

Advanced Topics in Data Mining Special focus: Social Networks

Reminders By the end of this week/ beginning of next we need to have a tentative presentation schedule Each one of you should send me an about a theme by Friday, February 22.

What did we learn in the last lecture?

Degree distribution – What are the observed degree distributions Clustering coefficient – What are the observed clustering coefficients? Average path length – What are the observed average path lengths?

What are we going to learn in this lecture? How to generate graphs that have the desired properties – Degree distribution – Clustering coefficient – Average path length We are going to talk about generative models

What is a network model? Informally, a network model is a process (radomized or deterministic) for generating a graph Models of static graphs – input: a set of parameters Π, and the size of the graph n – output: a graph G(Π,n) Models of evolving graphs – input: a set of parameters Π, and an initial graph G 0 – output: a graph G t for each time t

Families of random graphs A deterministic model D defines a single graph for each value of n (or t) A randomized model R defines a probability space ‹G n,P› where G n is the set of all graphs of size n, and P a probability distribution over the set G n (similarly for t) – we call this a family of random graphs R, or a random graph R

Erdös-Renyi Random graphs Paul Erdös ( )

Erdös-Renyi Random Graphs The G n,p model – input: the number of vertices n, and a parameter p, 0 ≤ p ≤ 1 – process: for each pair (i,j), generate the edge (i,j) independently with probability p Related, but not identical: The G n,m model – process: select m edges uniformly at random

Graph properties A property P holds almost surely (or for almost every graph), if Evolution of the graph: which properties hold as the probability p increases? Threshold phenomena: Many properties appear suddenly. That is, there exist a probability p c such that for p p c the property holds a.s. What do you expect to be a threshold phenomenon in random graphs?

The giant component Let z=np be the average degree If z < 1, then almost surely, the largest component has size at most O(ln n) if z > 1, then almost surely, the largest component has size Θ(n). The second largest component has size O(ln n) if z =ω(ln n), then the graph is almost surely connected.

The phase transition When z=1, there is a phase transition – The largest component is O(n 2/3 ) – The sizes of the components follow a power-law distribution.

Random graphs degree distributions The degree distribution follows a binomial Assuming z=np is fixed, as n → ∞, B(n,k,p) is approximated by a Poisson distribution Highly concentrated around the mean, with a tail that drops exponentially

Random graphs and real life A beautiful and elegant theory studied exhaustively Random graphs had been used as idealized network models Unfortunately, they don’t capture reality…

A random graph example

Departing from the Random Graph model We need models that better capture the characteristics of real graphs – degree sequences – clustering coefficient – short paths

Graphs with given degree sequences input: the degree sequence [d 1,d 2,…,d n ] Can you generate a graph with nodes that have degrees [d 1,d 2,…,d n ] ? ?

Graphs with given degree sequences The configuration model – input: the degree sequence [d 1,d 2,…,d n ] – process: Create d i copies of node i Take a random matching (pairing) of the copies – self-loops and multiple edges are allowed Uniform distribution over the graphs with the given degree sequence

Example Suppose that the degree sequence is Create multiple copies of the nodes Pair the nodes uniformly at random Generate the resulting network 4 132

Graphs with given degree sequences How about simple graphs ? – No self loops – No multiple edges

Graphs with given degree sequences Realizability of degree sequences Lemma: A degree sequence d = [d(1),…,d(n)] with d(1)≥d(2)≥… ≥d(n) and d(1)+d(2)+…+d(n) even is realizable if and only if for every 1≤k ≤n-1 it holds that

Graphs with given degree sequences -- algorithm Input : d= [d(1),…,d(n)] Output: No or simple graph G=(V,E) with degree sequence d If Σ i=1…n d(i) is odd return “No” While 1 do – If there exist i with d(i) < 0 return “No” – If d(i)=0 for all i return the graph G=(V,E) – Pick random node v with d(v)>0 – S(v) = set of nodes with the d(v) highest d values – d(v) = 0 – For each node w in S(v) E = E\union (v,w) d(w) = d(w)-1

How can we generate data with power-law degree distributions?

Preferential Attachment in Networks First considered by [Price 65] as a model for citation networks – each new paper is generated with m citations (mean) – new papers cite previous papers with probability proportional to their indegree (citations) – what about papers without any citations? each paper is considered to have a “default” citation probability of citing a paper with degree k, proportional to k+1 Power law with exponent α = 2+1/m

Barabasi-Albert model The BA model (undirected graph) – input: some initial subgraph G 0, and m the number of edges per new node – the process: nodes arrive one at the time each node connects to m other nodes selecting them with probability proportional to their degree if [d 1,…,d t ] is the degree sequence at time t, the node t+1 links to node i with probability Results in power-law with exponent α = 3

Variations of the BA model Many variations have been considered

Copying model Input: – the out-degree d (constant) of each node – a parameter α The process: – Nodes arrive one at the time – A new node selects uniformly one of the existing nodes as a prototype – The new node creates d outgoing links. For the i th link with probability α it copies the i-th link of the prototype node with probability 1- α it selects the target of the link uniformly at random

An example

Copying model properties Power law degree distribution with exponent β = (2- α)/(1- α) Number of bipartite cliques of size i x d is ne -i The model has also found applications in biological networks – copying mechanism in gene mutations

Small world Phenomena So far we focused on obtaining graphs with power-law distributions on the degrees. What about other properties? – Clustering coefficient: real-life networks tend to have high clustering coefficient – Short paths: real-life networks are “small worlds” this property is easy to generate – Can we combine these two properties?

Small-world Graphs According to Watts [W99] – Large networks (n >> 1) – Sparse connectivity (avg degree z << n) – No central node (k max << n) – Large clustering coefficient (larger than in random graphs of same size) – Short average paths (~log n, close to those of random graphs of the same size)

Mixing order with randomness Inspired by the work of Solmonoff and Rapoport – nodes that share neighbors should have higher probability to be connected Generate an edge between i and j with probability proportional to R ij When α = 0, edges are determined by common neighbors When α = ∞ edges are independent of common neighbors For intermediate values we obtain a combination of order and randomness m ij = number of common neighbors of i and j p = very small probability

Algorithm Start with a ring For i = 1 … n – Select a vertex j with probability proportional to R ij and generate an edge (i,j) Repeat until z edges are added to each vertex

Clustering coefficient – Avg path length small world graphs

Watts and Strogatz model [WS98] Start with a ring, where every node is connected to the next z nodes With probability p, rewire every edge (or, add a shortcut) to a uniformly chosen destination. – Granovetter, “The strength of weak ties” order randomness p = 0 p = 1 0 < p < 1

Watts and Strogatz model [WS98] Start with a ring, where every node is connected to the next z nodes With probability p, rewire every edge (or, add a shortcut) to a uniformly chosen destination. – Granovetter, “The strength of weak ties” order randomness p = 0 p = 1 0 < p < 1

Clustering Coefficient – Characteristic Path Length log-scale in p When p = 0, C = 3(k-2)/4(k-1) ~ ¾ L = n/k For small p, C ~ ¾ L ~ logn

Next Class Some more generative models for social- network graphs