Coarse-Grained Topology Estimation via Graph Sampling. Maciej Kurant, Minas Gjoka, Yan Wang, Zack W. Almquist, Carter T. Butts, Athina Markopoulou.

Presentation transcript:

Coarse-Grained Topology Estimation via Graph Sampling
Maciej Kurant (1), Minas Gjoka (2), Yan Wang (2), Zack W. Almquist (2), Carter T. Butts (2), Athina Markopoulou (2)
(1) ETHZ, (2) UC Irvine
17 Aug 2012, WOSN

Coarse-grained topology: nodes belong to different categories (A, B).
Example categories: countries, universities, workplaces, religion, age, music genres, … (19 March 2012)

Coarse-grained topology: nodes belong to different categories (A, B).
Edge weight as the number of edges between A and B? Not normalized by the size of categories!

Coarse-grained topology: nodes belong to different categories (A, B).
Edge weight w(A,B): the probability that a random node in A is a neighbor of a random node in B.
w(A,B) = (all existing edges between A and B) / (all possible edges between A and B)
A, B - all nodes labeled by 'A' and 'B', respectively
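
The normalized edge weight defined on this slide can be sketched on a toy graph. This is a minimal illustration, not the authors' code: the function name `edge_weight`, the node IDs, and the example graph are all hypothetical.

```python
from itertools import product

def edge_weight(edges, labels, cat_a, cat_b):
    """Fraction of all possible A-B node pairs that are connected by an edge."""
    a_nodes = [v for v, c in labels.items() if c == cat_a]
    b_nodes = [v for v, c in labels.items() if c == cat_b]
    edge_set = {frozenset(e) for e in edges}  # undirected edges
    existing = sum(1 for a, b in product(a_nodes, b_nodes)
                   if frozenset((a, b)) in edge_set)
    possible = len(a_nodes) * len(b_nodes)
    return existing / possible

# Hypothetical toy graph: two nodes in category A, three in category B.
labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edges = [(1, 3), (1, 4), (2, 5), (1, 2), (3, 4)]
w_ab = edge_weight(edges, labels, "A", "B")  # 3 existing / 6 possible pairs
```

Unlike the raw edge count, this value does not grow just because a category has more members.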

Facebook: 800+M users, 150 friends each (on average), 8 bytes (64 bits) per user ID.
The raw connectivity data, with no attributes: 800M x 150 x 8 B = 960 GB.
To get this data, one would have to download 200 TB of HTML data!
This is neither feasible nor practical. Solution: Sampling!
Profile fields shown: Name; School / Workplace; City or country (before 2010); List of friends.
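
The back-of-the-envelope size estimate on this slide can be checked directly. The sketch below only uses the three figures quoted on the slide and assumes decimal gigabytes (10^9 bytes); the variable names are illustrative.

```python
users = 800_000_000   # 800+M users (lower bound from the slide)
avg_friends = 150     # friends per user, on average
bytes_per_id = 8      # 64-bit user ID

# Each user's friend list stores one ID per friend.
raw_bytes = users * avg_friends * bytes_per_id
raw_gb = raw_bytes / 10**9  # decimal gigabytes
```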

Coarse-grained topology from a sample
UIS - Uniform Independence Sample
RW - Random Walk sample: sampling probability w(v) proportional to node degree
In both cases, the coarse-grained topology between A and B is estimated from the sample.
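
A simple random walk visits nodes with probability proportional to their degree, which is the RW property stated on this slide. The sketch below is a hypothetical illustration on a star graph (the function `random_walk` and the graph are assumptions, not from the talk). On a star, the walk alternates between the center and a leaf, so the center is visited exactly half the time, matching its share of the total degree (4 out of 8).

```python
import random

def random_walk(adj, start, steps, rng):
    """Return a list of `steps` visited nodes, moving to a uniform random neighbor each step."""
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

# Hypothetical star graph: center 0 has degree 4, each leaf has degree 1.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
rng = random.Random(0)  # seeded for reproducibility
walk = random_walk(adj, 0, 1000, rng)
center_share = walk.count(0) / len(walk)  # degree share of the center is 4/8
```

A uniform independence sample would instead pick the center only about one time in five, which is why RW samples need reweighting before they are used for estimation.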

Estimating category size |A| (UIS and RW estimators)
N - number of nodes in the graph
A, B - all nodes labeled by 'A' and 'B', respectively
S - all sampled nodes
S_A, S_B - nodes sampled in A and B, respectively
w(v) - sampling weight of node v (under RW equal to degree of v)
The correction for the sampling weights w(v) is essential!
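
The slide's formulas did not survive in this transcript, so the following is a sketch under assumptions rather than the authors' exact estimators. It assumes the standard forms: under UIS, N times the sampled fraction in A; under RW, the same ratio with every sampled node weighted by 1/w(v). The function names and the toy sample are hypothetical.

```python
def size_uis(sample, labels, category, n_nodes):
    """UIS: scale the sampled fraction of the category up to N nodes."""
    in_cat = sum(1 for v in sample if labels[v] == category)
    return n_nodes * in_cat / len(sample)

def size_rw(sample, labels, weight, category, n_nodes):
    """RW (assumed form): same ratio, each sampled node weighted by 1 / w(v)."""
    num = sum(1.0 / weight[v] for v in sample if labels[v] == category)
    den = sum(1.0 / weight[v] for v in sample)
    return n_nodes * num / den

# Hypothetical star graph: center 0 is category A, leaves 1..4 are category B.
labels = {0: "A", 1: "B", 2: "B", 3: "B", 4: "B"}
degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}
rw_sample = [0, 1, 0, 2, 0, 3, 0, 4]  # a walk on a star alternates center/leaf

naive = size_uis(rw_sample, labels, "A", n_nodes=5)               # ignores the bias
corrected = size_rw(rw_sample, labels, degree, "A", n_nodes=5)   # reweighted
```

In this toy case the uncorrected estimate of |A| is 2.5, while the reweighted estimate recovers the true size of 1, which illustrates why the correction matters for RW samples.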

Estimating edge weights w(A,B) (induced sampling; UIS and RW estimators)
w(A,B) = (all existing edges between A and B) / (all possible edges between A and B)
estimate of w(A,B) = (all observed edges between A and B) / (all edges we could have observed between A and B)
Notation as above: N, A, B, S, S_A, S_B, w(v).
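
One way to read "observed edges over edges we could have observed" is sketched below. This is an assumption, since the slide's formulas are missing from the transcript: each sampled pair (a, b) contributes 1 / (w(a) w(b)) to the denominator, and to the numerator as well if the pair is connected. With all weights equal to 1 this reduces to the plain UIS ratio. The function name and toy data are hypothetical.

```python
import math

def induced_edge_weight(sample, labels, edges, weight, cat_a, cat_b):
    """Weighted share of sampled A-B pairs that are connected (assumed estimator form)."""
    edge_set = {frozenset(e) for e in edges}
    s_a = [v for v in sample if labels[v] == cat_a]
    s_b = [v for v in sample if labels[v] == cat_b]
    num = den = 0.0
    for a in s_a:
        for b in s_b:
            pair_weight = 1.0 / (weight[a] * weight[b])
            den += pair_weight            # a pair we could have observed
            if frozenset((a, b)) in edge_set:
                num += pair_weight        # a pair we did observe as an edge
    return num / den

labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edges = [(1, 3), (1, 4), (2, 5), (1, 2), (3, 4)]   # true w(A,B) = 3/6
degree = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}
ones = {v: 1 for v in labels}

uis_sample = [1, 2, 3, 4, 5]
# Idealized RW sample: each node appears as many times as its degree.
rw_sample = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5]

w_uis = induced_edge_weight(uis_sample, labels, edges, ones, "A", "B")
w_rw_naive = induced_edge_weight(rw_sample, labels, edges, ones, "A", "B")
w_rw = induced_edge_weight(rw_sample, labels, edges, degree, "A", "B")
```

On the degree-biased sample, the unweighted ratio overshoots (14 of 25 pairs), while the reweighted version returns the true value of 0.5.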

Estimating edge weights w(A,B) (star sampling; UIS and RW estimators)
Notation as above (N, A, B, S, S_A, S_B, w(v)), plus:
E_{a,B} - all edges between node a and nodes in B
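
Under star sampling, each sampled node also reveals the categories of its neighbors, so |E_{a,B}| is known for every sampled a. The sketch below is an assumed estimator form, not taken from the slides: it pools both directions (A to B and B to A), weights each sampled node by 1 / w(v), and divides the observed neighbor counts by the counts expected if every possible pair were connected. The category sizes `size_a` and `size_b` are passed in and could be true or estimated values. All names and the toy graph are hypothetical, and the sketch assumes two distinct categories.

```python
import math

def star_edge_weight(sample, labels, adj, weight, cat_a, cat_b, size_a, size_b):
    """Pooled star-sampling estimate of w(A,B) (assumed form, cat_a != cat_b)."""
    def one_side(src, dst, size_dst):
        num = den = 0.0
        for v in sample:
            if labels[v] == src:
                hits = sum(1 for u in adj[v] if labels[u] == dst)  # |E_{v,dst}|
                num += hits / weight[v]
                den += size_dst / weight[v]  # all pairs v could have formed
        return num, den

    num_ab, den_ab = one_side(cat_a, cat_b, size_b)
    num_ba, den_ba = one_side(cat_b, cat_a, size_a)
    return (num_ab + num_ba) / (den_ab + den_ba)

labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
adj = {1: [2, 3, 4], 2: [1, 5], 3: [1, 4], 4: [1, 3], 5: [2]}  # true w(A,B) = 0.5
degree = {v: len(nbrs) for v, nbrs in adj.items()}
ones = {v: 1 for v in adj}

w_star_uis = star_edge_weight([1, 2, 3, 4, 5], labels, adj, ones, "A", "B", 2, 3)
# Idealized RW sample: each node appears as many times as its degree.
rw_sample = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5]
w_star_rw = star_edge_weight(rw_sample, labels, adj, degree, "A", "B", 2, 3)
```

Because a single sampled node already reports edges to unsampled neighbors, star sampling can observe A-B edges even when only one endpoint is in the sample.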

Estimators
Category size: UIS, RW
Edge weight: UIS (induced, star), RW (induced, star)
We prove the consistency of all these estimators.

Performance evaluation

Fully known graph: Facebook, Texas. Plot x-axis: sample size |S|.

Online graph: Facebook online. Plot x-axis: sample size |S|.
[swrw10] M. Kurant, M. Gjoka, C. T. Butts and A. Markopoulou, “Walking on a Graph with a Magnifying Glass”, SIGMETRICS 2011.

geosocialmap.com

Public and private colleges in the USA (geosocialmap.com)

The world according to Facebook (geosocialmap.com)

Strong clusters among Middle Eastern countries (labeled: Egypt, Saudi Arabia, United Arab Emirates, Lebanon, Jordan, Israel)

Summary
Original (unknown) topology, sampled via UIS or RW, yields the coarse-grained topology between categories A and B.
Consistent estimators under induced and star sampling.
More info: geosocialmap.com
Thank you!