Maciej Kurant (EPFL / UCI) Joint work with: Athina Markopoulou (UCI),


1 On the bias of Breadth First Search (BFS) and of other graph sampling techniques
Maciej Kurant (EPFL / UCI) Joint work with: Athina Markopoulou (UCI), Patrick Thiran (EPFL). 08 Sep 2010, ITC’22, Amsterdam, Netherlands

2 Breadth First Search (BFS)
[Figure: BFS exploring an example graph with nodes A to K]
Not feasible for huge online graphs! E.g., a full BFS of the Facebook friendship graph would require 200 TB of HTML traffic.

3 BFS sample of a large graph
[Figure: BFS on a large graph, stopped once the sampling budget is used up]
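A BFS sample with a sampling budget can be sketched in Python as follows. This is a minimal sketch, assuming the graph is given as an adjacency dict and that the budget is counted in discovered nodes; the function name `bfs_sample` is hypothetical.

```python
from collections import deque


def bfs_sample(adj, seed, budget):
    """Return up to `budget` nodes in the order BFS discovers them from `seed`.

    `adj` maps each node to an iterable of its neighbors (assumed format).
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < budget:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                order.append(v)
                queue.append(v)
                if len(order) == budget:  # sampling budget exhausted
                    break
    return order
```

For example, on a graph with edges A-B, A-C, B-D and C-E, a budget of 3 stops after A, B and C, while a larger budget reaches the whole graph.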

4 Why sample with BFS?
- BFS is a well-known textbook technique.
- A BFS sample is a nice-looking graph, e.g., a BFS sample of a lattice is a lattice. We can study its topological characteristics, which is not possible with random walks.
- It is used in practice:
  - Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, “Analysis of Topological Characteristics of Huge Online Social Networking Services,” in Proc. of WWW, 2007.
  - A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and S. Bhattacharjee, “Measurement and Analysis of Online Social Networks,” in Proc. of IMC, 2007.
  - C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao, “User interactions in social networks and their implications,” in Proc. of EuroSys, 2009.

5 Why? Our BFS samples of Facebook
- $q_k$: observed node degree distribution
- $p_k$: real node degree distribution
BFS sample size: 100K nodes. Facebook size: M nodes.
Why?
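The two distributions compared on this slide can be computed as sketched below: $p_k$ over all nodes of the graph and $q_k$ over the sampled nodes only. This is a sketch under our own assumptions: the graph is an adjacency dict, a sampled node is observed with its full degree, and the helper names are hypothetical.

```python
from collections import Counter


def degree_distribution(adj, nodes):
    """Return {k: fraction of `nodes` whose degree in the full graph `adj` is k}."""
    nodes = list(nodes)
    counts = Counter(len(adj[v]) for v in nodes)
    return {k: c / len(nodes) for k, c in sorted(counts.items())}


def average_degree(dist):
    """Mean degree sum_k k * dist[k] of a degree distribution."""
    return sum(k * share for k, share in dist.items())
```

On a star with one center and three leaves, $p_k$ puts 0.75 on degree 1, but a sample consisting of the center and one leaf gives $q_k$ equal to 0.5 on both degrees, so its average degree (2.0) overestimates the real one (1.5).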

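6 Our Goal
Graph traversals on RG($p_k$), BFS: $q_k(f) = ?$, where $f$ is the fraction of nodes sampled. We compare the result with the real average node degree $\bar{k} = \sum_k k\,p_k$ and the real average squared node degree $\overline{k^2} = \sum_k k^2 p_k$.
This bias has been observed empirically in the past, but never formally analyzed.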
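As a reference point, the bias of a simple random walk (RW) on RG($p_k$), against which BFS is compared in the summary, follows in a few lines from the RW stationary distribution. The derivation assumes the graph is connected (and non-bipartite, so the walk converges), with $\bar{k} = \sum_k k\,p_k$ and $\overline{k^2} = \sum_k k^2 p_k$:

```latex
\begin{aligned}
\pi(v) &= \frac{\deg(v)}{2|E|}, \qquad 2|E| = \sum_v \deg(v) = |V|\,\bar{k},\\
q_k^{\mathrm{RW}} &= \sum_{v:\,\deg(v)=k} \pi(v) = \frac{|V|\,p_k\,k}{|V|\,\bar{k}} = \frac{k\,p_k}{\bar{k}},\\
\mathbb{E}\big[k_{\mathrm{obs}}^{\mathrm{RW}}\big] &= \sum_k k\,q_k^{\mathrm{RW}} = \frac{\overline{k^2}}{\bar{k}} \;\ge\; \bar{k}
\quad (\text{since } \overline{k^2} \ge \bar{k}^2).
\end{aligned}
```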

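7 Graph model RG($p_k$)
Random graph RG($p_k$) with a given node degree distribution $p_k$. It can be generated by the configuration model: give each node as many ‘stubs’ (half-edges) as its degree, then pair the stubs uniformly at random.
Example: $|V| = 4$ and $p_1 = p_2 = p_3 = p_4 = 0.25$, i.e., one node of each degree 1, 2, 3 and 4.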
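The configuration model described on this slide can be sketched in a few lines of Python. This is an illustrative sketch with a hypothetical function name; shuffling the stub list and pairing consecutive entries yields a uniformly random pairing, and, as in the standard configuration model, self-loops and multi-edges are allowed.

```python
import random


def configuration_model(degrees, rng=None):
    """Pair stubs uniformly at random and return the edges as (u, v) tuples.

    degrees[v] is the degree of node v; the sum of degrees must be even.
    The result may contain self-loops and multi-edges.
    """
    rng = rng or random.Random()
    # One stub per unit of degree, labeled by the node that owns it.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng.shuffle(stubs)  # uniform random permutation of stubs
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```

With the slide's example, `configuration_model([1, 2, 3, 4])` pairs 10 stubs into 5 edges, and each node appears as an endpoint exactly as many times as its degree.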

8 Approach 1: Brute force
Generate all possible graphs, and ... No way!
Remedy: “The Principle of Deferred Decisions”: we can generate the graph ‘on the fly’, while exploring it!

9 Approach 2: The Principle of Deferred Decisions
[Figure: a stub of node u being paired with a stub of some node v]
This does not scale! (because of the dependencies between stubs)
* We assume that the generated graph is connected.
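Although Approach 2 is hard to analyze, it is easy to simulate. The sketch below, with a hypothetical function name, runs BFS on a configuration-model graph that is generated while it is explored: each stub of the node being expanded is paired, only when needed, with a stub drawn uniformly at random from all still-unpaired stubs. Removing a stub by value is linear in the number of stubs; we keep it that way for clarity rather than speed.

```python
import random
from collections import deque


def bfs_deferred(degrees, seed, budget, rng=None):
    """BFS on a configuration-model graph generated during exploration.

    degrees[v] is the degree of node v (the total must be even).
    Returns (nodes in BFS discovery order, edges revealed so far).
    """
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = rng or random.Random()
    free = [v for v, d in enumerate(degrees) for _ in range(d)]  # unpaired stubs
    left = list(degrees)  # number of unpaired stubs per node
    order, edges = [seed], []
    visited = {seed}
    queue = deque([seed])
    while queue and len(order) < budget:
        u = queue.popleft()
        while left[u] > 0 and len(order) < budget:
            free.remove(u)  # take one of u's stubs out of the pool
            left[u] -= 1
            j = rng.randrange(len(free))  # uniform partner among remaining stubs
            v = free[j]
            free[j] = free[-1]  # swap-remove the partner stub
            free.pop()
            left[v] -= 1
            edges.append((u, v))  # may be a self-loop if v == u
            if v not in visited:
                visited.add(v)
                order.append(v)
                queue.append(v)
    return order, edges
```

Only the part of the graph that BFS actually touches is ever generated, which is the point of deferring the decisions.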

10 Approach 3: Breaking the stub dependencies
[Figure: nodes v1 to v4 and their stubs along a time axis t]
Originally proposed in: J. H. Kim, “Poisson cloning model for random graphs,” International Congress of Mathematicians (ICM), 2006 (preprint in 2004).
Developed in: D. Achlioptas, A. Clauset, D. Kempe, and C. Moore, “On the bias of traceroute sampling: or, power-law degree distributions in regular graphs,” in STOC, 2005.
(Both in a different context.)

12 Approach 3: Breaking the stub dependencies
[Figure: number of nodes of degree k, as a function of f]

13 Approach 3: Breaking the stub dependencies
The analysis is exactly the same for other graph traversal techniques:
- BFS
- DFS
- Forest Fire
- Snowball Sampling
- Node sampling weighted by degrees
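The slides do not spell out the resulting formula, so the following is only a sketch under an assumption of ours, in the spirit of the time-$t$ construction of Approach 3: a node of degree $k$ has been discovered by time $t \in [0, 1]$ with probability $1 - (1 - t)^k$. Then the fraction of discovered nodes is $f = 1 - \sum_k p_k (1 - t)^k$, and $q_k(f) = p_k \bigl(1 - (1 - t)^k\bigr) / f$. This model reproduces both limits stated in the summary: for small $t$, $f \approx \bar{k}\,t$ and $q_k \to k\,p_k/\bar{k}$ (the RW bias), and for $t \to 1$, $q_k \to p_k$. The function name is hypothetical.

```python
def bfs_expected_degree_distribution(p, f, tol=1e-12):
    """q_k(f) under the ASSUMED model: a degree-k node is discovered by time t
    with probability 1 - (1 - t)**k.

    p: dict {k: p_k} with all k >= 1 and probabilities summing to 1.
    f: fraction of nodes sampled, 0 < f <= 1.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")

    def covered(t):
        # Expected fraction of nodes discovered by time t (assumption above).
        return 1.0 - sum(pk * (1.0 - t) ** k for k, pk in p.items())

    # covered() rises monotonically from 0 to 1 on [0, 1]; bisect for t.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if covered(mid) < f:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return {k: pk * (1.0 - (1.0 - t) ** k) / f for k, pk in p.items()}
```

For $p_1 = p_3 = 0.5$ (so $\bar{k} = 2$), a tiny $f$ gives $q_3 \approx 0.75 = 3 \cdot 0.5 / 2$, $f = 1$ gives back $q_3 = 0.5$, and intermediate values of $f$ fall in between.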

14 Theory vs Simulations
Simulations on a power-law random graph with 10K nodes.
[Figure: theory vs simulations; “degree distribution corrected!”]

15 What if the graph is not random?
- Random graph RG($p_k$): purely random, given the degree distribution $p_k$.
- Assortative RG($p_k$): nodes of similar degree are more likely to connect.

16 Summary
Graph traversals on RG($p_k$), compared through the real average node degree $\bar{k}$ and the real average squared node degree $\overline{k^2}$:
- Random Walk (RW): biased towards high-degree nodes; $q_k = k\,p_k/\bar{k}$, so the expected observed average degree is $\overline{k^2}/\bar{k}$.
- MHRW, RWRW: unbiased; they recover $p_k$ and hence $\bar{k}$.
- BFS: for small sample size ($f \to 0$), BFS has the same bias as RW (also in our Facebook measurements). For large sample size ($f \to 1$), BFS becomes unbiased. The bias decreases monotonically with $f$, and we found the shape of this curve analytically.
Thank you!
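The re-weighting idea behind RWRW can be sketched in Python: a simple random walk visits degree-$k$ nodes in proportion to $k\,p_k$, so weighting each visit by $1/k$ undoes that bias. This is a minimal sketch with hypothetical function names, assuming an adjacency dict whose neighbor collections are lists; MHRW is not shown.

```python
import random
from collections import Counter


def random_walk(adj, start, steps, rng=None):
    """Simple random walk of `steps` visits; each move goes to a uniform neighbor."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def rwrw_degree_distribution(adj, walk):
    """RWRW estimate of p_k: weight each visited node by 1/k, then normalize."""
    weights = Counter()
    for v in walk:
        k = len(adj[v])
        weights[k] += 1.0 / k
    total = sum(weights.values())
    return {k: w / total for k, w in sorted(weights.items())}
```

On a star with one center and three leaves, the walk alternates between the center and a leaf, so the raw visits put 0.5 on degree 3 (matching $k\,p_k/\bar{k} = 3 \cdot 0.25 / 1.5$), while the re-weighted estimate recovers $p_3 = 0.25$ and $p_1 = 0.75$.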
