Published by Nelson Black. Modified over 6 years ago.
1
Towards Maximum Independent Sets on Massive Graphs
Yu Liu, Jiaheng Lu, Hua Yang, Xiaokui Xiao, Zhewei Wei
1. Hello everyone, it’s my pleasure to present our work. My name is Yu Liu, and I’m a PhD student at Renmin University of China. This is joint work with Jiaheng, Hua, Xiaokui and Zhewei on computing maximum independent sets on massive graphs.
2. Here’s the outline of my talk.
2
Towards Maximum Independent Sets on Massive Graphs
Outline
Background
    The Maximum Independent Set Problem
    Previous Algorithms
    Computation Models
Our Semi-external Algorithms
    Greedy Algorithm
    One-K-Swap Algorithm
    Two-K-Swap Algorithm
Experiments
Conclusions
1. First I’ll give the background of this talk: the maximum independent set problem, previous internal and external memory algorithms for computing independent sets, and the computation models used in our paper.
2. Then I’ll introduce 3 semi-external algorithms, the greedy algorithm and two swap-based algorithms, followed by the experimental evaluation and finally our conclusions.
3
Background: The Maximum Independent Set Problem
Definitions
Independent set: given G = <V, E>, a subset IS of V such that for any u, v in IS, (u, v) is not in E
[Figure: left, a maximal independent set; right, a maximum independent set]
1. The first question is: what is an independent set? Given a graph G, it is a subset of the vertex set such that no 2 vertices in the subset share an edge in G. A related notion is the maximal independent set, shown in the left figure: adding any further vertex to the set would make it no longer independent. The maximum independent set is the independent set of largest size.
2. Punch line: these are the 2 types of independent set we care about.
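The definitions above can be checked mechanically. The following is a minimal Python sketch, not taken from the talk: the function names and the small star graph are hypothetical, and the graph is assumed to be an in-memory dictionary mapping each vertex to its set of neighbours.

```python
from itertools import combinations


def is_independent(adj, subset):
    """True if no two vertices of `subset` share an edge in `adj`."""
    chosen = set(subset)
    return all(v not in adj[u] for u, v in combinations(chosen, 2))


def is_maximal(adj, subset):
    """True if `subset` is independent and no further vertex can be added."""
    chosen = set(subset)
    if not is_independent(adj, chosen):
        return False
    # Every vertex outside the set must have at least one neighbour inside it.
    return all(adj[v] & chosen for v in adj if v not in chosen)


# Hypothetical example: a star with centre "c" and leaves "x", "y", "z".
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
centre_only = {"c"}          # maximal, but only of size 1
all_leaves = {"x", "y", "z"}  # maximal and also maximum (size 3)
```

On the star, both `centre_only` and `all_leaves` are maximal, but only `all_leaves` is maximum, which illustrates the gap between the two notions.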
4
Background: The Maximum Independent Set Problem
Hardness of Computing Maximum Independent Set
NP-hard [Halldórsson et al STOC94][Robson J.Algorithms 86]
APX-hard [Hastad FOCS96]
Why compute a (near-)optimal independent set?
An interesting problem in graph theory
Real-world applications favor larger independent sets, such as shortest path and distance queries [Fu et al PVLDB13][Jiang et al PVLDB14] and automated labeling of maps [Strijk et al 00]
1. Computing a maximum independent set has been proved NP-hard. Moreover, it is hard to approximate: unless P = NP, no polynomial-time algorithm achieves a constant approximation ratio.
2. In practice, however, we DO care about a near-optimal independent set: it is an interesting problem in graph theory, and many real-world applications favor a large independent set, such as labeling-based shortest path and distance queries and the map labeling problem.
3. Let’s first see whether the existing algorithms work well on massive graphs.
5
Background: Previous Internal and External Memory Algorithms
Internal Memory Algorithms
Exact algorithms [Robson J.Algorithms 86][Xiao et al ISAAC13]
Approximation algorithms: Greedy [Halldórsson et al STOC94], [Feige J.Discrete Math04]
Various heuristic algorithms!
External Memory Algorithms
Randomized algorithm [Abello et al Algorithmica02]
[Zeh02] finds a maximal independent set
1. There is a large body of existing work on independent sets, including exact algorithms, approximation algorithms such as the greedy algorithm, and various heuristic algorithms. They are all in-memory algorithms.
2. There are only a few works on external memory, such as Zeh’s. It finds a maximal independent set, but the result can be far worse than optimal.
6
Background: Previous Internal and External Memory Algorithms
Methods                             CPU or I/O bound                               Approx. ratio
Xiao et al ISAAC13                  CPU: $O(1.2002^{|V|} \cdot |V|^{O(1)})$        exact
Feige J.Discrete Math04             CPU: polynomial                                $O(n (\log\log n)^2 / (\log n)^3)$
Greedy [Halldórsson et al STOC94]   CPU: $O(|V| \log |V| + |E|)$                   $(\Delta+1)/3$, $(2d+3)/5$
Zeh02                               I/O: $O(\mathrm{sort}(|V|+|E|))$               no bound
1. Here’s a comparison of the state-of-the-art algorithms for the independent set problem. The first algorithm has the lowest time complexity to date for finding the maximum independent set, and the second has the best approximation ratio among polynomial algorithms. Both are impractical on large graphs. The in-memory greedy algorithm has acceptable CPU time complexity and approximation ratio, and performs well in practice. We’ll see later why it cannot be extended to the external setting. The last one is an external memory algorithm, but it has NO acceptable theoretical guarantee.
2. Punch line: previous algorithms either do not scale well or have NO theoretical guarantee, so they are all unacceptable for massive graphs. So, is it a really hard problem?
7
Background: Problem on Real-world Graphs
However, real-world graphs exhibit some properties, e.g., a power-law degree distribution, which make the work easier.
8
Background: Problem on Real-world Graphs
However, real-world graphs exhibit some properties, e.g., a power-law degree distribution, which make the work easier.
But even for power-law graphs, finding a maximum independent set is still APX-hard [Shen et al COCOON07][Shen et al COCOA10]
1. …which means there exist some worst-case graph instances.
9
Background: Problem on Real-world Graphs
However, real-world graphs exhibit some properties, e.g., a power-law degree distribution, which make the work easier.
But even for power-law graphs, finding a maximum independent set is still APX-hard [Shen et al COCOON07][Shen et al COCOA10]
But for real-world graphs there is still hope of using these properties to compute a near-optimal independent set!
1. This is because the randomness and some properties of real-world graphs rule out the worst-case instances in most cases.
10
Background: Models for Massive Graphs
Most real-world graphs have degree distributions that obey a power law
Power Law Random Graph Model
Many PLRG models, e.g., BA model, recursive models
1. The number of vertices with degree k is inversely proportional to k to the power of beta. If we plot the number of vertices against the degree on a log-log scale, we get a straight line.
2. I’ll briefly introduce the model we used.
11
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
1. The model has 2 parameters, alpha and beta, where alpha controls the graph size and beta controls the skewness of the degree distribution.
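The roles of the two parameters can be written out as a short worked relation. This assumes the usual statement of the ACL model, in which the number of vertices of degree k decays as a power of k; the symbol N_k is introduced here only for illustration, and rounding to integers is ignored.

```latex
\begin{align*}
N_k &= \frac{e^{\alpha}}{k^{\beta}}
  && \text{(number of vertices of degree } k\text{)} \\
\log N_k &= \alpha - \beta \log k
  && \text{(a straight line on a log-log plot, intercept } \alpha\text{, slope } -\beta\text{)} \\
N_k \ge 1 &\iff k \le e^{\alpha/\beta}
  && \text{(so the maximum degree is about } e^{\alpha/\beta}\text{)}
\end{align*}
```

Under this reading, increasing α scales every N_k and hence the graph size, while increasing β makes high-degree vertices rarer.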
12
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
Generation
[Figure: vertices grouped by degree: deg=1, deg=2, deg=3, …]
1. To generate a graph by the ACL model given alpha and beta, we first compute the number of vertices of degree i, for i from 1 to the maximum degree.
13
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
Generation
[Figure: vertices grouped by degree (deg=1, deg=2, deg=3, …) and their copies]
1. Then, for each vertex of degree i, generate i copies of it.
14
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
Generation
[Figure: vertices grouped by degree (deg=1, deg=2, deg=3, …) and their copies]
1. Then compute a random matching between the copies.
15
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
Generation
[Figure: vertices grouped by degree (deg=1, deg=2, deg=3, …) and their copies]
1. Finally, merge the copies back into vertices: 2 vertices share an edge if any of their copies are matched.
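The generation steps described across these slides can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `acl_graph`, the degree-count formula floor(e^α / i^β), and the choice to drop self-loops and parallel edges are assumptions, not details confirmed by the talk.

```python
import math
import random


def acl_graph(alpha, beta, seed=0):
    """Sketch of ACL-style generation; returns {vertex: set of neighbours}."""
    rng = random.Random(seed)
    max_degree = int(math.exp(alpha / beta))

    # Step 1: number of vertices of each degree i (assumed floor(e^alpha / i^beta)).
    degrees = []
    for i in range(1, max_degree + 1):
        count = int(math.exp(alpha) / i ** beta)
        degrees.extend([i] * count)

    # Step 2: a vertex of degree i gets i copies.
    copies = [v for v, d in enumerate(degrees) for _ in range(d)]

    # Step 3: random matching between copies (shuffle, then pair neighbours).
    rng.shuffle(copies)
    if len(copies) % 2:
        copies.pop()  # an odd leftover copy stays unmatched

    # Step 4: merge copies into vertices; matched copies become an edge.
    adj = {v: set() for v in range(len(degrees))}
    for u, v in zip(copies[0::2], copies[1::2]):
        if u != v:  # assumption: self-loops are dropped, sets drop parallel edges
            adj[u].add(v)
            adj[v].add(u)
    return adj


graph = acl_graph(alpha=5.0, beta=2.0, seed=1)
```

Because self-loops and parallel edges are discarded, realised degrees can be slightly below the targets; a faithful implementation might keep them as a multigraph instead.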
16
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
2 parameters: α and β
Generation
Advantages: flexible parameters, efficient generation, easy to analyze…
1. The ACL model has many advantages compared to other models.
2. Under the power law random graph model, the independent set problem admits a better performance ratio than on arbitrary graphs.
17
Background: Models for Massive Graphs
Power Law Random Graph Model
ACL model [Aiello et al STOC00]
Independent set on PLRG
[Figure: independent set sizes on a scale up to |V|. Arbitrary graph: optimal, best polynomial, trivial maximal independent set. Real-world power law graph: optimal, efficient polynomial, trivial maximal independent set.]
1. For arbitrary graphs, the best polynomial algorithm and the trivial maximal independent set may perform arbitrarily worse than optimal. For power law graphs, there is still a significant gap between the trivial maximal independent set and the optimal independent set, but it is narrower than for arbitrary graphs, and we can design efficient polynomial algorithms that come very close to optimal.
2. However, real-world graphs can be very large. When the graph is too large to fit into memory, we need an external memory computation model.
18
Background: External Computation Model
External memory model
Memory budget: M << |G|
(Semi-)External memory model
Memory budget: c|V| <= M << |G|
[Figure: disk and memory]
1. In the external computation model, data is transferred between memory and disk in blocks, and random access is I/O-expensive. So we use sequential scans and external memory sorting instead.
2. For graph problems, the external memory model requires memory usage much smaller than the graph size, whereas the semi-external model permits memory consumption of a small constant times the size of the vertex set.
3. And our goal is…
19
Background: Our Goals(Problem Definition)
Semi-external memory algorithms for independent set
Memory budget: c|V| <= M << |G|, c = 2 or 3
Low CPU time and I/O complexity (especially in practice!)
Find a near-optimal independent set
Have non-trivial theoretical bounds
1. We compute independent sets in the semi-external setting, with a sufficiently small constant, i.e., 2 or 3.
2. The algorithm should have low CPU time and I/O complexity, find a near-optimal independent set and have non-trivial theoretical bounds.
3. So let me now turn to our algorithms.
20
Our Semi-external Algorithm: Greedy
Greedy [Halldórsson et al STOC94]
I/O complexity (worst case): O(|V|) random accesses!
1. First we look at the in-memory greedy algorithm, because it is very efficient and effective in practice, and we’ll see why it cannot be extended to external memory.
2. The reason is that when a vertex is selected, we remove it and its neighbors from the graph, and update the degrees of the neighbors of these vertices. So the next vertex of smallest degree can be anywhere in the file, no matter how the file is sorted, and random accesses are inevitable. In the worst case there can be O(|V|) random accesses, which is impractical for external algorithms.
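To make the random-access problem concrete, here is a minimal in-memory sketch of minimum-degree greedy with degree updates. It is an illustration rather than the cited implementation: it assumes a dictionary of neighbour sets and uses a binary heap with lazy deletion, and the function name is hypothetical.

```python
import heapq


def min_degree_greedy(adj):
    """Repeatedly pick a minimum-degree vertex and delete it with its neighbours."""
    degree = {v: len(adj[v]) for v in adj}
    alive = set(adj)
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    independent = set()

    while heap:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry
        independent.add(v)
        removed = {v} | (adj[v] & alive)
        alive -= removed
        # Degree updates: each lookup adj[u] below touches an arbitrary vertex,
        # which is a random access once the adjacency lists live on disk.
        for u in removed:
            for w in adj[u]:
                if w in alive:
                    degree[w] -= 1
                    heapq.heappush(heap, (degree[w], w))
    return independent


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
picked = min_degree_greedy(star)
```

The inner loops read the adjacency lists of whichever vertices were just removed, so no fixed file order can turn them into a sequential scan.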
21
Our Semi-external Algorithm: Greedy
Greedy [Halldórsson et al STOC94]
I/O complexity (worst case): O(|V|) random accesses!
Our Greedy Algorithm
1. Sort the adjacency list file by vertex degree;
2. State(v) <- IS for all v in G;
3. Scan the file once; for each v, if State(v) = IS, mark State(u) = NIS for all u in adj(v).
1. Next I’ll give our semi-external greedy algorithm. The difference is that we do not update the degree of any vertex; we only sort the adjacency lists of the vertices by degree in advance.
2. Clearly, all vertices with State equal to IS form an independent set.
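The three steps of the semi-external greedy algorithm map onto a short sketch. Assumptions here: vertices are numbered 0 to |V|-1, the degree-sorted adjacency file is modelled as an iterable of (vertex, neighbours) records, an in-memory `sorted` call stands in for the external sort, and one byte per vertex is used for the state instead of one bit. The function name is hypothetical.

```python
def semi_external_greedy(sorted_records, num_vertices):
    """One sequential pass over adjacency records sorted by ascending degree."""
    # Step 2: State(v) <- IS for every vertex (1 = IS, 0 = NIS).
    state = bytearray([1]) * num_vertices
    # Step 3: a single scan; only the state array is kept in memory.
    for v, neighbours in sorted_records:
        if state[v]:
            for u in neighbours:
                state[u] = 0
    return [v for v in range(num_vertices) if state[v]]


# Hypothetical example: star with centre 0 and leaves 1, 2, 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# Step 1: sort adjacency lists by degree (stand-in for an external sort).
records = sorted(adj.items(), key=lambda item: len(item[1]))
result = semi_external_greedy(records, num_vertices=4)
```

The result is independent because a vertex that is still marked IS when it is scanned cannot have an earlier-scanned IS neighbour: that neighbour would already have marked it NIS.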
22
Our Semi-external Algorithm: Greedy
Greedy [Halldórsson et al STOC94]
I/O complexity (worst case): O(|V|) random accesses!
Our Greedy Algorithm
1. Sort the adjacency list file by vertex degree;
2. State(v) <- IS for all v in G;
3. Scan the file once; for each v, if State(v) = IS, mark State(u) = NIS for all u in adj(v).
Complexity Analysis
I/O complexity: $\mathrm{sort}(|V|+|E|) = \frac{|V|+|E|}{B} \log_{M/B} \frac{|V|+|E|}{B}$
Improved: $\frac{|V|+|E|}{B} \left( \log_{M/B} \frac{|V|}{B} + 2 \right)$
1. The I/O complexity of the greedy algorithm is dominated by the sorting. Since we only sort |V| adjacency lists of different lengths, the complexity can be improved to the second bound.
2. Since each vertex uses only 1 bit to distinguish IS from NIS, the algorithm uses |V| bits, or |V|/L memory units if each memory unit has length L.
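The one-bit-per-vertex state mentioned in the notes can be sketched as a packed bit array. The class name and methods are hypothetical and serve only to show that |V| states fit in about |V|/8 bytes.

```python
class BitStates:
    """Packed IS/NIS flags: bit v is 1 while vertex v is in state IS."""

    def __init__(self, num_vertices):
        self.num_vertices = num_vertices
        # All bits start at 1, i.e. State(v) <- IS for every vertex.
        self.bits = bytearray([0xFF]) * ((num_vertices + 7) // 8)

    def is_in_set(self, v):
        return ((self.bits[v >> 3] >> (v & 7)) & 1) == 1

    def mark_nis(self, v):
        # Clear bit v; the mask keeps the value within one byte.
        self.bits[v >> 3] &= ~(1 << (v & 7)) & 0xFF

    def nbytes(self):
        return len(self.bits)


states = BitStates(10)
states.mark_nis(3)
```

With this layout, a graph with one billion vertices would need roughly 125 MB for the state array, independent of the number of edges.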
23
Our Semi-external Algorithm: Greedy
Performance Analysis
Expected independent set size (ACL model)
Performance ratio
For most real-world graphs tested, the independent set is quite close to optimal
The synthetic graph is generated by the ACL model [Aiello et al STOC00], and the upper bound is from [Goldberg et al WEA05]
Performance Ratio = Lowerbound(Greedy) / Upperbound(G)
β      1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
ratio  0.981 0.979 0.978 0.972 0.971 0.973 0.974 0.977
1. We managed to compute the expected size of the independent set found by the semi-external greedy algorithm; see this formula.
2. The idea is to compute the probability that a vertex of degree i links all its edges to higher-degree vertices. Such a vertex is placed in the independent set by greedy. From this we get the expected independent set size. In most cases it is a lower bound.
3. This table shows the expected performance ratio of the greedy algorithm. It is computed as the lower bound divided by an upper bound on the maximum independent set, obtained with a modified version of the algorithm in [Goldberg et al WEA05]. Later we’ll give the experimental evaluation.
4. However, real-world graphs may not be perfectly described by the ACL model, so we design 2 swap-based algorithms to improve on the performance of greedy.
24
Our Semi-external Algorithm: One-K-Swap
Intuition (inspired by [Khanna et al J.Comput.98])
If a NIS vertex has only 1 adjacent IS vertex (1-k-swap candidate), and the IS vertex has >= 2 independent 1-k-swap candidates
[Figure: IS vertex v with 1-k-swap candidates a and b]
1. The first algorithm is called One-K-Swap, and it’s inspired by this work. Intuitively, if …
2. We call v the IS neighbor of a and b.
25
Our Semi-external Algorithm: One-K-Swap
Intuition (inspired by [Khanna et al J.Comput.98])
If a NIS vertex has only 1 adjacent IS vertex (1-k-swap candidate), and the IS vertex has >= 2 independent 1-k-swap candidates
Algorithm
while(canSwap) //another round of One-K-Swap
    scan the file to get ISN; //1st scan
    scan the file and do 1-k-swaps; //2nd scan
    scan the file and do 0-1-swaps. //3rd scan
Actually the 1st and 3rd scans can be merged
[Figure: IS vertex v with candidates a and b, before and after the swap]
1. The algorithm runs in rounds, and each round needs 3 sequential scans. …
2. Actually the 1st and 3rd scans can be merged. This is because the maximum degree of an IS vertex should be much smaller than the block size, so its adjacency list can be kept in main memory and easily read again.
3. However, to make the idea of One-K-Swap work in the external setting, we design a data structure like this:
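The swap rule itself can be illustrated with a simplified in-memory sketch. This is not the semi-external procedure from the talk: it assumes the whole graph is available as a dictionary of neighbour sets, recomputes candidates directly instead of using the ISN array, and the function names and the default of 3 rounds are hypothetical.

```python
def one_k_swap_round(adj, independent):
    """One round: 1-k-swaps followed by 0-1-swaps. Returns (new_set, changed)."""
    current = set(independent)
    changed = False
    for v in sorted(independent):
        # 1-k-swap candidates of v: NIS neighbours whose only IS neighbour is v.
        candidates = [u for u in sorted(adj[v])
                      if u not in current and (adj[u] & current) == {v}]
        # Greedily keep candidates that are pairwise independent.
        picked = []
        for u in candidates:
            if all(u not in adj[p] for p in picked):
                picked.append(u)
        if len(picked) >= 2:  # swap 1 vertex out, k >= 2 vertices in
            current.remove(v)
            current.update(picked)
            changed = True
    # 0-1-swaps: add any vertex left without an IS neighbour.
    for u in sorted(adj):
        if u not in current and not (adj[u] & current):
            current.add(u)
            changed = True
    return current, changed


def one_k_swap(adj, independent, max_rounds=3):
    current = set(independent)
    for _ in range(max_rounds):
        current, changed = one_k_swap_round(adj, current)
        if not changed:
            break
    return current


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
improved = one_k_swap(star, {0})
```

Candidates are recomputed against the current set before every swap, so a vertex that has gained a second IS neighbour through an earlier swap is no longer eligible; this is one simple way to sidestep the conflicts that the semi-external version has to resolve explicitly.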
26
Our Semi-external Algorithm: One-K-Swap
Implementation Details
Data structure
Memory cost: |V| log |V| bits -> |V| memory units
[Figure: IS vertex v with candidates a and b, and the ISN array indexed 1 to |V| with entries v, 2 and -1]
1. We store an ISN array of length |V| in memory. For a 1-k-swap candidate, it stores the IS neighbor; for an IS vertex, it stores the number of adjacent candidates. So the memory cost is |V| units.
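One way to fill such an ISN array in a single sequential scan is sketched below. The encoding is partly assumed: the talk states what is stored for candidates and for IS vertices, while using -1 for all other NIS vertices is a guess based on the figure, and the function name and record format are hypothetical.

```python
def build_isn(records, in_is):
    """records: iterable of (vertex, neighbours); in_is: list of booleans."""
    isn = [-1] * len(in_is)        # assumption: -1 marks "not a candidate"
    for v, is_member in enumerate(in_is):
        if is_member:
            isn[v] = 0             # IS vertex: number of adjacent candidates
    for v, neighbours in records:  # one sequential scan of the adjacency file
        if in_is[v]:
            continue
        is_neighbours = [u for u in neighbours if in_is[u]]
        if len(is_neighbours) == 1:
            w = is_neighbours[0]
            isn[v] = w             # candidate: store its single IS neighbour
            isn[w] += 1
    return isn


# Hypothetical example: IS = {2, 3}; vertices 0 and 1 are candidates of 2,
# vertex 4 has two IS neighbours and is therefore not a candidate.
records = [(0, [2]), (1, [2]), (2, [0, 1, 4]), (3, [4]), (4, [2, 3])]
in_is = [False, False, True, True, False]
isn = build_isn(records, in_is)
```

Whether an entry is a count or a vertex id is decided by the vertex’s IS/NIS state, so the array needs one memory unit per vertex in addition to the state bits.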
27
Our Semi-external Algorithm: One-K-Swap
Implementation Details
Data structure
Memory cost: |V| log |V| bits -> |V| memory units
[Figure: IS vertex v with candidates a and b, and the ISN array indexed 1 to |V| with entries v, 2 and -1]
Q1: How do we know that 2 candidates are independent?
Q2: How do we resolve “conflicts” between different candidates?
Q3: Is there a case where 1-k-swaps exist but conflict with each other (“deadlock”)?
[Figure: conflict and “deadlock” examples on vertices 1 to 4]
1. We also have to answer some questions, namely Q1, Q2 and Q3.
2. I’ll take Question 1 as an example. When we read a 1-k-swap candidate, we count how many candidates in its adjacency list have the same IS neighbor. If this count is smaller than the number stored for its IS neighbor, at least 1 other candidate is independent of this vertex. So if the vertex has no adjacent IS vertex other than its IS neighbor, it can do a 1-k-swap; otherwise we update its IS neighbor’s entry (the ISN count decreases by 1).
3. Next we look at the performance of One-K-Swap.
28
Our Semi-external Algorithm: One-K-Swap
Correctness and Completeness
Cascading swaps?
Complexity Analysis
I/O complexity: if the algorithm runs k rounds, 2k sequential scans; in practice, k < 3 is sufficient. Scan(|V|+|E|)
Time complexity: each operation can be done in O(1); 2k(|V|+|E|) -> O(|V|+|E|)
In practice really fast!
1. We may care about how many rounds it takes, since this affects the complexity of the algorithm. Theoretically, there can be cascading swaps, meaning…
2. But in practice, as we discussed before, the power law and other properties of real-world graphs almost prevent this from happening. Usually 2 rounds of iteration leave fewer than 2-3% of the 1-k-swaps undone, so the algorithm can stop there as a trade-off between efficiency and effectiveness.
3. So the I/O complexity is…
29
Our Semi-external Algorithm: One-K-Swap
Correctness and Completeness
Cascading swaps?
Complexity Analysis
I/O complexity: if the algorithm runs k rounds, 2k sequential scans; in practice, k < 3 is sufficient. Scan(|V|+|E|)
Time complexity: each operation can be done in O(1); 2k(|V|+|E|) -> O(|V|+|E|)
In practice really fast!
Performance Analysis
1. We only give some intuition on how much One-K-Swap can enlarge the independent set. This formula shows the expected increment in the 1st round after the greedy algorithm. We analyze the degree relationship between 1-k-swap candidates and their IS neighbors, and prove that both degrees are very small w.h.p.
2. When One-K-Swap starts from the result of the random algorithm, it can perform a significant number of swaps. The intuition is that if we unfortunately include some vertex of large degree in the independent set, it very likely has 2 independent 1-k-swap candidates, so a swap is performed. The final result is similar to that of a greedy algorithm.
30
Our Semi-external Algorithm: Two-K-Swap
Intuition
As done in One-K-Swap, find all swap patterns
1. Due to the time limit, I’ll just give some high-level ideas of Two-K-Swap and won’t go into details.
2. First, as in One-K-Swap, we find all swap skeletons, that is, the minimal subgraph patterns that allow a 2-k-swap. There are the following 3 types.
31
Our Semi-external Algorithm: Two-K-Swap
Intuition
Two difficulties need to be handled in 2-k-swap
Q1: How to find 3 independent candidates by sequential scan?
A: 1) Labeling: label(b), label(c) contains a
   2) a, b and c should be stored together (additional storage?)
Q2: How to avoid conflicts between 2-k-swap candidates?
A: Store the “conflict graph” in memory.
[Figure: candidates b and c]
2. Then we need to solve some difficulties in designing a semi-external swap algorithm, such as:
32
Our Semi-external Algorithm: Two-K-Swap
Intuition
Two difficulties need to be handled in 2-k-swap
Q1: How to find 3 independent candidates by sequential scan?
A: 1) Labeling: label(b), label(c) contains a
   2) a, b and c should be stored together (additional storage?)
Q2: How to avoid conflicts between 2-k-swap candidates?
A: Store the “conflict graph” in memory.
Implementation Details
Data Structure
Memory Cost: 2|V| memory units for ISN & < |V| memory units for labels and conflict graph
[Figure: candidates b and c]
3. The data structure is a little more complex than in One-K-Swap, and needs more space.
4. We store 1-k-swap and 2-k-swap candidates in ISN, and the labels and the conflict graph as hash maps of limited size; the conflict graph is elastic.
33
Our Semi-external Algorithm: Two-K-Swap
Complexity Analysis
I/O complexity: if the algorithm runs k rounds, 3k sequential scans; in practice, k < 3 is sufficient. Scan(|V|+|E|)
Time complexity: each round is O(|V|+|E|); 2kO(|V|+|E|) -> O(|V|+|E|)
Performance Analysis (Intuition)
Covers all 1-k-swaps
In expectation (and in practice), better than One-K-Swap
1. Finally I’ll show the performance analysis. As with One-K-Swap, in practice 2 rounds of iteration leave fewer than 2-3% of the 1-k-swaps and 2-k-swaps undone. So the I/O complexity can be seen as…
2. Since Two-K-Swap detects all possible 1-k-swaps, in expectation,…
34
Experiments: On Synthetic Graphs
Power law random graphs generated by the ACL model [Aiello et al STOC00]
Estimated performance ratio lower bound
β      1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7
ratio  0.981 0.979 0.978 0.972 0.971 0.973 0.974 0.977
1. Experiments are conducted to evaluate our algorithms. First we generate synthetic power law graphs by the ACL model and vary beta, to see how our 3 algorithms perform.
2. This figure shows that they are all near-perfect. The approximation ratio is computed by dividing the independent set size found by each algorithm by the upper bound we mentioned before.
3. Since Greedy is already very good, the swap algorithms bring only a small improvement.
4. The table shows the expected performance ratio (we showed it before). We can see that, over sufficiently many tests, it is a lower bound, and the bound is quite tight.
5. Actually the independent sets computed by the various algorithms on power law graphs are very concentrated around the expected value; in this experiment we never saw a result fall below the estimated value.
35
Experiments: Datasets
All graphs have degree distributions following a power law
2. Next we need to test our algorithms on real-world datasets. These are the datasets we used, which vary in size and average degree.
3. For example, Youtube is the social network of Youtube users and their connections. Patent is maintained by the National Bureau of Economic Research; this citation graph includes all citations made by patents granted between 1975 and … Twitter10 contains a social network crawled from Twitter.com in July … ClueWeb12 is the web graph crawled in 2012.
36
Experiments: Various algorithms on Real-world graphs
Time
Dataset  Greedy[94]  Zeh[02]  SemiGreedy  One-K-Swap  Two-K-Swap
Astroph  129ms  73.6ms  57ms  347ms  237ms
DBLP  0.75s  1.40s  0.56s  1.36s  1.39s
Youtube  1.93s  2.67s  1.15s  3.78s  4.76s
Patent  21.3s  22.0s  4.6s  27.8s  36.7s
Blog  28.8s  30.0s  6.2s  35.7s  45.3s
Citeseerx  16.0s  6.4s  25.7s  20.8s
Uniprot  18.6s  20.9s  2.2s  19.9s  18.5s
Facebook  N/A  187.2s  47.9s  153.0s  160.8s
Twitter  18min  8min  39min  55min
Clueweb12  1.95h  1.65h  8.8h  10.4h
1. First we must show that our algorithms are practical, i.e., have acceptable time and memory cost.
2. For time cost, we can see that semi-greedy is fast, and the 2 swap algorithms also have acceptable time cost. For large graphs, this time is in fact determined by the disk speed.
3. As a comparison, the in-memory greedy algorithm is fast only when the graph fits into memory; if not, it cannot finish within tens of hours because of the large number of random accesses. The existing external memory algorithm is also fast on all datasets, at the cost of a smaller independent set.
37
Experiments: Various algorithms on Real-world graphs
Memory Cost
Dataset  Greedy[94]  Zeh[02]  SemiGreedy  One-K-Swap  Two-K-Swap
Astroph  4.43MB  25KB  4.5KB  149.1KB  329.7KB
DBLP  128.3MB  0.25MB  51.9KB  1.65MB  3.55MB
Youtube  239.1MB  1MB  141.6KB  4.59MB  9.69MB
Patent  692.2MB  2MB  460.2MB  14.9MB  31.7MB
Blog  841.9MB  493.2KB  15.9MB  34.4MB
Citeseerx  1258.4MB  798.3KB  25.7MB  52.4MB
Uniprot  1242.7MB  850.8KB  27.5MB  55.4MB
Facebook  N/A  25MB  7.06MB  234.2MB  468.9MB
Twitter  7.34MB  242.2MB  524.1MB
Clueweb12  200MB  116.6MB  3.75GB  5.73GB
1. The memory cost leads to similar conclusions. Our greedy uses only very limited memory, and the memory cost of the 2 swap-based algorithms is acceptable considering the whole graph size.
2. The in-memory greedy uses a lot of memory because it needs to read the whole graph, and the implementation of data structures like the Fibonacci heap has extra cost.
3. The previous external algorithm uses only a very small amount of memory, and this amount is configurable. We set it in proportion to the graph size.
38
Experiments: Various algorithms on Real-world graphs
Dataset  Greedy[94]  Zeh[02]  + One-K-Swap  + Two-K-Swap  Semi-Greedy  S-Greedy
Astroph  17110  15275  16625  16814  15019  16054  16572
DBLP  260992  242521  260715  260961  260886  261003  261007
Youtube  880873  823821  879078  880455  878459  880642  880835
Patent
Blog
Citeseerx
Uniprot
Facebook  N/A
Twitter
ClueWeb12
1. Most importantly, we care about how large an independent set these algorithms can find. Numbers in bold are the best; for the others, the leading digits that differ from the best are marked.
2. In-memory greedy is always good on small graphs, but cannot work on large graphs.
3. Two-K-Swap after greedy is also nearly the best on all datasets. All One-K-Swap and Two-K-Swap variants have results close to the best.
4. Our greedy is only a little worse than in-memory greedy.
5. On almost all datasets, the random algorithm shows a gap to the other algorithms, and is sometimes far worse.
39
Experiments: Various algorithms on Real-world graphs
Dataset  Greedy[94]  Zeh[02]  + One-K-Swap  + Two-K-Swap  Semi-Greedy  S-Greedy
Astroph  17110  15275  16625  16814  15019  16054  16572
DBLP  260992  242521  260715  260961  260886  261003  261007
Youtube  880873  823821  879078  880455  878459  880642  880835
Patent
Blog
Citeseerx
Uniprot
Facebook  N/A
Twitter
ClueWeb12
OPT  =19106  =261008  =880882  <  <  <  <  <  <  <
e.g., Astroph 24,707; DBLP 317,629 -> 262,087; Youtube 889,375
1. We also compute the exact maximum independent set or an upper bound on it (red: independence number; green: upper bound).
2. On many datasets, our algorithms have results very close to the optimum or the upper bound.
3. For some large graphs we cannot get the exact optimal value, and when the average degree is large the upper bound may not be tight (as concluded from the smaller graphs).
40
Towards Maximum Independent Sets on Massive Graphs
Experiments: Summary
Though Greedy [Halldórsson et al STOC94] also gives near-optimal results for most power law graphs, it cannot scale well on large graphs; and [Zeh02]’s result is much worse
Our greedy algorithm has high efficiency without losing much effectiveness
One-K-Swap and Two-K-Swap improve the independent set size to near-optimal, with limited memory cost and acceptable time cost
Our algorithms outperform previous external algorithms, both in theory and in practice
41
Towards Maximum Independent Sets on Massive Graphs
Conclusions
We develop three semi-external algorithms to find near-optimal independent sets on massive graphs, all satisfying
    Low memory cost
    Low time and I/O complexity
    Near-optimal in theory and in practice
    Easy to implement
We give non-trivial theoretical guarantees for our proposed algorithms, which show them to be near-optimal
Experiments show that our algorithms have better performance and bounds than existing external algorithms
42
Towards Maximum Independent Sets on Massive Graphs
Thank you! Q & A
43
Real-world dataset downloading
Our preprocessed datasets are available at
Source datasets url: