A Graph-Based Approach to Link Prediction in Social Networks Using a Pareto-Optimal Genetic Algorithm
Jeff Naruchitparames
University of Nevada, Reno - CSE
CS 790: Complex Networks, Fall 2010
[Figure panels: biological, social]
‣ Social networks =
  ‣ Dynamic, judgmental environment
  ‣ Affect friendships over time
[Figure labels: very dynamic, heterogeneous]
‣ 1-2 hop distance only
‣ Friend-of-friend
‣ Multiple hops; >1
‣ Structural; purely graph-based
‣ No explicit correlation between potential friends...
‣ Silva et al.,
‣ A Graph-based Recommendation System Using Genetic Algorithms,
[Figure labels: Friends-of-Friends, 2 hops, Filter, Order]
Filtering
“It’s more probable that you know a friend of your friend than any other random person”
Mitchell M., Complex Systems: Network Thinking,
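The friend-of-friend filter can be illustrated with a small sketch. This is only an illustration under assumptions, not the code from the talk: the graph is assumed to be undirected and stored as a dictionary of adjacency sets, and the function name `friends_of_friends` and the toy data are hypothetical.

```python
from collections import Counter

def friends_of_friends(graph, user):
    """Count shared friends for every non-friend exactly two hops from `user`.

    `graph` is assumed to map each person to the set of their friends.
    """
    direct = graph.get(user, set())
    shared = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and people who are already friends.
            if candidate != user and candidate not in direct:
                shared[candidate] += 1
    return shared

# Hypothetical toy friendship graph (illustrative data only).
toy_graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
candidates = friends_of_friends(toy_graph, "ann")
```

The shared-friend count returned here corresponds to the first feature in the feature list; candidates with no mutual friend never appear, which is what makes this step a filter.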
Indexes
What’s missing?
‣ Heterogeneity
‣ Human behavior and preferences
‣ Multiple hops
My approach
Pretty much a filtering problem
My approach
‣ Components (for filtering)
  ‣ Betweenness centrality
  ‣ Community detection
    ‣ Clique Percolation Method (CPM)
  ‣ Friends of friends
‣ 10-dimensional Pareto-optimal genetic algorithm
Betweenness Centrality
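For reference, betweenness centrality on a small unweighted graph can be computed with Brandes' algorithm. The sketch below is a generic textbook version, not the implementation used in this work; it assumes an undirected graph stored as adjacency sets in which every neighbor also appears as a key, and the toy path graph is hypothetical.

```python
from collections import deque

def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        stack = []
        predecessors = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in centrality.items()}

# Hypothetical path graph a - b - c: only b lies between two other nodes.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = betweenness_centrality(path)
```

Nodes with high scores sit on many shortest paths, which is why the measure is useful for spotting people who bridge otherwise separate groups.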
Community Detection
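The Clique Percolation Method can be sketched for small graphs as follows. This is a brute-force illustration, not the implementation used here: it enumerates every k-node subset, so it is only practical for toy inputs. The function name `clique_percolation`, the default k = 3, and the example graph are assumptions made for the sketch.

```python
from itertools import combinations

def clique_percolation(graph, k=3):
    """Return communities as unions of k-cliques that share k-1 nodes."""
    nodes = sorted(graph)
    # Brute-force enumeration of all k-cliques (fine for tiny graphs only).
    cliques = [
        frozenset(group)
        for group in combinations(nodes, k)
        if all(b in graph[a] for a, b in combinations(group, 2))
    ]
    # Union-find over cliques: adjacent cliques end up in the same set.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())

# Hypothetical graph: triangles abc and bcd share an edge; efg is separate.
toy = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"b", "c", "e"}, "e": {"d", "f", "g"},
    "f": {"e", "g"}, "g": {"e", "f"},
}
communities = clique_percolation(toy, k=3)
```

The edge between d and e belongs to no triangle, so it does not merge the two groups; a node can still belong to several communities, which suits overlapping social circles.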
‣ Remove duplicates
‣ Remove our test cases
  ‣ (More on this later...)
The Genetic Algorithm Part
Pareto Fronts
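As a reminder of what a Pareto front is, the sketch below extracts the non-dominated points from a list of objective vectors. It assumes every objective is maximized; the function names and sample points are illustrative and not taken from the talk.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates (maximization assumed)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical two-objective scores.
points = [(3, 1), (1, 3), (2, 2), (1, 1), (2, 1)]
front = pareto_front(points)
```

The same comparison extends unchanged to ten objectives, one per feature, although the quadratic scan would need replacing for large populations.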
The Features
1. # of shared friends
2. location
3. age range
4. general interest
5. music
6. attended same events
7. groups
8. movies
9. education
10. religion/politics
Pareto Optimality
‣ Localized to implementation of selection
‣ Feature subset selection
‣ We want to find the best combination of these subsets that can give us the best solutions for how we determine friendships
Pareto Optimality and Feature Subset Selection
[Table: columns F1 to F10; rows C1, C2, ..., Cn]
A Point System
[Table: columns F1 to F10; rows U1, U2, ..., Un]
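One possible reading of the two tables is that each user row holds points for the ten features and each binary chromosome row selects which features count. The sketch below follows that reading only as an assumption: the slides do not state how points are combined, so the plain masked sum, the name `masked_score`, and the sample numbers are all hypothetical.

```python
# Feature order follows the feature list (F1 to F10).
FEATURES = [
    "shared_friends", "location", "age_range", "general_interest", "music",
    "same_events", "groups", "movies", "education", "religion_politics",
]

def masked_score(chromosome, points):
    """Sum the points of the features whose chromosome bit is 1.

    Assumption: a simple sum; the actual scoring rule is not given.
    """
    if len(chromosome) != len(FEATURES) or len(points) != len(FEATURES):
        raise ValueError("expected one bit and one point value per feature")
    return sum(p for bit, p in zip(chromosome, points) if bit)

# Hypothetical chromosome selecting F1, F3 and F10, and one user's points.
chromosome = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]
user_points = [3, 1, 2, 0, 0, 1, 0, 0, 1, 4]
score = masked_score(chromosome, user_points)
```

Under this reading, different chromosomes rank the same candidate users differently, which is what the feature subset search would be exploring.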
Pareto Optimality
‣ Compare with the test cases we removed earlier...
‣ For all chromosomes in population, do:
  ‣ If ALL test cases ≥ optimal Pareto front
    ‣ Calculate fitness
    ‣ Good to go
  ‣ Else
    ‣ Calculate fitness
    ‣ Continue onto next chromosome
Fitness Function
$$\sum_{i=1}^{n} \sum_{j=1}^{10} p_i \ln(f_j)\, p_{i-1}$$
Continuing on with the Evolutionary Process
‣ Apply fitness proportional selection
‣ Randomly select 2 parents to mate
‣ Apply 1-point crossover (82% chance)
‣ Bit mutation (0.05% chance)
‣ Do this until ALL test cases are better than the Pareto front OR fitness does not improve for 5 consecutive generations
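The selection, mutation, and stopping steps above can be sketched as below. This is a minimal illustration rather than the code behind the slides: fitness values are assumed to be non-negative, the 0.05% mutation chance is assumed to apply per bit, and all function names are hypothetical.

```python
import random

MUTATION_RATE = 0.0005  # 0.05% chance, assumed to apply to each bit

def roulette_select(population, fitnesses, rng):
    """Fitness proportional selection (assumes non-negative fitness values)."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def stalled(best_history, patience=5):
    """True when the best fitness has not improved for `patience` generations."""
    if len(best_history) <= patience:
        return False
    return max(best_history[-patience:]) <= max(best_history[:-patience])

rng = random.Random(42)  # seeded only to make the sketch repeatable
parent = roulette_select([[0] * 10, [1] * 10], [1.0, 3.0], rng)
child = mutate(parent, MUTATION_RATE, rng)
```

The `stalled` check covers only the second stopping condition; the test-case comparison against the Pareto front would be evaluated separately.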
1-Point Crossover
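A minimal sketch of 1-point crossover on binary chromosomes follows. The 0.82 default mirrors the 82% chance quoted for crossover; the function name, the list representation, and the example parents are assumptions made for illustration.

```python
import random

def one_point_crossover(parent_a, parent_b, rate=0.82, rng=random):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # With probability 1 - rate the parents are copied through unchanged.
    if len(parent_a) < 2 or rng.random() >= rate:
        return parent_a[:], parent_b[:]
    # A cut in 1..len-1 guarantees each child takes genes from both parents.
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

# Hypothetical 10-bit parents, one bit per feature.
mother = [0] * 10
father = [1] * 10
child_a, child_b = one_point_crossover(mother, father, rate=1.0, rng=random.Random(1))
```

With all-zero and all-one parents the cut point is easy to see: each child is a run of one value followed by a run of the other.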
Conclusion
‣ Complex network theory + genetic algorithm + social theory
‣ Betweenness centrality
‣ Community detection
‣ Clique Percolation Method
‣ Binary 10-dimensional Pareto-optimal genetic algorithm
‣ Dominant, fitness proportional selection
‣ Several levels of filtering and selection (aka filtering ☺)
Future Work
‣ Better fitness function (need to ask sociologists)
‣ Weighted chromosome for Pareto optimization (as opposed to binary)
‣ Prove all this stuff actually works (sociology standpoint??)
‣ Parallelize or GPU-ize the code (it’s in Python)