A Graph-based Friend Recommendation System Using Genetic Algorithm Sojharo Computational Intelligence
Introduction: A friend recommendation system based on the topology of network graphs. Oro-Aro, a local social network, was used in the experiment. The algorithm analyzes the sub-graph formed by a user A and all other users connected to user A within three (often two) degrees of separation. It uses patterns to find users whose behavior is similar to that of user A, based on an analysis of user A’s friend network and Friends Of Friends (FOF). Some social networking services (SNSs) already provide a friend recommendation service; although the method used is not disclosed, we believe that an FOF approach is mostly used.
Why Recommendation Systems? They follow the rise of e-commerce. Successful recommendations increase sales, e.g. people who bought ‘English Grammar’ also bought ‘Everyday English’. Recommendations are based on previous knowledge and cover products, services, and friends. Interest is growing in both commercial and academic research.
The Oro-Aro Social Network: Total of 634 nodes (users) and 5076 edges. Preprocessing was applied to the data: a filter removed all one-way relationships, which reduced the number of edges by 29%. A social network is an organization composed of nodes that are connected through one or more particular kinds of interdependence, such as values, ideas, interests, business, friendships, kinship, conflict, and trading [4], [5].
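The one-way filter described above could be sketched as follows. This is a minimal illustration, assuming the raw data is a list of directed (source, target) pairs; the function name `keep_mutual_edges` and the sample edges are hypothetical and do not come from the Oro-Aro data set.

```python
def keep_mutual_edges(directed_edges):
    """Keep only relationships that exist in both directions.

    Each surviving relationship is returned once, as an undirected
    edge represented by a frozenset of its two endpoints.
    """
    edge_set = set(directed_edges)
    mutual = set()
    for source, target in edge_set:
        # Skip self-loops and any edge whose reverse is missing.
        if source != target and (target, source) in edge_set:
            mutual.add(frozenset((source, target)))
    return mutual


# Hypothetical sample: a<->b and c<->d are mutual, a->c is one-way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")]
mutual = keep_mutual_edges(edges)
```

Storing each surviving edge as a frozenset makes the resulting graph undirected, which matches the friendship model used by the later steps.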
Recommendation Mechanism: The topological characteristics and metrics are derived from complex network theory. The strategy is to filter and order the set of nodes that have some relation to a given node vi. The resulting node set contains the recommendations for node vi. The recommendation process is divided into two steps: the filtering procedure and the ordering procedure.
Filtering vs Ordering: Filtering separates out the nodes with higher probabilities of being a recommendation, reducing the number of nodes to be processed. Ordering puts the most relevant nodes at the top of the list. Some properties and metrics are used for this. A genetic algorithm is used here.
Degree of Separation vs Frequency of Occurrence
Filtering Procedure: Uses the concept of the clustering coefficient: it is more probable that you know a friend of your friend than any other random person. Selection is restricted to the nodes adjacent to each node that is adjacent to the central node (vi in our case). All nodes that can be reached in two hops are considered.
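A minimal sketch of the two-hop filter, assuming the graph is stored as a dictionary that maps each node to the set of its neighbors. The function name `two_hop_candidates` and the sample graph are hypothetical. Excluding the central node and its existing friends from the result is an assumption made here so that only new contacts are suggested.

```python
def two_hop_candidates(graph, center):
    """Return the nodes reachable from `center` in exactly two hops."""
    friends = graph[center]
    reached = set()
    for friend in friends:
        # Collect every neighbor of every friend of the central node.
        reached |= graph[friend]
    # Assumption: drop the central node and nodes that are already friends.
    return reached - friends - {center}


# Hypothetical undirected graph as an adjacency dictionary.
graph = {
    "vi": {"a", "b"},
    "a": {"vi", "b", "c"},
    "b": {"vi", "a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
candidates = two_hop_candidates(graph, "vi")
```

In this sample, node `e` is three hops away from `vi`, so it is not a candidate.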
Ordering Procedure: The ordering mechanism uses one numeric value for each node to be ordered. This indexing value measures the interaction strength between that node and the central node (node vi). The interaction strength is computed as a weighted average of three independent indexes. These indexes measure specific properties of the sub-graph composed of the nodes being analyzed.
First Index: Common Friends. Defined as the number of adjacent nodes that are linked to both node i and node j at the same time, where i is the center node (our node vi) and j is the node being ordered.
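With neighbor sets, the first index reduces to the size of a set intersection. The sketch below assumes an adjacency dictionary of sets; the name `common_friends` and the sample graph are hypothetical.

```python
def common_friends(graph, i, j):
    """Return the set of nodes adjacent to both node i and node j."""
    return graph[i] & graph[j]


# Hypothetical graph: i and j share the friends a and b.
graph = {
    "i": {"a", "b", "c"},
    "j": {"a", "b", "d"},
    "a": {"i", "j", "b"},
    "b": {"i", "j", "a"},
    "c": {"i"},
    "d": {"j"},
}
shared = common_friends(graph, "i", "j")
first_index = len(shared)  # the first index is the count of common friends
```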
Second Index: Density of the result of the first index. Measures the cohesion level inside the group formed by the common friends of person i and person j. If the value is small, the people inside this group are not well connected to each other.
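The slides do not spell out the density formula. The sketch below assumes the standard graph density: the number of links that exist inside the group divided by the number of possible pairs. Returning 0.0 for groups with fewer than two members is also an assumption, and the names and sample graph are hypothetical.

```python
from itertools import combinations


def group_density(graph, group):
    """Fraction of possible pairs inside `group` that are actually linked."""
    members = sorted(group)
    if len(members) < 2:
        return 0.0  # assumption: a group without pairs has no cohesion
    possible = len(members) * (len(members) - 1) / 2
    linked = sum(1 for u, v in combinations(members, 2) if v in graph[u])
    return linked / possible


# Hypothetical graph: i and j share a, b, c; only a-b and b-c are linked.
graph = {
    "i": {"a", "b", "c"},
    "j": {"a", "b", "c"},
    "a": {"i", "j", "b"},
    "b": {"i", "j", "a", "c"},
    "c": {"i", "j", "b"},
}
second_index = group_density(graph, graph["i"] & graph["j"])
```

Here two of the three possible pairs among the common friends are linked, so the group is fairly cohesive.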
Third Index: Variation of the Second Index. Measures the density of the group formed by the adjacent vertices of node i and node j. Instead of the intersection, it takes the union.
Third Index: Continued… Measures the cohesion of the ‘big’ set formed by the friends of i and the friends of j. Examples: work environment, school. Our friends in the same big set may not be our common friends.
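The third index can be sketched by applying the same density measure to the union of the two neighbor sets. As before, the standard graph density is an assumption, and so is the choice to leave nodes i and j themselves out of the group. All names and the sample graph are hypothetical.

```python
from itertools import combinations


def group_density(graph, group):
    """Fraction of possible pairs inside `group` that are actually linked."""
    members = sorted(group)
    if len(members) < 2:
        return 0.0
    possible = len(members) * (len(members) - 1) / 2
    linked = sum(1 for u, v in combinations(members, 2) if v in graph[u])
    return linked / possible


def union_density(graph, i, j):
    """Density of the 'big' set: friends of i together with friends of j."""
    # Assumption: i and j are excluded from the group being measured.
    group = (graph[i] | graph[j]) - {i, j}
    return group_density(graph, group)


# Hypothetical graph: the big set is {a, b, c}; only a-b is linked.
graph = {
    "i": {"a", "b"},
    "j": {"b", "c"},
    "a": {"i", "b"},
    "b": {"i", "j", "a"},
    "c": {"j"},
}
third_index = union_density(graph, "i", "j")
```

In this sample the only common friend is `b`, yet the big set still has a measurable density, which is the extra information the third index contributes.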
Calibration Step: A procedure to combine the multiple (three) indexes into a single value. This value is used to obtain the final set of ordered results for the recommendation system. The value is obtained as a weighted average of the indexes. The weight of each index must be calibrated to get an optimized result. Optimization means placing the most important users at the beginning of the list.
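A minimal sketch of combining the three indexes and ordering candidates by the result. The index values and weights below are made-up numbers for illustration only, and the function names are hypothetical. The slides do not say whether the indexes are rescaled before averaging, so this sketch uses them as given.

```python
def interaction_strength(indexes, weights):
    """Weighted average of the index values for one candidate node."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * x for w, x in zip(weights, indexes)) / total_weight


def order_candidates(index_table, weights):
    """Sort candidate nodes so the strongest interaction comes first."""
    return sorted(
        index_table,
        key=lambda node: interaction_strength(index_table[node], weights),
        reverse=True,
    )


# Hypothetical (first, second, third) index values per candidate.
index_table = {
    "c": (1, 0.0, 0.2),
    "d": (3, 0.7, 0.4),
    "e": (2, 0.5, 0.6),
}
weights = (0.5, 0.3, 0.2)  # hypothetical, uncalibrated weights
ranking = order_candidates(index_table, weights)
```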
Fitness Function: The importance of a suggested friend depends on the user, so the optimization function must consider the user's existing relationships. A modification to the filtering process is proposed: also include the nodes directly connected to the central node (node vi). The fitness function uses the classification of these nodes as a measure of rightness. Since the ordering procedure assigns a position to each node, the mean position of the nodes that are already related to the central node is the fitness function value. The smaller this value, the better the weight set being considered.
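The fitness value described above can be sketched as the mean 1-based position of the existing friends in an ordered list. The function name `mean_position` and the two sample rankings are hypothetical.

```python
def mean_position(ranking, friends):
    """Mean 1-based position of existing friends in an ordered list."""
    positions = [
        position
        for position, node in enumerate(ranking, start=1)
        if node in friends
    ]
    if not positions:
        raise ValueError("no existing friends appear in the ranking")
    return sum(positions) / len(positions)


friends = {"f1", "f2"}  # nodes already connected to the central node
# A weight set that ranks existing friends first scores lower (better).
good = mean_position(["f1", "f2", "c1", "c2"], friends)
poor = mean_position(["c1", "f1", "c2", "f2"], friends)
```

The first ranking places both existing friends at the top, so it gets the smaller and therefore better fitness value.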
Fitness Function: Continued… The calibration step is our optimization problem. Ii represents each index and wi represents the weight given to that index. We need to optimize these weights.
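The slides do not give the details of the genetic algorithm, so the following is only a rough sketch of how such a calibration might look, not the authors' implementation. It assumes truncation selection with elitism, one-point crossover, Gaussian mutation of a single weight, weights kept in [0, 1], and a fixed seed. The index table, parameter values, and function names are hypothetical.

```python
import random


def mean_friend_position(weights, index_table, friends):
    """Fitness: mean 1-based rank of existing friends (smaller is better)."""
    def score(node):
        # Ranking by weighted sum matches ranking by weighted average.
        return sum(w * x for w, x in zip(weights, index_table[node]))

    ranking = sorted(index_table, key=lambda node: (-score(node), node))
    positions = [ranking.index(friend) + 1 for friend in friends]
    return sum(positions) / len(positions)


def calibrate_weights(index_table, friends, generations=30, pop_size=20, seed=0):
    """Search for three weights that minimize the mean friend position."""
    rng = random.Random(seed)

    def fitness(weights):
        return mean_friend_position(weights, index_table, friends)

    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, 2)  # one-point crossover
            child = mother[:cut] + father[cut:]
            gene = rng.randrange(3)  # mutate one weight, clamped to [0, 1]
            child[gene] = min(1.0, max(0.0, child[gene] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)


# Hypothetical (first, second, third) index values; f1 and f2 are friends.
index_table = {
    "f1": (5, 0.8, 0.4),
    "f2": (1, 0.9, 0.6),
    "c1": (4, 0.1, 0.1),
    "c2": (0, 0.2, 0.5),
    "c3": (2, 0.3, 0.2),
}
best_weights, best_fitness = calibrate_weights(index_table, {"f1", "f2"})
```

Because the best half of each generation is carried over unchanged, the best fitness found never gets worse from one generation to the next.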