CLIQUE FINDER By Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess.

1 CLIQUE FINDER By Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess

2 Outline: Intro; Problem; Solution; Implementation (Distance Algorithm, Clustering Algorithm); Validation (Test Data Set, Real Data Set); Demo

3 Introduction. How can we group friends? How can your friends be grouped logically? What factors matter most when people join cliques (shared interests, high school, family, college, work, etc.)? How do Facebook and real life differ? How we define a clique. Desired results: high school friends, family, or co-workers are grouped together as expected; the tool may also form cliques or groups within your friends list that had not been considered before.

4 Implementation: Gather data; Distance algorithm; Clustering algorithm (input: distance matrix; output: two-dimensional array of friends); Test app; Output

5 Distance Algorithm. Problems: Facebook limits; server limits; retrieving and processing over 30,000 photos can take 3 to 6 minutes. Important information: what information should be processed? We used photo tags and wall counts. Data collected: an average of 8,000 photos across all friends.

6 Distance Algorithm (continued). Survey of 50 users. 5 useful pieces of information: personal information, wall posts, photos, groups, and events.

7 Distance Algorithm (continued). Facebook results: one picture with 5 tags = 5 results. Process results: turn them into a list of friends with tagged photos; find a distance between each pair of friends; turn the distances into a distance matrix. Run time, worst case: (number of users)^2 * (number of photos)^2.
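The processing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `(photo_id, friend_id)` row shape, the function names, and the placeholder `shares_none` distance are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def photos_by_friend(tag_rows):
    """Group (photo_id, friend_id) tag rows into {friend_id: set of photo_ids}."""
    photos = defaultdict(set)
    for photo_id, friend_id in tag_rows:
        photos[friend_id].add(photo_id)
    return dict(photos)

def distance_matrix(photos, distance):
    """Build a symmetric matrix of pairwise distances, ordered by sorted friend id."""
    friends = sorted(photos)
    n = len(friends)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = distance(photos[friends[i]], photos[friends[j]])
        matrix[i][j] = matrix[j][i] = d
    return friends, matrix

# One picture with 3 tags arrives as 3 rows (hypothetical data).
rows = [("p1", "ann"), ("p1", "bob"), ("p1", "cy"),
        ("p2", "ann"), ("p2", "bob"), ("p3", "cy"), ("p4", "dee")]
photos = photos_by_friend(rows)

# Placeholder distance (assumption): 0.0 if two friends share any photo, else 1.0.
def shares_none(a, b):
    return 0.0 if a & b else 1.0

friends, matrix = distance_matrix(photos, shares_none)
```

Keeping each friend's photos in a set means a pair of friends can be compared with one set intersection rather than a photo-by-photo nested loop, which is one possible way to soften the (number of photos)^2 factor in the worst case above.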

8 Improved Distance Equation. Distance: percentage of tagged photos where users appear together.
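The equation itself did not survive in this transcript, so the following is only one plausible reading of it. It assumes the percentage is taken over all photos in which either user is tagged and that the distance is 1 minus that share (a Jaccard-style distance); the function name `tag_distance` and the scaling to [0, 1] are assumptions, not the authors' definition.

```python
def tag_distance(photos_a, photos_b):
    """Distance in [0, 1]: 1 minus the share of tagged photos containing both users."""
    union = photos_a | photos_b
    if not union:
        # Assumption: no tagged photos means no evidence of closeness.
        return 1.0
    return 1.0 - len(photos_a & photos_b) / len(union)

# Hypothetical photo sets: 2 shared photos out of 4 photos in total.
ann = {"p1", "p2", "p3"}
bob = {"p2", "p3", "p4"}
d = tag_distance(ann, bob)
```

Under this reading, friends who always appear together get distance 0 and friends who never share a photo get distance 1.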

9 Clustering Algorithm. Hierarchical clustering with average-linkage clusters. Generalized to work on any objects with a distance function. Clustering stops when the two closest clusters are more than the threshold distance apart.
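A minimal sketch of agglomerative clustering with average linkage and a threshold stop is shown below. It is written for clarity rather than speed, and the function names and the 2D sample points are illustrative assumptions; the authors' implementation may differ.

```python
from itertools import combinations
from math import dist

def average_linkage(cluster_a, cluster_b, distance):
    """Mean distance over every cross-cluster pair of items."""
    total = sum(distance(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def cluster(items, distance, threshold):
    """Merge the two closest clusters until they are more than `threshold` apart."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], distance),
        )
        if average_linkage(clusters[i], clusters[j], distance) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Works on any objects with a distance function; here, 2D points with Euclidean distance.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
cliques = cluster(points, dist, threshold=6)
```

Because `cluster` only ever calls `distance(a, b)`, the same function could be used with friends and a photo-tag distance instead of points.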

10 Point-Based Test Driver

11 Validation – Sample Data Set

12 How we measured correctness. Thresholds 3-10 gave us the correct number of cliques; however, 5 was placed incorrectly. Error rate of 10% because 1 of 10 users was misplaced. We chose the mid-point value of 6 for our threshold.
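The slide does not say exactly how a "misplaced" user was counted. One simple way to reproduce the 1/10 = 10% figure is sketched below, under the assumption that a user is misplaced when they do not belong to the majority expected group of the clique they landed in. The function name, labels, and user ids are hypothetical.

```python
from collections import Counter

def error_rate(found_cliques, expected_label):
    """Fraction of users outside the majority expected group of their clique.

    Assumes every clique is non-empty and every user has an expected label.
    """
    total = sum(len(clique) for clique in found_cliques)
    misplaced = 0
    for clique in found_cliques:
        counts = Counter(expected_label[user] for user in clique)
        majority_size = counts.most_common(1)[0][1]
        misplaced += len(clique) - majority_size
    return misplaced / total

# Hypothetical sample set: 10 users in 3 expected groups.
expected = {f"u{i}": "school" for i in range(0, 4)}
expected.update({f"u{i}": "family" for i in range(4, 7)})
expected.update({f"u{i}": "work" for i in range(7, 10)})

# The clustering found 3 cliques, but u7 (work) landed with the family group.
found = [["u0", "u1", "u2", "u3"], ["u4", "u5", "u6", "u7"], ["u8", "u9"]]
rate = error_rate(found, expected)
```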

13 Validation – Real Data Set. We chose to use Thomas Dvornik's account: it has a moderate amount of data, and his friends could be separated into well-defined cliques. Threshold on real data: accuracy was highest at a threshold of 3 and second highest at 6.

14 Validation – Improvements. Results after improvements, again based on our accuracy measurement.

15 Improvements/Future Work. Caching: the number of queries and the amount of computation can get very large, so store the distance matrix for 24 hours. Accuracy: use all aspects of Facebook (some activity is not even considered); use weights for different data sources (not all activity is equally important); analyze the produced cliques (survey users to see if the cliques are accurate).
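The 24-hour caching idea could look something like the sketch below. It is an in-memory illustration only; the class name, the per-user key, and the injectable clock are assumptions, and a deployed app would more likely persist the matrix in a database or cache server.

```python
import time

class DistanceMatrixCache:
    """Keep each user's distance matrix for a fixed time-to-live (default 24 hours)."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without waiting
        self._entries = {}       # user_id -> (stored_at, matrix)

    def get_or_compute(self, user_id, compute):
        """Return the cached matrix if still fresh, else call compute() and store it."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        matrix = compute()
        self._entries[user_id] = (now, matrix)
        return matrix
```

A caller would pass the expensive step as `compute`, for example `cache.get_or_compute(user_id, lambda: build_matrix(user_id))`, where `build_matrix` stands in for whatever function queries Facebook and builds the matrix.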

16 Demo http://apps.facebook.com/mine_cliques/

17 Questions?

