CLIQUE FINDER By Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess.



Outline
– Intro
– Problem
– Solution
– Implementation
  – Distance Algorithm
  – Clustering Algorithm
– Validation
  – Test Data Set
  – Real Data Set
– Demo

Introduction
How Can We Group Friends?
– How can your friends be grouped logically?
– What are the important factors in people joining cliques? Shared interests, high school, family, college, work, etc.
– How does Facebook differ from real life?
How We Define A Clique
Desired Results
– High school friends, family, or co-workers will be grouped together as expected.
– Cliques or groups may form within your friends list that you had not considered before.

Implementation
– Gather data
– Distance algorithm
– Clustering algorithm
  – Input: distance matrix
  – Output: two-dimensional array of friends
– Test app
– Output

Distance Algorithm
Problems
– Facebook limits
– Server limits
– Retrieving and processing over 30,000 photos can take 3 to 6 minutes
Important information
– What information should be processed?
– We used photo tags and wall post counts
Data collected
– An average of 8,000 photos across all friends

Distance Algorithm (continued)
Survey of 50 users
– Five useful pieces of information: personal information, wall posts, photos, groups, and events

Distance Algorithm (continued)
Facebook results
– One picture with 5 tags = 5 results
Process results
– Turn the results into a list of friends with tagged photos
– Find the distance between each pair of friends
– Turn the distances into a distance matrix
Run time (worst case): (number of users)^2 * (number of photos)^2
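A minimal Python sketch of the processing steps, assuming each Facebook result arrives as a (photo_id, friend_id) pair, so one photo with 5 tags yields 5 pairs. The helper names `photos_by_friend` and `distance_matrix` are hypothetical, and the `distance` argument is left pluggable because this slide does not fix the formula.

```python
from collections import defaultdict
from itertools import combinations


def photos_by_friend(tag_results):
    """Group tag results into a mapping of friend id -> set of photo ids.

    Assumes each result is a (photo_id, friend_id) pair, so one photo with
    5 tags contributes 5 results.
    """
    index = defaultdict(set)
    for photo_id, friend_id in tag_results:
        index[friend_id].add(photo_id)
    return dict(index)


def distance_matrix(index, distance):
    """Build a symmetric friend-by-friend distance matrix.

    `distance` is any function of two photo-id sets. Returns the friend ids
    in row order together with the matrix (a list of lists).
    """
    friends = sorted(index)
    n = len(friends)
    matrix = [[0.0] * n for _ in range(n)]
    # Each unordered pair is computed once and mirrored across the diagonal.
    for i, j in combinations(range(n), 2):
        d = distance(index[friends[i]], index[friends[j]])
        matrix[i][j] = matrix[j][i] = d
    return friends, matrix
```

Storing each friend's photos as a set lets a pair comparison use set operations instead of comparing every photo of one friend against every photo of the other, which avoids the (number of photos)^2 factor of a naive pairwise scan.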

Improved Distance Equation
– Distance: based on the percentage of tagged photos in which two users appear together
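The slide does not say which photos the percentage is taken over, or how a "together" percentage (a similarity) becomes a distance. One plausible reading, sketched here purely as an assumption: take the photos in which both friends are tagged as a percentage of the photos in which either is tagged (a Jaccard index), and use 100 minus that percentage, so friends who are often photographed together end up close. The function name `tag_distance` is hypothetical.

```python
def tag_distance(photos_a, photos_b):
    """Distance in [0, 100] between two friends, from their tagged photos.

    Assumption: "percentage of tagged photos where users appear together" is
    read as shared photos over all photos either friend is tagged in
    (Jaccard index x 100), and distance is 100 minus that percentage.
    """
    union = photos_a | photos_b
    if not union:
        return 100.0  # no tagged photos at all: treat the pair as unrelated
    together = 100.0 * len(photos_a & photos_b) / len(union)
    return 100.0 - together
```

For example, if one friend is tagged in photos p1 and p2 and another in p1 and p3, they share 1 of 3 photos, appear together about 33.3% of the time, and are about 66.7 apart.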

Clustering Algorithm
– Hierarchical clustering
– Average-linkage clusters
– Generalized to work on any objects with a distance function
– Clustering stops when the two closest clusters are more than the threshold distance apart
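A minimal Python sketch of this clustering step: agglomerative clustering with average linkage, written against an arbitrary `distance(a, b)` function so it works for test points and friends alike, and stopping when the closest pair of clusters is more than `threshold` apart. The names `cluster` and `average_linkage` are hypothetical; this is an illustration, not the team's code.

```python
from itertools import combinations


def average_linkage(a, b, distance):
    """Average distance over every pair (x, y) with x in cluster a, y in cluster b."""
    return sum(distance(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster(items, distance, threshold):
    """Agglomerative clustering with average linkage.

    Starts with one cluster per item, repeatedly merges the two closest
    clusters, and stops once the closest pair is more than `threshold` apart.
    Returns a list of clusters, each a list of items.
    """
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest average linkage.
        (i, j), closest = min(
            (((i, j), average_linkage(clusters[i], clusters[j], distance))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda candidate: candidate[1],
        )
        if closest > threshold:
            break  # every remaining pair is too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i, so index i is unaffected
    return clusters
```

For a point-based test driver, `distance` can be `math.dist` on (x, y) tuples. The sketch recomputes every cluster-to-cluster average on each merge, which keeps it short but slow; a version working from a precomputed distance matrix could update only the rows touched by each merge.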

Point-Based Test Driver

Validation – Sample Data Set

How we measured correctness
– Thresholds from 3 to 10 gave us the correct number of cliques; however, user 5 was placed incorrectly
– Error rate of 10%, because 1 of 10 users was misplaced
– We chose 6, near the midpoint of that range, as our threshold
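The slide gives the error rate as misplaced users over all users (1 of 10, or 10%) but not how "misplaced" was decided. A minimal sketch, assuming each produced clique is matched to the expected group that most of its members belong to and everyone else in it counts as misplaced; `error_rate` and the `true_group` mapping are hypothetical names.

```python
from collections import Counter


def error_rate(clusters, true_group):
    """Fraction of users placed in a clique that is not their expected group.

    Each produced clique is matched to the expected group that most of its
    members belong to; members from any other group count as misplaced.
    """
    total = sum(len(members) for members in clusters)
    if total == 0:
        return 0.0
    misplaced = 0
    for members in clusters:
        if not members:
            continue
        counts = Counter(true_group[m] for m in members)
        majority = counts.most_common(1)[0][1]
        misplaced += len(members) - majority
    return misplaced / total
```

This measure does not penalize producing the wrong number of cliques (splitting every group into singletons scores 0%), so the clique count still has to be checked separately, as the slide does.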

Validation – Real Data Set
We chose to use Thomas Dvornik's account:
– Moderate amount of data
– His friends could be separated into well-defined cliques
Threshold on real data
– Accuracy was highest at a threshold of 3 and second highest at 6

Validation – Improvements
– Results after the improvements, again based on our accuracy measurement

Improvements/Future Work
Caching
– The number of queries and the amount of computation can get very large
– Store the distance matrix for 24 hours
Accuracy
– Use all aspects of Facebook; some activity is not considered at all
– Use weights for different data sources, since not all activity is equally important
– Analyze the produced cliques: survey users to see whether the cliques are accurate
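A minimal in-memory sketch of the 24-hour cache from the future-work list, assuming one distance matrix per user is produced by some expensive `compute(user_id)` call (for example, the Facebook queries plus the distance step). `MatrixCache` and `get_or_compute` are hypothetical names; a deployed version would more likely use a shared store than a Python dict.

```python
import time


class MatrixCache:
    """Keep one computed distance matrix per user for a fixed time-to-live."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # user_id -> (expires_at, matrix)

    def get_or_compute(self, user_id, compute):
        """Return the cached matrix if still fresh, else recompute and store it."""
        now = self.clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        matrix = compute(user_id)
        self._entries[user_id] = (now + self.ttl, matrix)
        return matrix
```

Injecting the clock makes the 24-hour expiry testable without waiting a day.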

Demo

Questions?