1 Friends and Neighbors on the Web Presentation for Web Information Retrieval Bruno Lepri.



2 Outline
- Objectives
- Data Used
- Small World Graphs
- Predicting Friendship
- Results
- Future Works and Applications
- Conclusions

3 Objectives
- To devise techniques for mining the Internet in order to predict relationships between individuals
- To show that some pieces of information (e.g. terms on homepages) are better indicators of social connections than others

4 Information Side Effects
- By-products of data intended for one use, which can be mined to understand tangential and larger-scale phenomena
- Our case: extracting large social networks from individuals' homepages

5 Data Used
- Text on user homepage (co-occurrence of text → common interest)
- Out-links: from user homepage to other pages
- In-links: from other pages to user homepage
- Mailing lists

6 Small World Phenomenon
- Real-world social networks are described by the small world phenomenon
- Stanley Milgram's experiment ("The Small World Problem", 1967): six degrees of separation

7 Small World Phenomenon (cont'd)
- Adamic: the World Wide Web is a small world graph ("The Small World Web", 1999)
- Our hypothesis (confirmed by the Stanford and MIT personal homepage networks): networks of personal homepages are small world graphs

8 Stanford Graph

9 MIT Graph

10 Small World Graph Properties
Watts & Strogatz ("Collective dynamics of 'small-world' networks", 1998):
- The clustering coefficient C is much larger than that of a random graph with the same number of vertices and the same average number of edges per vertex
- The characteristic path length L is almost as small as L for the corresponding random graph
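As a rough illustration of the characteristic path length L, the sketch below averages breadth-first-search distances over all connected ordered pairs of vertices. The function name `characteristic_path_length`, the adjacency-dict representation, and the six-vertex toy graphs are assumptions made for this example; they are not taken from the presentation.

```python
from collections import deque

def characteristic_path_length(adj):
    """Average shortest-path length over all ordered pairs of distinct,
    mutually reachable vertices. `adj` maps a vertex to its set of neighbors."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Hypothetical toy graphs: a ring of 6 vertices, then the same ring plus one shortcut.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {v: set(nbrs) for v, nbrs in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

L_ring = characteristic_path_length(ring)
L_shortcut = characteristic_path_length(shortcut)
```

Adding a single long-range edge to the ring lowers L, which is the effect the small world model relies on.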

11 Clustering Coefficient (Watts & Strogatz, 1998)
- If a vertex v has k_v neighbors, then at most k_v*(k_v - 1) directed edges can exist between them
- If C_v denotes the fraction of these allowable edges that actually exist, then C is the average of C_v over all v
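The definition above can be sketched in a few lines. This is a minimal example under stated assumptions: a vertex's neighbors are taken to be every vertex it links to or is linked from, and vertices with fewer than two neighbors contribute C_v = 0. The slide fixes neither choice, and the function name and toy graph are hypothetical.

```python
def clustering_coefficient(out_links):
    """out_links maps each vertex to the set of vertices it links to (directed).
    Returns the average over all vertices of C_v = edges among neighbors / (k_v * (k_v - 1))."""
    # Assumption: neighbors of v are vertices connected to v in either direction.
    neighbors = {v: set() for v in out_links}
    for v, targets in out_links.items():
        for w in targets:
            if w != v:
                neighbors[v].add(w)
                neighbors.setdefault(w, set()).add(v)

    coefficients = []
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)  # assumption: no allowable edges means C_v = 0
            continue
        # Count directed edges that start and end inside the neighborhood of v.
        existing = sum(
            1
            for a in nbrs
            for b in out_links.get(a, ())
            if b in nbrs and b != a
        )
        coefficients.append(existing / (k * (k - 1)))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0

# Hypothetical toy graph: a links to b and c, and b links to c (all one-way).
toy = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
C = clustering_coefficient(toy)
```

In the toy graph each vertex has two neighbors joined by one of the two allowable directed edges, so every C_v is 0.5.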

12 Clustering Coefficient in Friendship Graphs
- C_v: reflects the extent to which friends of v are also friends of each other
- C: measures the cliquishness of a typical friendship circle

13 Predicting Friendship
- To predict whether one person is a friend of another, we rank all users by their similarity to that person
- Hypothesis: friends are more similar to each other than to other users

14 Similarity Measurement
- Similarity is measured by analyzing text, links and mailing lists
- To evaluate the likelihood that A is linked to B, we sum the number of items the 2 users have in common
- Weighting scheme: items unique to a few users are weighted more than common items
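One way to realize a weighted sum of shared items is sketched below. The slide does not give the weighting formula, so the inverse-log-frequency weight used here is an assumption, chosen only because it gives rare items more weight than common ones. The names `similarity` and `rank_candidates` and the sample data are hypothetical.

```python
import math
from collections import Counter

def similarity(items_by_user, a, b):
    """Sum, over items shared by users a and b, of 1 / log(number of users having the item).
    items_by_user maps a user to the set of items (terms, links, lists) on their page."""
    freq = Counter(item for items in items_by_user.values() for item in items)
    shared = items_by_user[a] & items_by_user[b]
    # A shared item is held by at least 2 users, so log(freq) is strictly positive.
    return sum(1.0 / math.log(freq[item]) for item in shared)

def rank_candidates(items_by_user, person):
    """Return (user, score) pairs with non-zero similarity to `person`, best first."""
    scores = [
        (other, similarity(items_by_user, person, other))
        for other in items_by_user
        if other != person
    ]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: s[1], reverse=True)

# Hypothetical data: "chess" is rare, "cs101" is common.
pages = {
    "alice": {"chess", "cs101"},
    "bob": {"chess", "cs101"},
    "carol": {"cs101"},
    "dave": {"cs101", "sailing"},
}
ranking = rank_candidates(pages, "alice")
```

Here bob ranks first for alice because the rare shared item "chess" contributes more than the widely shared "cs101".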

15 Friendship Prediction Algorithm's Evaluation
- To evaluate the algorithm's performance:
  - we compute how many friends have a non-zero similarity score
  - we see what similarity rank the friends were assigned
- Problem: friends can appear to have no items in common (little information about one of the 2 users, or users' homepages used to express different interests)
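The two evaluation steps above can be sketched as follows, assuming a ranking that is already sorted by descending similarity and contains only non-zero scores. The function name `evaluate_ranking` and the sample ranking are hypothetical, not part of the presentation.

```python
def evaluate_ranking(ranking, friends):
    """ranking: list of (user, score) pairs sorted best first, non-zero scores only.
    friends: set of the person's actual friends.
    Returns (fraction of friends with a non-zero score, average rank of those friends)."""
    rank_of = {user: position + 1 for position, (user, _) in enumerate(ranking)}
    found = [rank_of[f] for f in friends if f in rank_of]
    coverage = len(found) / len(friends) if friends else 0.0
    average_rank = sum(found) / len(found) if found else None
    return coverage, average_rank

# Hypothetical example: erin is a friend but shares no items, so she is never ranked.
sample_ranking = [("bob", 2.1), ("carol", 0.7), ("dave", 0.3)]
coverage, average_rank = evaluate_ranking(sample_ranking, {"bob", "dave", "erin"})
```

The unranked friend lowers coverage without affecting the average rank, which mirrors the problem noted on the slide.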

16 Coverage and Predictive Ability of Data Sources
- The average rank was computed for matches above a threshold chosen so that all 4 data sources ranked an equal number of users

17 Do friends have more in common than friends of friends?

18 Power-Law
- A few pages contain millions of links, but many pages have only one or two
- This diversity can be expressed mathematically: P(k) = C k^{-t}
- So the probability of attaining a certain size k is proportional to 1/k raised to a power t (t greater than or equal to 1, C a numerical constant)
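Because log P(k) = log C - t log k, a power law appears as a straight line with slope -t on a log-log plot. The sketch below recovers t from that slope by ordinary least squares. This is a simple illustration rather than a robust estimator, and the function name and the synthetic counts are assumptions made for the example.

```python
import math

def fit_power_law_exponent(counts):
    """counts maps a size k (k >= 1) to the number of pages of that size (> 0).
    Returns t, the negated least-squares slope of log(count) against log(k)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope

# Synthetic counts generated from C = 1000 and t = 2, so the fit should return about 2.
synthetic = {k: 1000 * k ** -2.0 for k in (1, 2, 4, 8, 16)}
t_estimate = fit_power_law_exponent(synthetic)
```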

19 Individual Item's Predictive Ability
- Metric used: the number of linked user pairs associated with an item, divided by the total number of possible pairs
- Some interesting findings:
  - Shared items unique to a community are at the top of the MIT and Stanford lists; popular terms are at the bottom
  - Different shared items are at the top of the Stanford and MIT lists (in the MIT list, 5 of the top 10 terms are fraternity names)
  - The Stanford and MIT in-link lists are dominated by individual homepages
  - The MIT and Stanford mailing lists that predict poorly are very general discussion lists, announcement lists and social activities lists
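The metric above can be sketched as a ratio of linked pairs to possible pairs among the users who share an item. The example assumes links are undirected and stored as a set of `frozenset` pairs. That representation, the function name `item_predictive_ratio`, and the sample data are hypothetical.

```python
from itertools import combinations

def item_predictive_ratio(users_with_item, links):
    """users_with_item: set of users associated with one item.
    links: set of frozenset({a, b}) pairs for users who are linked (assumed undirected).
    Returns linked pairs among those users divided by all possible pairs."""
    users = sorted(users_with_item)
    possible = len(users) * (len(users) - 1) // 2
    if possible == 0:
        return 0.0
    linked = sum(1 for a, b in combinations(users, 2) if frozenset((a, b)) in links)
    return linked / possible

# Hypothetical example: three users mention the same item, one pair is linked.
friend_links = {frozenset(("alice", "bob"))}
ratio = item_predictive_ratio({"alice", "bob", "carol"}, friend_links)
```

An item shared by a small, tightly linked group scores close to 1, while a term that appears on many unrelated homepages scores close to 0.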

20 Individual Item’s Predictive Ability (cont’d)

21 Future Work
- New data sources: demographic information such as address, year in school, major, …
- To address the problem that individuals interact with many people regularly but do not link to all of them through web pages (possible solution: obtain social links directly from users)

22 Applications
- To mine the correlations between groups of people (see the work of Pentland and Eagle)
- To facilitate networking inside a community (see LinkedIn)
- Marketing research: to identify groups interested in a product, and to rely on the social network to propagate information about products

23 Conclusions
- Personal homepages provide a glimpse into the social structure of university communities
- Important: personal homepages reveal not only who knows whom, but also the context (e.g. shared hobbies, shared dorm)

24 Thank You For Your Attention! Questions?