Published by Laura Spencer. Modified over 8 years ago.
1 Friends and Neighbors on the Web
Presentation for Web Information Retrieval
Bruno Lepri
2 Outline
– Objectives
– Data Used
– Small World Graphs
– Predicting Friendship
– Results
– Future Work and Applications
– Conclusions
3 Objectives
– To devise techniques for mining the Internet in order to predict relationships between individuals
– To show that some pieces of information (e.g., terms on homepages) are better indicators of social connections than others
4 Information Side Effects
– By-products of data intended for one use, which can be mined to understand tangential and larger-scale phenomena
– Our case: extracting large social networks from individuals' homepages
5 Data Used
– Text on user homepages (co-occurrence of text → common interest)
– Out-links: from a user's homepage to other pages
– In-links: from other pages to a user's homepage
– Mailing lists
6 Small World Phenomenon
– Real-world social networks exhibit the small world phenomenon
– Stanley Milgram's experiment ("The Small World Problem", 1967): six degrees of separation
7 Small World Phenomenon (cont'd)
– Adamic: the World Wide Web is a small world graph ("The Small World Web", 1999)
– Our hypothesis (confirmed by the Stanford and MIT personal homepage networks): networks of personal homepages are small world graphs
8 Stanford Graph
9 MIT Graph
10 Small World Graph Properties
Watts & Strogatz ("Collective dynamics of 'small-world' networks", 1998):
– The clustering coefficient C is much larger than that of a random graph with the same number of vertices and the same average number of edges per vertex
– The characteristic path length L is almost as small as L for the corresponding random graph
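To make the comparison with "the corresponding random graph" concrete, here is a minimal sketch, not the authors' code. It assumes a graph stored as a dict mapping every vertex to the set of vertices it links to, computes the characteristic path length L with breadth-first search over reachable ordered pairs, and builds a random baseline with the same vertices and edge count (a G(n, m) model). The function names are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex (adj: vertex -> set of successors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """Average shortest-path length over ordered pairs (u, w), u != w, with w reachable from u."""
    total, pairs = 0, 0
    for u in adj:
        for w, d in bfs_distances(adj, u).items():
            if w != u:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


def random_graph_like(adj, seed=0):
    """Directed random graph with the same vertex set and the same number of edges."""
    rng = random.Random(seed)
    nodes = list(adj)  # assumes every vertex appears as a key of adj
    m = sum(len(succ) for succ in adj.values())
    candidates = [(u, w) for u in nodes for w in nodes if u != w]
    rand = {u: set() for u in nodes}
    for u, w in rng.sample(candidates, m):  # m distinct edges, no self-loops
        rand[u].add(w)
    return rand
```

For a hypothetical homepage graph `g`, one would compare `characteristic_path_length(g)` with `characteristic_path_length(random_graph_like(g))`; a small world graph should give values of similar size.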
11 Clustering Coefficient (Watts & Strogatz, 1998)
– If a vertex v has $k_v$ neighbors, then at most $k_v(k_v - 1)$ directed edges can exist between them
– If $C_v$ denotes the fraction of these allowable edges that actually exist, then C is the average of $C_v$ over all v
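The definition above can be sketched directly in Python. This is an illustration under stated assumptions, not the presenter's implementation: the graph is a dict from each vertex to the set of vertices it links to, the neighbors of v are taken to be its in- and out-neighbors combined (an assumption, since the slide does not say), and vertices with fewer than two neighbors are skipped because $C_v$ is undefined for them (another assumption; some treatments count them as 0).

```python
from itertools import permutations


def clustering_coefficient(adj):
    """Average local clustering coefficient C of a directed graph (adj: vertex -> set of successors)."""
    # Neighborhood of each vertex: in-neighbors plus out-neighbors (assumption).
    neighbors = {v: set() for v in adj}
    for u, succ in adj.items():
        for w in succ:
            if w != u:
                neighbors[u].add(w)
                neighbors.setdefault(w, set()).add(u)
    local = []
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # C_v undefined for fewer than two neighbors; skipped here
        # Count directed edges that actually exist among the k neighbors.
        existing = sum(1 for a, b in permutations(nbrs, 2) if b in adj.get(a, ()))
        local.append(existing / (k * (k - 1)))  # fraction of the k(k-1) allowable edges
    return sum(local) / len(local) if local else 0.0
```

A fully reciprocated triangle gives C = 1.0, while a one-way cycle 0 → 1 → 2 → 0 gives C = 0.5, since only one of the two possible directed edges exists between each vertex's neighbors.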
12 Clustering Coefficient in Friendship Graphs
– $C_v$ reflects the extent to which the friends of v are also friends of each other
– C measures the cliquishness of a typical friendship circle
13 Predicting Friendship
– To predict whether one person is a friend of another, we rank all users by their similarity to that person
– Hypothesis: friends are more similar to each other than non-friends are
14 Similarity Measurement
– Similarity is measured by analyzing text, links, and mailing lists
– To evaluate the likelihood that A is linked to B, we sum the number of items the two users have in common
– Weighting scheme: items shared by only a few users are weighted more than common items
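A minimal sketch of the weighted sum and of the ranking step from the previous slide follows. The slide only says that rarer items weigh more; this sketch assumes a weight of 1 / log(number of users that have the item), which is one common choice. The data layout (a dict from user to a set of items such as terms, links, or mailing lists) and all function names are hypothetical.

```python
import math
from collections import Counter


def item_frequencies(user_items):
    """Number of users that have each item."""
    return Counter(item for items in user_items.values() for item in set(items))


def similarity(a, b, user_items, freq):
    """Weighted count of items shared by users a and b.

    Assumed weight: 1 / log(number of users with the item), so rarer items
    count more. A shared item is held by at least two users, so log > 0.
    """
    shared = set(user_items[a]) & set(user_items[b])
    return sum(1.0 / math.log(freq[item]) for item in shared)


def rank_candidates(user, user_items):
    """All other users, ordered from most to least similar to `user` (ties broken by name)."""
    freq = item_frequencies(user_items)
    scores = {other: similarity(user, other, user_items, freq)
              for other in user_items if other != user}
    return sorted(scores, key=lambda other: (-scores[other], other))
```

With this weighting, sharing an item held by exactly two users contributes about 1.44, while sharing an item held by a thousand users contributes only about 0.14.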
15 Friendship Prediction Algorithm's Evaluation
To evaluate the algorithm's performance:
– we compute how many friends have a non-zero similarity score
– we check which similarity rank the friends were assigned
Problem: friends may appear to have no items in common (little information about one of the two users, or the users' homepages are used to express different interests)
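The two evaluation steps can be sketched as follows. This is a hypothetical helper, not the presenter's code: it assumes the similarity scores for one user have already been computed (by any measure) as a dict from every other user to a score, and that the user's actual friends are known. Only friends with a non-zero score get a rank, since friends with nothing in common cannot be meaningfully ordered.

```python
def evaluate_predictions(scores, friends):
    """Evaluate one user's similarity ranking against the user's known friends.

    scores:  dict mapping every other user to a similarity score
    friends: set of users actually linked to this user
    Returns (fraction of friends with a non-zero score, sorted 1-based ranks of those friends).
    """
    ranked = sorted(scores, key=lambda other: (-scores[other], other))
    rank_of = {other: i + 1 for i, other in enumerate(ranked)}
    found = [f for f in friends if scores.get(f, 0) > 0]  # friends the measure can "see"
    coverage = len(found) / len(friends) if friends else 0.0
    ranks = sorted(rank_of[f] for f in found)
    return coverage, ranks
```

For example, with scores {"bob": 2.4, "eve": 1.5, "cat": 0.9, "dan": 0.0} and friends {"bob", "dan"}, the coverage is 0.5 and bob is ranked first; dan, who shares nothing, illustrates the problem noted above.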
16 Coverage and Predictive Ability of Data Sources
The average rank was computed for matches above a threshold chosen so that all 4 data sources ranked an equal number of users
17 Do friends have more in common than friends of friends?
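One way to pose this question on data is to average the overlap between user pairs grouped by their distance in the friendship graph. The sketch below is an illustration, not the analysis behind the slide: it assumes an undirected friendship graph stored as a dict of neighbor sets, uses the plain number of shared items as the similarity (a simplification of the weighted score), and stops the breadth-first search at a hypothetical `max_hops` parameter.

```python
from collections import deque


def mean_overlap_by_distance(friends, user_items, max_hops=2):
    """Average number of shared items for user pairs at each hop distance (1..max_hops)."""
    totals, counts = {}, {}
    for source in friends:
        # Breadth-first search limited to max_hops.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue
            for w in friends[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for other, d in dist.items():
            if d == 0:
                continue
            shared = len(user_items.get(source, set()) & user_items.get(other, set()))
            totals[d] = totals.get(d, 0) + shared
            counts[d] = counts.get(d, 0) + 1
    return {d: totals[d] / counts[d] for d in sorted(counts)}
```

If friends have more in common than friends of friends, the value at distance 1 should exceed the value at distance 2.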
18 Power Law
– A few pages contain millions of links, but many pages have only one or two
– This diversity can be expressed mathematically as $P(k) = C\,k^{-t}$
– So the probability of attaining a certain size k is proportional to $1/k$ raised to a power t (t greater than or equal to 1, C a numerical constant)
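A rough way to estimate C and t from observed data is a straight-line fit on a log-log scale, since $\log P(k) = \log C - t \log k$. The sketch below assumes a hypothetical dict `counts` mapping each size k (e.g. number of links on a page) to the number of pages with that size. Least squares on log-log data is used here only for illustration; it is known to be a biased estimator, and maximum-likelihood fitting is usually preferred for real data.

```python
import math


def fit_power_law(counts):
    """Estimate C and t in P(k) = C * k**(-t) by least squares on log P(k) versus log k.

    counts: dict mapping size k (> 0) to how many items have that size (> 0);
    at least two distinct sizes are required.
    """
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(n / total) for n in counts.values()]  # empirical P(k)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    t = -slope                       # P(k) falls off as k**(-t)
    c = math.exp(my - slope * mx)    # intercept of the log-log line
    return c, t
```

On synthetic counts proportional to $k^{-2}$ the fit recovers t = 2 up to rounding error.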
19 Individual Item's Predictive Ability
Metric used: the number of linked user pairs associated with an item divided by the total number of possible pairs of users sharing that item
Some interesting findings:
– Shared items unique to a community are at the top of the MIT and Stanford lists; popular terms are at the bottom
– Different shared items top the Stanford and MIT lists (in the MIT list, 5 of the top 10 terms are fraternity names)
– The Stanford and MIT in-link lists are dominated by individual homepages
– The MIT and Stanford mailing lists with poor predictive ability are very general discussion lists, announcement lists, and social activity lists
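The per-item metric can be sketched as follows, under the assumption that friendship links are undirected and stored as a set of `frozenset` pairs; `user_items` and the function name are hypothetical. For each item, it looks at every pair of users who share the item and reports the fraction of those pairs that are actually linked.

```python
from itertools import combinations


def item_predictive_ability(item, user_items, links):
    """Fraction of the pairs of users sharing `item` that are actually linked.

    user_items: dict user -> set of items
    links: set of frozenset({u, w}) linked pairs (treated as undirected here)
    """
    holders = sorted(u for u, items in user_items.items() if item in items)
    pairs = list(combinations(holders, 2))  # all possible pairs among the holders
    if not pairs:
        return 0.0
    linked = sum(1 for u, w in pairs if frozenset((u, w)) in links)
    return linked / len(pairs)
```

Sorting all items by this value, from highest to lowest, produces lists like the ones summarized on this slide, with community-specific items near the top and very general items near the bottom.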
20 Individual Item’s Predictive Ability (cont’d)
21 Future Work
– New data sources: demographic information such as address, year in school, major, …
– Addressing the problem that individuals interact with many people regularly but do not link to all of them through web pages (possible solution: obtain social links directly from users)
22 Applications
– Mining the correlations between groups of people (see the work of Pentland and Eagle)
– Facilitating networking inside a community (see LinkedIn)
– Marketing research: identifying groups interested in a product, and relying on the social network to propagate information about products
23 Conclusions
– Personal homepages provide a glimpse into the social structure of university communities
– Importantly, personal homepages reveal not only who knows whom but also the context (e.g., shared hobbies, a shared dorm)
24 Thank You For Your Attention! Questions?