Identity and Search in Social Networks
  Alireza Abbasi (abbasi@snu.ac.kr)
  406.534 Industrial Information Technology, Technical Paper Presentation
  Information Technology Policy Program, Seoul National University
Bibliographic Details
  Title: Identity and Search in Social Networks
  Authors: Duncan J. Watts (Department of Sociology, Columbia University), Peter Sheridan Dodds (Columbia Earth Institute, Columbia University), M. E. J. Newman (Santa Fe Institute)
  Journal: Science
  Vol.: 296, No.: 5571, Pp.: 1302-1305
  Month: May, Year: 2002
Contents
  Small-World Experiment
  Searchability of Networks
  Proposed Model
  Six Contentions about Social Networks
  Research Objective
  Results
  Conclusion
Small-World Experiment
  Travers & Milgram’s famous small-world experiment (1960s):
  Choose a random person in Nebraska, Bob.
  Ask Bob to deliver a letter to a random person in Massachusetts, Lashawn (the target person).
  Tell Bob the target’s name, address, and occupation.
  Instruct Bob to send the letter only to people he knows on a first-name basis.
Small-World Experiment: Six Degrees of Separation
  Example chain (figure): Bob, a farmer in Nebraska -> David, mayor of Bob’s town -> Bernard, David’s cousin, who went to college with Maya -> Maya, who grew up in Boston with Lashawn -> Lashawn
Result of Small-World Experiment
  Short paths exist between individuals in a large social network.
  Ordinary people can find these short paths.
Searchability
  The property of being able to find a target quickly.
  It has been shown to exist in specific classes of networks that either possess a certain fraction of hubs or are built upon an underlying geometric lattice, which acts as a proxy for “social space”.
Proposed Model
  A model of a social network that is based on plausible social structures and offers an explanation for the phenomenon of searchability.
  It follows from six contentions about social networks.
1. Identity of Individuals
  Identity: the set of characteristics that individuals attribute to themselves and to others by virtue of belonging to social groups.
2. Hierarchical Network
  Individuals partition the world hierarchically into a series of layers.
  The top layer accounts for the entire world, and each deeper layer represents a cognitive division into a greater number of increasingly specific groups.
Hierarchy: Measuring Social Distance
  The similarity x_ij between individuals i and j is defined as the height of their lowest common ancestor level in the hierarchy.
  x_ij = 1 if i and j belong to the same group.
  Parameters: g = group size, l = depth of the hierarchy, b = branching ratio.
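As a minimal sketch of this measure, assume the bottom-level groups are numbered 0, 1, 2, ... from left to right in a tree with branching ratio b. That labelling and the function name hierarchy_distance are choices made here for illustration and are not taken from the paper. Under that assumption, x_ij can be found by climbing the tree until both groups share an ancestor:

```python
def hierarchy_distance(group_i: int, group_j: int, b: int) -> int:
    """Height of the lowest common ancestor of two bottom-level groups.

    Groups are numbered 0, 1, 2, ... along the bottom of a tree with
    branching ratio b (an assumed labelling). Returns 1 when both
    individuals belong to the same group.
    """
    if b < 2:
        raise ValueError("branching ratio b must be at least 2")
    x = 1
    # Integer division by b maps a node to its parent one level up.
    while group_i != group_j:
        group_i //= b
        group_j //= b
        x += 1
    return x
```

With b = 2, groups 0 and 1 are siblings (x = 2), while groups 3 and 4 only meet three levels above the bottom (x = 4).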
3. Group Membership
  Group membership is the basis for social interaction, and hence for acquaintanceship.
  A link distance x is chosen with probability p(x) = c exp(-αx), which sets the probability of acquaintance between nodes i and j.
  α is a tunable parameter that measures homophily, and c is a normalizing constant.
  If e^(-α) ≪ 1: all links will be as short as possible.
  If e^(-α) = b: an individual is equally likely to interact with any other (yielding a uniform random graph).
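The role of α can be illustrated with a short sketch. It assumes the exponential form p(x) = c exp(-αx) over link distances x = 1, ..., l, with c chosen so that the probabilities sum to 1. The function name link_distance_probabilities is hypothetical.

```python
import math


def link_distance_probabilities(alpha: float, l: int) -> list[float]:
    """Return [p(1), ..., p(l)] with p(x) = c * exp(-alpha * x).

    The normalizing constant c is chosen so the probabilities sum to 1.
    """
    if l < 1:
        raise ValueError("depth l must be at least 1")
    weights = [math.exp(-alpha * x) for x in range(1, l + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]
```

For large α nearly all probability mass sits at x = 1 (links stay inside the bottom-level group). For α = -ln b the weights grow like b^x, which roughly matches how the number of individuals at distance x grows, so every individual becomes about equally likely to be chosen.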
4. Multi-dimensional Clustering
  Individuals hierarchically cluster the social world in more than one way (dimension), for example by geography and by occupation.
  A node's identity is then an H-dimensional coordinate vector v_i, where v_i^h is the position of node i in the h-th dimension.
5. Social Distance
  Individuals construct a measure of “social distance” y_ij: the minimum ultrametric distance over all dimensions between two nodes i and j, that is, y_ij = min over h of x_ij^h.
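A small sketch can make the minimum over dimensions concrete. It assumes each identity is a tuple of bottom-level group indices, one per dimension, with groups numbered left to right in a tree of branching ratio b. These representations and both function names are illustrative assumptions, not the paper's implementation.

```python
def hierarchy_distance(group_i: int, group_j: int, b: int) -> int:
    """Height of the lowest common ancestor of two groups (1 = same group)."""
    x = 1
    while group_i != group_j:
        group_i //= b  # move both groups up to their parents
        group_j //= b
        x += 1
    return x


def social_distance(v_i: tuple, v_j: tuple, b: int) -> int:
    """Minimum hierarchy distance over all H dimensions of two identities."""
    if len(v_i) != len(v_j):
        raise ValueError("identities must have the same number of dimensions")
    return min(hierarchy_distance(gi, gj, b) for gi, gj in zip(v_i, v_j))


# Toy identities with H = 2 (say geography, occupation) and b = 2.
i = (0, 7)  # shares a geographic group with j
j = (0, 0)
k = (7, 0)  # shares an occupational group with j
```

Here i and j are close (y = 1) and j and k are close (y = 1), yet i and k are far apart (y = 4). Taking the minimum over dimensions can therefore violate the triangle inequality, which is why social distance is not a true distance.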
6. Individual Local Information
  Individuals have two kinds of partial information:
  Social distance, which can be measured globally but is not a true distance.
  Network paths, which generate true distances but are known only locally.
Research Objective
  Determine the conditions under which the average length <L> of a message chain connecting a randomly selected sender s to a random target t is small.
  p: message failure probability
  q: probability of a message reaching its target
  r: fixed threshold (the network is searchable when q ≥ r)
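The link between p, q, r and <L> can be sketched numerically. The sketch assumes that p applies independently at every hop, so a chain of length L completes with probability (1 - p)^L and q is the average of that quantity over chains. Treating every chain as having the mean length then gives the approximate requirement <L> ≤ ln r / ln(1 - p). The function names are hypothetical.

```python
import math


def completion_probability(chain_lengths: list[int], p: float) -> float:
    """Estimate q as the average of (1 - p)**L over the given chain lengths."""
    if not chain_lengths:
        raise ValueError("need at least one chain length")
    return sum((1.0 - p) ** L for L in chain_lengths) / len(chain_lengths)


def max_mean_chain_length(p: float, r: float) -> float:
    """Approximate upper bound on <L> implied by q >= r.

    Derived from (1 - p)**<L> >= r, i.e. every chain is treated as
    having the mean length (an approximation).
    """
    if not (0.0 < p < 1.0 and 0.0 < r < 1.0):
        raise ValueError("p and r must lie strictly between 0 and 1")
    return math.log(r) / math.log(1.0 - p)
```

For r = 0.05 and p = 0.25, max_mean_chain_length(0.25, 0.05) is roughly 10.4, so chains must stay short regardless of the population size N.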
Result
  Searchable networks occupy a broad region of the parameter space (α, H), which corresponds to the most sociologically plausible choices of the model parameters.
  The model therefore suggests that searchability is a generic property of real-world social networks.
Result Example (H-α)
  The region of searchable networks in the (H, α) plane shrinks with N.
  Parameters: r = 0.05, p = 0.25, b = 2, g = 100, z = g - 1 = 99.
  Curves: N = 102400 (solid), N = 204800 (dot-dash), and N = 409600 (dash).
Result Example q(H)
  q: probability of a message reaching its target; H: number of dimensions.
  Curves: α = 0 (squares) and α = 2 (circles) for N = 102400, with threshold r = 0.05.
  Empty symbols indicate that the network is searchable (q ≥ r).
  The best performance over the largest interval of α is achieved for H = 2 or 3.
Comparing the Distribution of Chain Lengths
  n(L): the number of completed chains of length L.
  Bar graph: the original small-world experiment.
  Model parameters: p = 0.25, H = 2, z = 300, g = 100, α = 1, and b = 10.
  The model's average chain length, <L> ≈ 6.7, is close to the <L> ≈ 6.5 of the original small-world experiment.
Conclusion
  The model can apply to any data structure in which data elements exhibit quantifiable characteristics analogous to our notion of identity, and in which similarity between two elements can be judged along more than one dimension.
  Efficient decentralized searches can be conducted with simple greedy algorithms, provided only that the characteristics of the target element and of the current element's immediate neighbors are known.
  Example applications: locating data files in peer-to-peer networks, pages on the World Wide Web, and information in distributed databases.
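The kind of greedy decentralized search mentioned here can be sketched as follows. The sketch assumes the network is an adjacency dictionary and each identity is a tuple of group indices in trees with branching ratio b. The hop limit and the rule that a holder delivers directly when the target is a neighbor are additions made for this illustration, and all names are hypothetical.

```python
def hierarchy_distance(group_i: int, group_j: int, b: int) -> int:
    """Height of the lowest common ancestor of two groups (1 = same group)."""
    x = 1
    while group_i != group_j:
        group_i //= b
        group_j //= b
        x += 1
    return x


def social_distance(v_i: tuple, v_j: tuple, b: int) -> int:
    """Minimum hierarchy distance over all dimensions."""
    return min(hierarchy_distance(gi, gj, b) for gi, gj in zip(v_i, v_j))


def greedy_route(neighbors, identity, source, target, b, max_hops=50):
    """Forward a message greedily; return the path or None on failure.

    Each holder sees only its own neighbors and the target's identity,
    and passes the message to the neighbor socially closest to the target.
    """
    path = [source]
    current = source
    while current != target and len(path) - 1 < max_hops:
        candidates = neighbors.get(current, [])
        if not candidates:
            return None  # dead end: nobody to forward to
        if target in candidates:
            current = target  # deliver directly when the target is known
        else:
            current = min(
                candidates,
                key=lambda n: social_distance(identity[n], identity[target], b),
            )
        path.append(current)
    return path if current == target else None


# Toy network: one dimension (H = 1), branching ratio b = 2.
toy_identity = {"A": (0,), "B": (1,), "C": (2,), "D": (3,)}
toy_neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_path = greedy_route(toy_neighbors, toy_identity, "A", "D", b=2)
```

In the toy network the message travels A, B, C, D: at B the holder prefers C over A because C's group is closer to D's group in the hierarchy. Greedy forwarding can also cycle or hit a dead end, which is why the sketch caps the number of hops and returns None on failure.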
Thank you & Best wishes