1
Mining social networks for knowledge management Prabhakar Raghavan
2
Overview Sampling of social network studies The knowledge management challenge › Enterprise complications Extended social networks › Network and tensor-based models Power law behaviors › Why, and what they mean for mining A research agenda
3
Milgram’s experiments Began with volunteers from Omaha, NE. Asked to get a letter to a physician near Boston. Could only send to first-name acquaintance, to be forwarded etc. Median path length of successful deliveries was 6. Led to famous “6 degrees of separation” folklore.
4
eMail cliques: Schwartz/Wood Studied eMail (sub)graph. Proposed metrics for identifying groups of people who share interests; cluster analysis. Qualitatively “good” results. Raised issues of ethical use of data and privacy.
5
Various other projects PHOAKS › Extracting heavily cited resources in newsgroups, etc. Call graphs › Discerning home, business and fax lines › Calling circles. Recommendation systems › Input: users’ product endorsements. › Output: product recommendations to each user.
6
Trawling bipartite cliques Take the (directed) Web link graph. Enumerate all (small) bipartite cliques. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins (1998).
7
Insights from hubs Link-based hypothesis: a dense bipartite subgraph indicates a Web community. [Figure labels: hubs (fans) point to authorities (centers).]
8
Communities from cores What is a “dense bipartite subgraph”? Define (i,j)-core: complete bipartite subgraph with i nodes all of which point to each of j others. Enumerate (i,j)-cores for various small i,j. (2,3) core
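The (i,j)-core definition on this slide can be illustrated with a brute-force sketch. This is not the trawling algorithm from the cited work, which relies on pruning to scale to the Web graph; it is a minimal enumeration for toy graphs. The function name `find_cores` and the adjacency-dict input format are assumptions made for illustration.

```python
from itertools import combinations


def find_cores(out_links, i, j):
    """Enumerate (i, j)-cores by brute force.

    out_links maps each node to the set of nodes it points to.
    A core is a pair (fans, centers) where each of the i fans
    points to each of the j centers.
    """
    cores = []
    for fans in combinations(sorted(out_links), i):
        # Nodes pointed to by every fan in this candidate set.
        common = set.intersection(*(set(out_links[f]) for f in fans))
        common -= set(fans)  # keep fans and centers disjoint
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores


# Toy graph: "a" and "b" both point to x, y and z, forming a (2,3)-core.
graph = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x"},
    "x": set(),
    "y": set(),
    "z": set(),
}
cores = find_cores(graph, 2, 3)
```

Brute force is exponential in i and j, which is why the slides restrict attention to small values of both.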
9
Results for cores
10
Japanese Elementary Schools [Example core listing fan pages and center pages. Most Japanese-language page titles were lost to encoding errors; legible entries include The American School in Japan, Kids' Space, KEIMEI GAKUEN Home Page (Japanese), Torisu primary school, Yakumo Elementary, Hokkaido, Japan, and 100 Schools Home Pages (English).]
11
Yenta: Forman Analyzes documents “associated” with each user. Distils significant “interests” for each. Matches/clusters groups of users with overlapping interests. Decentralized; aims for privacy protection. Elements of peer-to-peer operation.
12
ReferralWeb: Kautz/Selman Establishes links between people, e.g., › co-authorship › colleagues in an organization Allows search through this social network, e.g., › find me someone within distance 2 who will referee a paper on xyz...
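A query such as "someone within distance 2 who will referee a paper on xyz" can be sketched as a bounded breadth-first search. This is not ReferralWeb's implementation; the function names, the adjacency-dict graph, and the `expertise` lookup table are all assumptions for illustration.

```python
from collections import deque


def within_distance(graph, source, max_dist):
    """Return {person: distance} for everyone within max_dist hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        if dist[person] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbor in graph.get(person, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[person] + 1
                queue.append(neighbor)
    del dist[source]
    return dist


def find_referees(graph, expertise, source, topic, max_dist=2):
    """People within max_dist of source who list the topic, nearest first."""
    reachable = within_distance(graph, source, max_dist)
    hits = [(d, p) for p, d in reachable.items() if topic in expertise.get(p, ())]
    return [p for _, p in sorted(hits)]


# Hypothetical network: edges could come from co-authorship or org charts.
graph = {
    "me": ["ann", "bob"],
    "ann": ["me", "carol"],
    "bob": ["me"],
    "carol": ["ann", "dave"],
    "dave": ["carol"],
}
expertise = {
    "bob": {"databases"},
    "carol": {"expander graphs"},
    "dave": {"expander graphs"},
}
referees = find_referees(graph, expertise, "me", "expander graphs")
```

Here "dave" also knows the topic but sits at distance 3, so only "carol" is returned.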
13
Who can I ask to review a paper on “expander graphs”? Source: H. Kautz & B. Selman
14
Paths to Experts Source: H. Kautz & B. Selman
15
Observations Source: H. Kautz & B. Selman Official company hierarchy is only a sparse subset of the corporate social network. Shortest (and often best) paths involve a combination of official and unofficial links › Conditions for trust and evaluation may greatly differ › Global social network is the union of many different kinds of sub-networks Search greatly aided when user can choose different views of the network › types of edge › strength of edge
16
Knowledge management The big challenge: Increase productivity in knowledge workers by getting them the expertise they need at all times › the right information (documents?) › the right experts. Enterprise: group of people engaged in a collective endeavour, typically with proprietary content.
17
Enterprise knowledge mgmt. Examples - Schwartz/Wood, ReferralWeb, Yenta, PHOAKS all have some applicability. › ReferralWeb was originally devised and deployed at AT&T Labs. Enterprise knowledge management introduces some novel challenges.
18
Challenges in the enterprise Information resides in heterogeneous › formats (email, pdf, word, …) › repositories (Lotus, Exchange, Documentum, databases …) › applications (HR, ERP, Siebel, …) Need to combine structured relations (from applications) with learning.
19
Challenges in the enterprise Data security: information units have many different access classes. › e.g., compound documents have pieces, each with its own access lists. › My search should hit the doc only if it hits the pieces I can see. Knowledge security is the deeper issue. › Learning: consider class models learned from security-limited information. › Inferences (what does a recommendation tell me about confidential data?)
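The rule "my search should hit the doc only if it hits the pieces I can see" can be made concrete with a small sketch. The data layout (each document as a list of text pieces, each with a set of allowed users) and the name `visible_hits` are assumptions for illustration, not a description of any particular product.

```python
def visible_hits(documents, user, query):
    """Return ids of documents where the query matches a piece the user may see.

    documents maps doc id -> list of (text, allowed_users) pieces.
    """
    query = query.lower()
    hits = []
    for doc_id, pieces in documents.items():
        for text, allowed in pieces:
            # A match counts only if it occurs in a piece visible to this user.
            if user in allowed and query in text.lower():
                hits.append(doc_id)
                break
    return hits


# Hypothetical compound document with per-piece access lists.
documents = {
    "plan": [
        ("Quarterly roadmap overview", {"alice", "bob"}),
        ("Merger negotiation notes", {"alice"}),
    ],
    "memo": [("Cafeteria menu", {"alice", "bob"})],
}
bob_hits = visible_hits(documents, "bob", "merger")
alice_hits = visible_hits(documents, "alice", "merger")
```

This covers data security only. The deeper knowledge-security issue on the slide (what a learned model or recommendation reveals about hidden pieces) is not addressed by filtering at query time.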
20
General formulation How do we combine different sources of content and context? › terms in docs › links between docs › users’ access patterns › users’ profile information.
21
General formulation Every item of interest - each term, query, doc, person, treated as a node. Impose similarity metric between pairs of nodes. Need to be able to measure proximity from sets of nodes (a person+a doc they’re viewing+a query they’ve issued) to nodes of a target type (a person).
22
Issues in formulation If a user is close to two docs d1 and d2, are the docs d1 and d2 close to each other? How do you measure proximity from a set of nodes? How do you capture collaborative (as opposed to content- and context-based) filtering? How do you succinctly represent and manipulate similarities?
23
Graph-based models Each node an entity, associated with a set of features. Pairwise similarities based on feature matches. Issues: › Not easy to do proximity from sets of nodes. › Have to maintain (quadratic) pairwise information. › Consistency.
24
Tensor-based models Turn every entity into a vector. Axes are terms, profile features, … Combination of user, context++ becomes a tensor. Measure proximity to tensors of a certain type (e.g., user, doc recommendation).
25
Context with content Docs’ content captured in term axes. Other attributes (user profile, etc.) captured in other axes. A probe consists of (1) a tensor t (say, a user vector plus a query) and (2) a type of vector to be retrieved (say, a user). Result = vectors of chosen type closest to t.
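A probe of this kind can be sketched with sparse vectors. The slides do not say how the tensor t is formed or how closeness is measured, so two simplifying assumptions are made here: t is the sum of its component vectors, and proximity is cosine similarity. All names (`combine`, `cosine`, `probe`) and the axis labels are hypothetical.

```python
import math
from collections import Counter


def combine(*vectors):
    """Assumed combination rule: add sparse vectors axis by axis."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return dict(total)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(axis, 0.0) for axis, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def probe(entities, t, wanted_type, k=1):
    """Return the k entities of wanted_type closest to t."""
    scored = [
        (cosine(t, vec), name)
        for name, (etype, vec) in entities.items()
        if etype == wanted_type
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]


# Term axes and profile axes live in the same space.
entities = {
    "alice": ("user", {"term:graph": 1.0, "term:expander": 1.0, "dept:research": 1.0}),
    "bob": ("user", {"term:sales": 1.0, "dept:marketing": 1.0}),
    "doc1": ("doc", {"term:graph": 1.0, "term:expander": 1.0}),
}
# A user's profile vector plus a query vector.
t = combine({"dept:research": 1.0}, {"term:expander": 1.0})
closest_user = probe(entities, t, "user")
closest_doc = probe(entities, t, "doc")
```

The same probe t retrieves a person or a document depending only on the requested type.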
26
“Standard” mining tricks Dimensionality reduction - for collaborative filtering. Hierarchical clustering - for fast near-neighbor search. Incremental indexes - real-time updates.
27
Upshot Verity social networks project [screenshots not reproduced]. Security issues remain thorny. What aspects of social behavior can we exploit in the algorithms? › Power laws
28
Power laws in mining
29
Recurring phenomena Many interesting distributions › term frequencies in a corpus › citations › in-links to web pages › document access frequencies … follow an inverse polynomial function.
30
Zipf versus power laws We call a distribution on the positive integers › a power law if it’s of the form p(i) ~ 1/i^α. › a Zipf law if p(i) ~ 1/j^β where j is the rank of i. Typically α > 1.
31
Other Zipf/power laws Populations of US cities Degrees of internet nodes See http://www.cs.berkeley.edu/~christos/games/powerlaw.ps
32
What leads to power laws? “Scale free” growth. “Highly optimized tolerance”. Behavioral models. › Model behavior of individuals in social network.
33
In-degrees on the Web graph Web in-degrees are distributed as p(i) ~ 1/i^2.1. › Consistently across many independent studies. Erdos-Renyi random graphs would not lead to such a power law. Need a new stochastic model for such graphs.
34
Random replication graphs Central thesis - random replication in an evolving graph. › Some page creators create content without regard to what exists on the web. › Many are inspired by pre-existing content. › i.e., some links are random, others are copied from pre-existing pages.
35
Model details Evolution: Nodes are created in a sequence of discrete time steps › e.g. at each time step, a new node is created with d=O(1) out-links Probabilistic copying › links go to random nodes with probability α › copy d links from a random “existing” node with probability 1 − α
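The evolution step described on this slide can be simulated in a few lines. The transcript does not preserve the probability symbols, so the parameter `p_random` below is a hypothetical name for the probability of linking to random nodes, with copying assumed to happen otherwise. The sketch also assumes the choice is made once per new node and that the graph starts from a single node with d self-loops; the published model may differ on both points.

```python
import random
from collections import Counter


def copying_graph(n_nodes, d, p_random, seed=0):
    """Grow a directed graph one node per time step.

    Each new node gets d out-links. With probability p_random they go to
    uniformly random existing nodes; otherwise the d links of a random
    existing node are copied.
    """
    rng = random.Random(seed)
    # Assumed start: node 0 with d self-loops, so there is something to copy.
    out_links = [[0] * d]
    for new in range(1, n_nodes):
        if rng.random() < p_random:
            links = [rng.randrange(new) for _ in range(d)]
        else:
            prototype = rng.randrange(new)
            links = list(out_links[prototype])
        out_links.append(links)
    return out_links


def in_degrees(out_links):
    """Count how many links point at each node."""
    return Counter(target for links in out_links for target in links)


graph = copying_graph(n_nodes=2000, d=3, p_random=0.5, seed=1)
degree_counts = Counter(in_degrees(graph).values())  # in-degree -> number of nodes
```

Plotting `degree_counts` on log-log axes is one way to inspect the shape of the in-degree distribution for different values of `p_random`.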
36
Theoretical Results New model yields › convergence to power-law in-degrees; › number of bipartite cliques that grows with time; › evolution without copying would not yield these phenomena. R. Kumar, P. Raghavan, S. Rajagopalan, R. Sivakumar, A. Tomkins, E. Upfal (2000).
37
Compound structures “First order” structures (term frequencies, in-degrees, citations) exhibit power laws. What about “higher order” structures (pairs of terms, bipartite cliques, etc.)? Motivations: › Criteria for mining interesting higher order structures. › Tuning algorithms for higher order mining.
38
Pair frequencies for terms Analyzed several corpora of news items. Studied frequencies of k-tuples of terms (k=1, 2, …) in › corpus › documents › sentences › windows of width w. Ongoing work with P. Tsaparas.
39
Sentence log-rank vs. log-frequency
40
Pair distributions Based on term frequencies, compute pair frequencies under independence assumption. Measure actual pair frequencies. › Outliers under mutual information measure. Higher order outliers: useful for building clusters/concept maps in corpora.
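The comparison of expected and actual pair frequencies can be sketched as follows. The slide names a "mutual information measure" without details; this sketch assumes pointwise mutual information per pair, computed over sentence-level co-occurrence, and the function name `pair_outliers` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_outliers(sentences, min_pmi=0.0):
    """Rank term pairs by how far actual co-occurrence exceeds independence.

    sentences is a list of token lists. Returns tuples
    (pmi, term_a, term_b, actual_count, expected_count), highest PMI first.
    """
    n = len(sentences)
    term_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        terms = sorted(set(sentence))  # count each term once per sentence
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))
    scored = []
    for (a, b), actual in pair_count.items():
        # Expected number of co-occurrences if a and b were independent.
        expected = term_count[a] * term_count[b] / n
        pmi = math.log2(actual / expected)
        if pmi > min_pmi:
            scored.append((pmi, a, b, actual, expected))
    scored.sort(reverse=True)
    return scored


sentences = [
    ["new", "york", "stock"],
    ["new", "york", "city"],
    ["stock", "market"],
    ["city", "market"],
]
outliers = pair_outliers(sentences)
```

In this toy corpus only ("new", "york") co-occurs more often than independence predicts; every other pair appears exactly as often as expected.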
41
Pairs: independence vs. actual
42
Computational speedup Inspired by pruning algorithms in trawling. As higher order associations are built up, keep discarding obviated terms. › Docs keep getting shorter. › Fit in memory quickly. › Not an issue with relational tables?
43
A research agenda New ways of combining content, context and collaboration in the social network. Analyze and model structures in social networks. › Tune algorithms on models. › Build on “standard” mining paradigms: associations, clustering... Incorporate enterprise constraints: › Roles and profile information from apps › Security and Privacy!