Unsupervised Group Discovery in Relational Datasets: A Nonparametric Bayesian Approach. P.S. Koutsourelakis, School of Civil and Environmental Engineering, Cornell University.


1 Unsupervised Group Discovery in Relational Datasets: A nonparametric Bayesian Approach P.S. Koutsourelakis School of Civil and Environmental Engineering Cornell University Artificial Intelligence Seminar, 10/12/07 Joint work with T. Eliassi-Rad, LLNL

2 P.S. Koutsourelakis, pk285@cornell.edu Problem Setting [Figure: four objects A, B, C, D, each with attributes (age, income, location, …), connected by links labeled "friend", "co-worker", "phone call".] Traditional clustering uses only the attributes. Can we improve clustering by using relational data? What if only relational data is available? Can we make predictions about missing links or attributes?

3 Problem Setting A collection of objects belonging to various types/domains (e.g., people, papers, locations, devices, movies). Each object might have (observable) attributes. Links/relations: between two or more objects; the objects can be of the same or different types; binary (absence/presence), integer-, or real-valued. Each link might have (observable) attributes. Goal: find groups of objects of each type, or find common identities between objects of each type, or organize objects into clusters that relate to each other in predictable ways.

4 Problem Setting Given an adjacency matrix where R_ij = 0 or 1 (observables), find the cluster assignments I_i (hidden/latent). Example adjacency matrix (diagonal omitted):

     A  B  C  D
  A  -  0  0  0
  B  0  -  0  0
  C  1  0  -  0
  D  0  1  1  -

Probabilistic (Bayesian) formulation: posterior ∝ likelihood × prior, i.e., p(I | R) ∝ p(R | I) p(I).

5 Problem Setting Likelihood: the relational behavior of the objects is completely determined by their cluster assignments I_i. For example, a matrix η may specify the link probability between any two groups, so that p(R_ij = 1 | I) = η(I_i, I_j). Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.

6 Augmented Problem Setting If objects have attributes x_i that are also observed, the likelihood can be augmented with an attribute term. If links R_ij are real-valued (e.g., duration of a phone call, number of bytes), the Bernoulli likelihood can be replaced by a distribution whose parameters are functions of the group assignments.

7 Problem Setting We need a prior on group assignments, p(I). What is an appropriate prior p(K) on the number of clusters K? The distribution on the I_i should be exchangeable: the order in which nodes are assigned can be permuted without changing the probability of the resulting partition.

8 Nonparametric Bayesian Methods* Bayesian methods are most powerful when the prior adequately captures your beliefs. Inflexible models (e.g., with a fixed number of groups) might yield unreasonable inferences. Nonparametrics provide a way of getting very flexible models. Nonparametric models can automatically infer an adequate model size/complexity from the data, without needing to explicitly do Bayesian model comparison. Many can be derived by starting with a finite parametric model and taking the limit as the number of parameters goes to infinity. * "Nonparametric" doesn't mean there are no parameters, but that "the number of parameters grows with the data" (e.g., as in Parzen window density estimation).

9 Chinese Restaurant Process (CRP) [Figure: customers entering a restaurant with a menu of potentially infinite dishes.]

10 Chinese Restaurant Process (CRP) Customer m joins an existing dish j with probability proportional to n_j, the number of people already eating dish j, and orders a new dish with probability proportional to γ. Properties: the CRP is exchangeable (i.e., the order in which customers entered doesn't matter); the number of groups grows as O(log n), where n is the number of nodes; larger γ favors more clusters; inference with Gibbs sampling can be based on the conditionals above.
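The CRP seating rule can be simulated directly; a minimal sketch (function name and code are illustrative, not from the talk):

```python
import random

def sample_crp(n, gamma, rng=None):
    """Simulate n customers entering a Chinese restaurant.

    Each customer joins existing dish j with probability proportional
    to n_j (the number of people already eating dish j) and orders a
    new dish with probability proportional to gamma.
    """
    rng = rng or random.Random(0)
    assignments, counts = [], {}
    for _ in range(n):
        dishes = list(counts)
        # Unnormalized weights: existing dishes by popularity, new dish by gamma.
        weights = [counts[j] for j in dishes] + [gamma]
        choice = rng.choices(dishes + [len(dishes)], weights=weights)[0]
        counts[choice] = counts.get(choice, 0) + 1
        assignments.append(choice)
    return assignments
```

Because new dishes are labeled consecutively, the first customer always receives dish 0, and the number of distinct dishes grows slowly (logarithmically in expectation) with n.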

11 Infinite Relational Model (IRM) "Forward" interpretation (single domain): 1) sample group assignments I_i from CRP(γ), resulting in K clusters; 2) sample iid η(a, b) for all a, b = 1, 2, …, K from Beta(β_1, β_2); 3) sample each R_ij iid from Bernoulli(η(I_i, I_j)). From Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.
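The three-step generative process can be sketched as a forward sampler; a self-contained sketch (function name and defaults are illustrative):

```python
import random

def sample_irm(n, gamma=1.0, beta1=0.5, beta2=0.5, rng=None):
    """Forward-sample the IRM for a single domain of n objects:
    1) I_i ~ CRP(gamma), 2) eta(a, b) ~ Beta(beta1, beta2) iid,
    3) R_ij ~ Bernoulli(eta(I_i, I_j))."""
    rng = rng or random.Random(0)
    # Step 1: CRP over group assignments.
    I, counts = [], {}
    for _ in range(n):
        groups = list(counts)
        weights = [counts[g] for g in groups] + [gamma]
        g = rng.choices(groups + [len(groups)], weights=weights)[0]
        counts[g] = counts.get(g, 0) + 1
        I.append(g)
    K = len(counts)
    # Step 2: a Beta-distributed link probability for every pair of groups.
    eta = {(a, b): rng.betavariate(beta1, beta2)
           for a in range(K) for b in range(K)}
    # Step 3: Bernoulli links conditioned on the group assignments.
    R = [[int(rng.random() < eta[I[i], I[j]]) for j in range(n)]
         for i in range(n)]
    return I, eta, R
```

Inference inverts this process: given an observed R, the posterior over I and η is explored, e.g., by Gibbs sampling.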

12 Application: Object-Feature Dataset Two domains (animals + features). Animals form two groups: birds and 4-legged mammals.

13 Application: Object-Feature Dataset Maximum-likelihood configuration. Animal domain: Group 1: dove, hen, owl, falcon, eagle; Group 2: duck, goose; Group 3: fox, cat; Group 4: horse, zebra; Group 5: dog, wolf, tiger, lion, cow. Feature domain: Group 1: small, 2-legs, feathers, fly; Group 2: medium, hunt; Group 3: big, hooves, mane, run; Group 4: 4-legs, hair; Group 5: swim.

14 Application: Object-Feature Dataset

15 Predicting Missing Links Can we make predictions about missing links?

  % of missing links   AUC    Accuracy
  10%                  0.96   0.95
  25%                  0.96   0.91
  50%                  0.91   0.87
  65%                  0.82   0.80
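AUC figures like those above can be computed from posterior link scores via the rank-sum (Mann-Whitney) statistic: the probability that a randomly chosen present link is scored above a randomly chosen absent link. A small self-contained sketch (the function name is illustrative):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney statistic: the fraction of
    (present link, absent link) pairs where the present link is
    scored higher (ties count one half)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `auc([0.9, 0.8], [0.1, 0.2])` is 1.0 (perfect ranking), while identical scores give 0.5 (chance level).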

16 Infinite Relational Model (IRM) Advantages: it is an unsupervised learner with only two tunable parameters, β and γ; it can be applied to multiple node types and relations; it has all the advantages of Bayesian formulations (missing data, confidence intervals) and nonparametric methods (adaptation to the data, outlier accommodation); it has been successfully used for co-clustering objects and features, learning ontologies, and analyzing social networks. Disadvantages: significant computational effort; it does not capture "multiple personalities."

17 "Multiple Personalities" In real data, objects (e.g., people) do not belong exclusively to one group: their identity is a mixture of basic components. These components can be the same for each object type, but the mixing proportions may vary from one object to another. IRM assumes that each object participates in all the relations it is involved in with a single identity. A proper model should account for a different mixture for each object over all the possible identity components (which are common to the whole domain); this way we learn not only the groups of the population but also the existing mixtures of them. This can be achieved by introducing a Bayesian hierarchy (groups ≡ identities).

18 Mixed-Membership Model (MMM) Q: Can we use an independent CRP for each object? A: No, because the groups discovered by each CRP would not be shared across objects.

19 Chinese Restaurant Franchise N restaurants with a common menu: object 1 = restaurant 1, object 2 = restaurant 2, …, object N = restaurant N. Phase 1: table assignment. Phase 2: dish assignment. Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. Hierarchical Dirichlet Processes. JASA, 2006.

20 Chinese Restaurant Franchise Notation: the table assignment for customer m at restaurant i; the number of customers already sitting at table t; the dish assignment for table t in restaurant i; the number of tables; and the number of tables already serving dish k. A customer sits at an existing table with probability proportional to the number of customers already there, or opens a new table; a new table is served an existing dish with probability proportional to the number of tables already serving it, or a new dish.
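The two-phase process (local table assignment, global dish assignment) can be sketched as a single forward draw; a hedged sketch under the standard CRF seating scheme (names and structure are illustrative, not the authors' code):

```python
import random

def sample_crf(customers_per_restaurant, alpha, gamma, rng=None):
    """One forward draw from a Chinese Restaurant Franchise (a sketch
    of the generative process, not an inference algorithm).

    Phase 1 (table assignment): within each restaurant, customers pick
    tables by a local CRP(alpha).
    Phase 2 (dish assignment): every newly opened table orders from the
    shared menu by a global CRP(gamma) over tables, so dishes (group
    identities) are shared across restaurants (objects)."""
    rng = rng or random.Random(0)
    dish_tables = {}          # dish k -> number of tables (franchise-wide)
    seating, dish_of = [], []
    for n_i in customers_per_restaurant:
        table_counts, tables_dish, assign = [], [], []
        for _ in range(n_i):
            # Sit at table t with weight n_t, or open a new table with weight alpha.
            t = rng.choices(range(len(table_counts) + 1),
                            weights=table_counts + [alpha])[0]
            if t == len(table_counts):   # new table: order a dish globally
                dishes = list(dish_tables)
                k = rng.choices(dishes + [len(dishes)],
                                weights=[dish_tables[d] for d in dishes] + [gamma])[0]
                dish_tables[k] = dish_tables.get(k, 0) + 1
                tables_dish.append(k)
                table_counts.append(0)
            table_counts[t] += 1
            assign.append(t)
        seating.append(assign)
        dish_of.append(tables_dish)
    return seating, dish_of
```

Because dishes are drawn from a single franchise-wide menu, the same group identities recur across restaurants, which is exactly what independent per-object CRPs fail to provide.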

21 Mixed-Membership Model The dish assignment of node i plays the role of its identity in a given relation. Properties: it has a few more parameters (γ_i), but also higher expressivity; inference with Gibbs sampling can be based on the conditionals above.

22 Non-Identifiability Example: two objects A, B and two groups 1, 2. Object A belongs 100% to group 1; object B belongs 50% to group 1 and 50% to group 2. [Figure: matrix of probabilities of a 1-link between any pair of groups.]

23 Non-Identifiability Different configurations (with 2, 3 or 4 groups) can have the same likelihood; the prior then determines the inference results.

24 Application: Mixed-Membership One domain, 16 objects, 4 distinct identities, fully observed adjacency matrix.

25 Application: Mixed-Membership Model

26 Application: Mixed-Membership Model

27 Application: Mixed-Membership Model

28 Application: Mixed-Membership [Figure: error w.r.t. the actual probability that any pair of objects belongs to the same group, for IRM vs. MMM.]

29 Application: Mixed-Membership [Figure: IRM vs. MMM results.]

30 Application: Mixed-Membership Model Two domains (animals + features); animals form two groups: birds and 4-legged mammals.

31 Application: Mixed-Membership Model

32 Application: Mixed-Membership Model COW: average posterior pairwise probabilities of belonging to the same group.

33 Zachary's Karate Club 34 people. A disagreement between the administrator (node 34) and the instructor (node 1) led to the split of the club in two (circles and squares). Used a binary matrix that records the "like" relation. Data from M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA, 2002.

34 Zachary's Karate Club

35 Learning Hierarchies Can we meaningfully infer a hierarchy of groups/identities? [Figure: tree from most general (root) to most specific (leaves: Identity 1 through Identity 4).]

36 Learning Hierarchies Nonparametric prior on trees: the tree has levels 0 through L, each box (node) is a different group/identity, and the branching at each level is governed by its own CRP (e.g., CRP_L(α_L) at the bottom level, CRP_{L-1}(α_{L-1}) above it).
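One standard way to realize such a level-wise CRP prior on trees is a nested CRP, where each object samples a root-to-leaf path and runs a CRP among the children of the node it has reached at every level. A hedged sketch (the nested-CRP formulation is one common construction, not necessarily the exact one used in the talk):

```python
import random

def sample_ncrp_paths(n, depth, gammas, rng=None):
    """Assign each of n objects a root-to-leaf path in an infinite tree
    via a nested CRP: at level l the object runs a CRP(gammas[l]) among
    the children of the node it reached at the previous level."""
    rng = rng or random.Random(0)
    counts = {}   # node (as a path tuple) -> {child label: visit count}
    paths = []
    for _ in range(n):
        path = ()
        for level in range(depth):
            children = counts.setdefault(path, {})
            kids = list(children)
            # Existing branches by popularity, new branch by gammas[level].
            weights = [children[c] for c in kids] + [gammas[level]]
            c = rng.choices(kids + [len(kids)], weights=weights)[0]
            children[c] = children.get(c, 0) + 1
            path += (c,)
        paths.append(path)
    return paths
```

Objects that share a long path prefix belong to the same coarse group and diverge only at the more specific levels, giving the "most general to most specific" hierarchy sketched above.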

37 Learning Hierarchies Hierarchical Mixed-Membership Model (HMMM): "forward" interpretation (for a single domain).

38 Application: Artificial Dataset One domain, 40 objects, 4 distinct identities, fully observed adjacency matrix.

39 Application: Artificial Dataset

40 Application: Political Books 43 liberal, 49 conservative, 13 neutral books. Links imply frequent co-purchasing by the same buyers (Amazon.com).

41 Application: Political Books [Figure: inferred groups with their sizes and the percentage composition (liberal/conservative/neutral) of each group.]

42 Reality Mining MIT Data One node type (people): 97 people, plus all outsiders merged into a single node. 22 different positions (professor, staff, 1st-year grad, …). Composition: Sloan 29%, faculty & staff 5%, students 52%, other 14%.

43 Reality Mining MIT Data [Figure: inferred groups with their sizes and the percentage composition of each group.]

44 Conclusions and Outlook Relational data contain significant information about group structure. Bayesian models allow the analyst to make inferences about communities of interest while quantifying the level of confidence, even when a significant proportion of the data is missing. Nonparametric models provide very flexible priors that allow the model to adapt to the data. IRM is a very lightweight framework with a very wide range of applicability, but it cannot capture multiple identities. MMM and HMMM allow for increased flexibility and provide additional information about objects that simultaneously belong to several groups. Challenges: accelerated inference, especially for large datasets (variational methods, sequential Monte Carlo); appropriate priors for time-dependent datasets.

45 Application: Senate Vote 2002 50 Democrats, 49 Republicans, 1 Independent. Link R_ij = 1 if: the two senators voted the same way; both have taken more, or both less, than the average contribution (average: $13,800).

46 Application: Senate Vote 2002 [Figure: inferred groups with the percentage composition of each group by party.]

