
1 Augmenting Probabilistic Graphical Models with Ontology Information: Object Classification. R. Möller, Institute of Information Systems, University of Luebeck

2 Based on: Large-Scale Object Recognition using Label Relation Graphs. Jia Deng (1,2), Nan Ding (2), Yangqing Jia (2), Andrea Frome (2), Kevin Murphy (2), Samy Bengio (2), Yuan Li (2), Hartmut Neven (2), Hartwig Adam (2). (1) University of Michigan, (2) Google

3 Object Classification Assign semantic labels to objects. Example labelling of an image: Corgi ✔, Puppy ✔, Dog ✔, Cat ✖

4 Object Classification Assign semantic labels to objects, with probabilities: Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1

5 Object Classification Assign semantic labels to objects. Pipeline: image → Feature Extractor → Features → Classifier → Probabilities (Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1)

6 Object Classification Two standard choices. Multiclass classifier (softmax): scores are normalized so the probabilities sum to one (e.g., Corgi 0.2, Puppy 0.4, Dog 0.3, Cat 0.1); assumes mutually exclusive labels. Independent binary classifiers (logistic regression): one probability per label (e.g., Corgi 0.4, Puppy 0.8, Dog 0.6, Cat 0.2); makes no assumptions about label relations.
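A minimal sketch, assuming nothing beyond NumPy, of the two baseline output layers contrasted on slide 6; the label names and scores are illustrative, not taken from the slides.

```python
import numpy as np

labels = ["corgi", "puppy", "dog", "cat"]
scores = np.array([1.2, 1.9, 1.5, 0.4])   # hypothetical classifier scores, one per label

# (a) Softmax: a single distribution over all labels (sums to 1),
#     which implicitly assumes the labels are mutually exclusive.
softmax = np.exp(scores - scores.max())
softmax /= softmax.sum()

# (b) Independent binary classifiers (logistic regression): one sigmoid
#     per label, making no assumption about how the labels relate.
independent = 1.0 / (1.0 + np.exp(-scores))

for name, p_soft, p_ind in zip(labels, softmax, independent):
    print(f"{name:6s}  softmax={p_soft:.2f}  independent={p_ind:.2f}")
```

Neither choice can say both "dog" and "corgi" while still ruling out "cat", which is the gap the HEX model introduced below is meant to fill.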

7 Object labels have rich relations Hierarchical: Corgi and Puppy are kinds of Dog. Exclusion: Dog and Cat cannot both apply. Overlap: Corgi and Puppy can co-occur. Softmax wrongly assumes all labels are mutually exclusive; logistic regression wrongly assumes all labels may overlap.

8 Goal: A new classification model that respects real-world label relations, producing outputs such as Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1 that are consistent with the relations among Corgi, Puppy, Dog, and Cat.

9 Visual Model + Knowledge Graph A visual model produces per-label scores (Corgi 0.9, Puppy 0.8, Dog 0.9, Cat 0.1); a knowledge graph encodes the label relations; the two are combined by joint inference. Assumption in this work: the knowledge graph is given and fixed.

10 Agenda Encoding prior knowledge (HEX graph) Classification model Efficient Exact Inference

11 Agenda Encoding prior knowledge (HEX graph) Classification model Efficient Exact Inference

12 Hierarchy and Exclusion (HEX) Graph Nodes are labels (Corgi, Puppy, Dog, Cat). Hierarchical edges are directed (e.g., Dog → Corgi, Dog → Puppy); exclusion edges are undirected (e.g., an exclusion edge between Dog and Cat).

13 Examples of HEX graphs Mutually exclusive: Car, Bird, Dog, Cat. All overlapping: Round, Red, Shiny, Thick. Combination: Person, Male, Female, Child, Boy, Girl.

14 State Space: Legal label configurations Candidate states are binary vectors over (Dog, Cat, Corgi, Puppy): 0000, 0001, 0010, 0011, 1000, …, 1100, 1101, … Each edge of the HEX graph defines a constraint.

15 State Space: Legal label configurations Hierarchy edge Dog → Corgi: (Dog, Corgi) can't be (0, 1). Each edge defines a constraint.

16 State Space: Legal label configurations Exclusion edge between Dog and Cat: (Dog, Cat) can't be (1, 1). Hierarchy edge Dog → Corgi: (Dog, Corgi) can't be (0, 1). Each edge defines a constraint.
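To make slides 14 to 16 concrete, here is a small brute-force enumerator of legal configurations for the running example; the edge sets and helper names are mine, and brute force is only viable for tiny label sets (the inference machinery discussed later handles realistic ones).

```python
from itertools import product

LABELS = ["dog", "cat", "corgi", "puppy"]
HIERARCHY = {("dog", "corgi"), ("dog", "puppy")}   # directed edges (parent, child)
EXCLUSION = {("dog", "cat")}                       # undirected edges

def is_legal(assignment):
    """assignment maps label -> 0/1; check every HEX edge constraint."""
    for parent, child in HIERARCHY:
        if assignment[parent] == 0 and assignment[child] == 1:
            return False   # hierarchy: (parent, child) can't be (0, 1)
    for a, b in EXCLUSION:
        if assignment[a] == 1 and assignment[b] == 1:
            return False   # exclusion: (a, b) can't be (1, 1)
    return True

def legal_states():
    for bits in product((0, 1), repeat=len(LABELS)):
        if is_legal(dict(zip(LABELS, bits))):
            yield bits

# e.g. (0, 0, 1, 0) is illegal (corgi without dog) and
#      (1, 1, 0, 0) is illegal (dog and cat together).
print(sorted(legal_states()))
```

Of the 16 raw bit vectors, only 6 survive the two kinds of constraints in this example.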

17 Agenda Encoding prior knowledge (HEX graph) Classification model Efficient Exact Inference

18 HEX Classification Model A pairwise Conditional Random Field (CRF) over a binary label vector, conditioned on per-label input scores.

19 HEX Classification Model Unary potentials: same as logistic regression, one per label, driven by the input scores.

20 HEX Classification Model Pairwise potentials: zero if the pair of labels violates a hierarchy or exclusion constraint, one otherwise, so all illegal configurations have probability zero.

21 HEX Classification Model Partition function: sum over all legal configurations (illegal ones contribute zero).

22 HEX Classification Model Probability of a single label: marginalize out all other labels.
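The formulas themselves did not survive in the transcript of slides 18 to 22; the following LaTeX is a sketch of the distribution they describe, in the spirit of Deng et al.'s HEX-CRF. Here f_i(x) is the input score for label i, and E_h and E_e are the hierarchy and exclusion edge sets; the notation is mine, not taken from the slides.

```latex
\[
  p(\mathbf{y} \mid x) \;=\; \frac{1}{Z(x)}
  \prod_{i} e^{f_i(x)\, y_i}
  \prod_{(i,j)\in E_h} \mathbf{1}\!\left[(y_i, y_j) \neq (0,1)\right]
  \prod_{(i,j)\in E_e} \mathbf{1}\!\left[(y_i, y_j) \neq (1,1)\right],
\]
\[
  Z(x) \;=\; \sum_{\mathbf{y}\ \text{legal}} \prod_{i} e^{f_i(x)\, y_i},
  \qquad
  p(y_k = 1 \mid x) \;=\; \sum_{\mathbf{y}:\, y_k = 1} p(\mathbf{y} \mid x).
\]
```

With no edges this reduces to independent logistic regressions, and with all labels pairwise exclusive it reduces to softmax, which is the special case on the next slide.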

23 Special Cases of the HEX Model If all labels are mutually exclusive (e.g., Car, Bird, Dog, Cat), the HEX model reduces to softmax; if all labels overlap (e.g., Round, Red, Shiny, Thick), it reduces to independent logistic regressions.

24 Learning A DNN (Deep Neural Network) produces the input scores. For a training image labeled "Dog", the Dog label is observed as 1 and Corgi, Puppy, and Cat are left unobserved (?). Training maximizes the marginal probability of the observed labels and back-propagates through the DNN.
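A hedged sketch of the training objective slide 24 describes: for each image x with an observed label k (e.g., Dog), maximize the marginal probability that label k is on, summing out the unobserved labels; the symbols are mine.

```latex
\[
  \mathcal{L}(\theta) \;=\; -\sum_{(x,\,k)\,\in\,\text{training set}}
  \log p_\theta\!\left(y_k = 1 \mid x\right)
  \;=\; -\sum_{(x,\,k)}
  \log \sum_{\substack{\mathbf{y}\ \text{legal} \\ y_k = 1}} p_\theta(\mathbf{y} \mid x),
\]
```

and the gradient of this loss is back-propagated through the DNN that produces the scores f_i(x).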

25 Agenda Encoding prior knowledge (HEX graph) Classification model Efficient Exact Inference

26 Naïve Exact Inference is Intractable Inference requires computing the partition function and performing marginalization, and the HEX-CRF can be densely connected (large treewidth).

27 Observation 1: Exclusions are good Example: Car, Bird, Dog, Cat are pairwise exclusive. Lots of exclusions → small state space → efficient inference. Realistic graphs have lots of exclusions (rigorous analysis in the paper): the number of legal states is O(n), not O(2^n).
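A quick brute-force check of the state-space claim for the fully exclusive case (the function name is mine): with n pairwise-exclusive labels, the only legal configurations are the n singletons plus the all-zero vector.

```python
from itertools import combinations, product

def count_legal_states(n):
    """Count binary label vectors over n pairwise-exclusive labels."""
    exclusions = list(combinations(range(n), 2))
    return sum(
        1
        for bits in product((0, 1), repeat=n)
        if all(not (bits[i] and bits[j]) for i, j in exclusions)
    )

for n in range(1, 11):
    assert count_legal_states(n) == n + 1   # linear in n, versus 2**n raw vectors
print("legal states for n mutually exclusive labels: n + 1")
```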

28 Observation 2: Equivalent graphs Two different HEX graphs over the same labels (Dog, Cat, Corgi, Puppy, Pembroke Welsh Corgi, Cardigan Welsh Corgi) can define exactly the same set of legal configurations.

29 Observation 2: Equivalent graphs The sparse equivalent has small treewidth and supports dynamic programming; the dense equivalent makes illegal states explicit so clique states can be pruned and the remainder handled by brute force.

30 HEX Graph Inference (illustrated on an example graph with nodes A through G) 1. Sparsify (offline) 2. Build junction tree (offline) 3. Densify (offline) 4. Prune clique states (offline) 5. Message passing on legal states (online)

31 Digression: Polytrees A network is singly connected (a polytree) if it contains no undirected loops. Theorem: inference in a singly connected network can be done in linear time*. Main idea: in variable elimination, we need only maintain distributions over single nodes. (* linear in network size, including table sizes.) © Jack Breese (Microsoft) & Daphne Koller (Stanford)

32 The problem with loops Network: Cloudy → Rain, Cloudy → Sprinkler, Rain → Grass-wet ← Sprinkler. CPTs: P(c) = 0.5; P(r | c) = 0.99, P(r | ¬c) = 0.01; P(s | c) = 0.01, P(s | ¬c) = 0.99; Grass-wet is a deterministic OR, so the grass is dry only if there is no rain and no sprinkler: P(¬g) = P(¬r, ¬s) ≈ 0. © Jack Breese (Microsoft) & Daphne Koller (Stanford)

33 The problem with loops contd. P(¬g) = P(¬g | r, s) P(r, s) + P(¬g | ¬r, s) P(¬r, s) + P(¬g | r, ¬s) P(r, ¬s) + P(¬g | ¬r, ¬s) P(¬r, ¬s). The first three conditional terms are 0 and the last is 1, so P(¬g) = P(¬r, ¬s) ≈ 0. Treating Rain and Sprinkler as independent instead gives P(¬r) P(¬s) ≈ 0.5 · 0.5 = 0.25, which is the problem. © Jack Breese (Microsoft) & Daphne Koller (Stanford)
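A tiny numerical check of the example above, using the CPT values reconstructed from slide 32:

```python
# Cloudy -> Rain, Cloudy -> Sprinkler, Grass-wet = Rain OR Sprinkler (deterministic).
p_cloudy = 0.5
p_rain = {True: 0.99, False: 0.01}        # P(rain | cloudy)
p_sprinkler = {True: 0.01, False: 0.99}   # P(sprinkler | cloudy)

# Exact: P(dry grass) = P(no rain, no sprinkler), summing over Cloudy.
p_dry = sum(
    (p_cloudy if c else 1 - p_cloudy) * (1 - p_rain[c]) * (1 - p_sprinkler[c])
    for c in (True, False)
)

# Wrong: treat Rain and Sprinkler as independent, ignoring their common cause.
p_r = sum((p_cloudy if c else 1 - p_cloudy) * p_rain[c] for c in (True, False))
p_s = sum((p_cloudy if c else 1 - p_cloudy) * p_sprinkler[c] for c in (True, False))
p_dry_wrong = (1 - p_r) * (1 - p_s)

print(p_dry)        # ~0.0099, essentially 0
print(p_dry_wrong)  # 0.25, the "problem" on the slide
```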

34 Variable elimination Chain A → B → C: P(c) = Σ_b P(c | b) Σ_a P(b | a) P(a). Steps: P(A) × P(B | A) = P(B, A); Σ_A P(B, A) = P(B); P(B) × P(C | B) = P(C, B); Σ_B P(C, B) = P(C). © Jack Breese (Microsoft) & Daphne Koller (Stanford)

35 Inference as variable elimination A factor over X is a function from val(X) to numbers in [0, 1]: a CPT is a factor, and a joint distribution is also a factor. BN inference: factors are multiplied to give new ones, and variables in factors are summed out. A variable can be summed out as soon as all factors mentioning it have been multiplied. © Jack Breese (Microsoft) & Daphne Koller (Stanford)
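A minimal variable-elimination sketch for the A → B → C chain from slide 34; the factors are plain dictionaries that are multiplied and then summed out exactly as described above, and the binary CPT numbers are made up for illustration.

```python
from itertools import product

# Hypothetical binary CPTs for the chain A -> B -> C.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}   # key: (a, b)
P_C_given_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}   # key: (b, c)

# Multiply P(A) and P(B | A) into the factor P(A, B), then sum out A.
P_AB = {(a, b): P_A[a] * P_B_given_A[(a, b)] for a, b in product((0, 1), repeat=2)}
P_B = {b: sum(P_AB[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Multiply P(B) and P(C | B) into the factor P(B, C), then sum out B.
P_BC = {(b, c): P_B[b] * P_C_given_B[(b, c)] for b, c in product((0, 1), repeat=2)}
P_C = {c: sum(P_BC[(b, c)] for b in (0, 1)) for c in (0, 1)}

print(P_C)   # approximately {0: 0.65, 1: 0.35}; every intermediate factor stays small
```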

36 Variable elimination with loops Network: Age and Gender → Smoking; Age → Exposure to Toxics; Exposure to Toxics and Smoking → Cancer; Cancer → Serum Calcium; Cancer → Lung Tumor. Elimination: P(A) × P(G) × P(S | A, G) = P(A, G, S); Σ_G P(A, G, S) = P(A, S); P(A, S) × P(E | A) = P(A, E, S); Σ_A P(A, E, S) = P(E, S); P(E, S) × P(C | E, S) = P(E, S, C); Σ_{E,S} P(E, S, C) = P(C); P(C) × P(L | C) = P(C, L); Σ_C P(C, L) = P(L). Complexity is exponential in the size of the factors. © Jack Breese (Microsoft) & Daphne Koller (Stanford)

37 Join trees* A join tree is a partially precompiled factorization. For the network above, the cliques are {A, G, S}, {A, E, S}, {E, S, C}, {C, L}, and {C, S-C} (S-C = Serum Calcium); for example, the factor P(A) × P(G) × P(S | A, G) = P(A, G, S) is assigned to the clique {A, G, S}. (* aka Junction Tree, Lauritzen-Spiegelhalter, or Hugin algorithm, …) © Jack Breese (Microsoft) & Daphne Koller (Stanford)

38 Junction Tree Algorithm Converts a Bayes net into an undirected tree: the joint probability remains unchanged, and exact marginals can be computed. Why? It gives a uniform treatment of Bayes nets and MRFs, and efficient inference is possible for undirected trees.

