Slide 1: Today's Topics (12/1/15, CS 540 - Fall 2015, Shavlik, Lecture 27, Week 13)
Read Chapter 21 (skip Section 21.5) of textbook
Exam THURSDAY Dec 17, 5:30-7:30pm (here)
Review of Fall 2014 Final: Dec 15
TA Dmitry at Epic, 5:30-7:30pm on Weds Dec 16?
HW5 due Dec 8 (and no later than Dec 11)
Probabilistic Logic
Markov Logic Networks (MLNs) - a popular and successful probabilistic logic
Collective Classification
Slide 2: Logic & Probability: Two Major Math Underpinnings of AI
[Diagram] Start from Logic and add probabilities, or start from Probabilities and add relations; either path leads to Statistical Relational Learning, of which MLNs are a popular approach.
Slide 3: Statistical Relational Learning (Intro to SRL, Getoor & Taskar (eds), MIT Press, 2007)
Pure Logic is Too 'Fragile' - everything must be either true or false
Pure Statistics Doesn't Capture/Accept General Knowledge Well (tell it once rather than label N examples)
  ∀x human(x) → ∃y motherOf(x, y)
Many Approaches Created Over the Years, Especially the Last Few, including some at UWisc
Slide 4: Markov Logic Networks (Richardson & Domingos, Machine Learning journal, 2006)
Use FOPC, but add weights to formulae ('syntax')
  wgt = 10   ∀x,y,z motherOf(x, z) ∧ fatherOf(y, z) → married(x, y)
  - weights represent the 'penalty' if a candidate world state violates the rule
  - for 'pure' logic, wgt = ∞
Formulae are interpreted ('semantics') as a compact way to specify a type of graphical model called a Markov Net
  - like a Bayes net, but with undirected arcs
  - probabilities in Markov nets are specified by clique potentials, but we won't cover them in CS 540
(Photo: Pedro Domingos)
Slide 5: Using an MLN ('Inference')
Assume we have a large knowledge base of probabilistic logic rules
Assume we are given the truth values of N predicates (the 'evidence')
We may be asked to estimate the most probable joint setting for M 'query' predicates
Brute-force solution (a sketch appears below)
- Consider the 2^M possible 'complete world states'
- Calculate the truth value of every grounded formula in each state
- Return the state with the smallest total penalty for violated MLN rules (or, equivalently, the one with the largest sum of weights of satisfied rules)
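A minimal sketch of this brute-force loop, assuming each grounded rule is supplied as a Python function over a truth assignment; the helper names (world_probabilities, most_probable_world) are illustrative, not from the lecture:

import itertools
import math

def world_probabilities(props, weighted_rules):
    """Score every truth assignment to `props` by the summed weights of the
    grounded rules it satisfies, then normalize to get probabilities.

    weighted_rules: list of (weight, rule) pairs; each rule maps a
    {proposition: bool} world to True (satisfied) or False (violated)."""
    scored = []
    for values in itertools.product([False, True], repeat=len(props)):
        world = dict(zip(props, values))
        score = sum(w for w, rule in weighted_rules if rule(world))
        scored.append((world, score))
    Z = sum(math.exp(s) for _, s in scored)          # normalizing constant
    return [(w, s, math.exp(s) / Z) for w, s in scored]

def most_probable_world(props, weighted_rules):
    """Brute-force MAP query: the state with the largest sum of weights of
    satisfied rules (equivalently, the smallest penalty); no Z needed."""
    return max(world_probabilities(props, weighted_rules), key=lambda t: t[1])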
Slide 6: Probability of Candidate World States
Prob(specific world state) = (1/Z) exp( Σ weights of grounded formulae that are true in this world state )
Z is a normalizing term; computing it means summing over all possible world states (challenging to estimate)
- A world state is a conjunction of ground predicates (eg, married(John, Sue), …, friends(Bill, Ann))
- If we only want the most probable world state, we don't need to compute Z
If a world state violates a rule with infinite weight, the probability of that world state is zero (why?)
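In symbols, this is the standard MLN definition from Richardson & Domingos, where n_i(x) counts the groundings of formula i that are true in world x:

\[
P(X = x) \;=\; \frac{1}{Z}\,\exp\!\Big(\sum_i w_i\, n_i(x)\Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big(\sum_i w_i\, n_i(x')\Big)
\]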
Slide 7: Grounding the MLN Formulae (replacing variables with constants)
Assume we have this domain knowledge
  wgt = 2   ∀x,y P(x,y) → Q(x)
  wgt = 3   ∀x P(x,1) → (R(1) ∨ R(x))
And these constants: 1 and 2
So we have these 'grounded' rules (wgts not shown):
  P(1,1) → Q(1)
  P(1,2) → Q(1)
  P(2,1) → Q(2)
  P(2,2) → Q(2)
  P(1,1) → R(1)
  P(2,1) → (R(1) ∨ R(2))
Aside: Each grounded rule becomes a clique in the Markov network (analogous to a CPT in a Bayes Net)
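A small sketch of the grounding step, assuming formulas are given as string templates; ground_formula is an illustrative helper, not from the lecture:

import itertools

def ground_formula(template, variables, constants):
    """Return one grounded rule per way of binding the formula's variables
    to constants (a sketch of the grounding step, not a full MLN grounder)."""
    grounded = []
    for binding in itertools.product(constants, repeat=len(variables)):
        grounded.append(template.format(**dict(zip(variables, binding))))
    return grounded

# wgt = 2:  forall x,y  P(x,y) -> Q(x),  with constants {1, 2}
print(ground_formula("P({x},{y}) -> Q({x})", ["x", "y"], [1, 2]))
# ['P(1,1) -> Q(1)', 'P(1,2) -> Q(1)', 'P(2,1) -> Q(2)', 'P(2,2) -> Q(2)']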
Slide 8: Simple MLN Example
Have:  wgt = 2   P → Q
       wgt = 7   P ∨ Q
Four possible world states:
  P      Q      Probability
  False  False  (1/Z) e^(2+0)  ≈ 1/e^8
  False  True   (1/Z) e^(2+7)  ≈ 1/e
  True   False  (1/Z) e^(0+7)  ≈ 1/e^3
  True   True   (1/Z) e^(2+7)  ≈ 1/e
The normalizing term: Z = e^2 + e^9 + e^7 + e^9 ≈ e^10
What is prob(P=true & Q=false) in 'std' logic?
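Running the world_probabilities sketch from Slide 5 on this two-proposition example (assuming those illustrative helper names) reproduces the table:

rules = [
    (2, lambda w: (not w['P']) or w['Q']),   # wgt = 2:  P -> Q
    (7, lambda w: w['P'] or w['Q']),         # wgt = 7:  P v Q
]
for world, score, prob in world_probabilities(['P', 'Q'], rules):
    print(world, score, round(prob, 4))
# Scores come out 2, 9, 7, 9, so the exact Z is e^2 + e^7 + 2e^9; the slide
# rounds Z up to e^10, which gives the cruder 1/e^8, 1/e, 1/e^3, 1/e figures.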
Slide 9: Collective Classification
Assume we need to predict the outputs for N examples
Could knowing the (probability that) example i is true impact the (probability that) example j is true?
Ie, we are relaxing the iid assumption about examples
For instance: "If Alice and Bob are friends, then if Alice likes a movie, Bob (probably) does as well."
Slide 10: Collective Classification in MLNs
Imagine we have a bunch of inference rules for predicting likes(Person, Food)
We could add this to our MLN, with wgt = 3:
  If   livesIn(?Person1, ?City) ∧ livesIn(?Person2, ?City) ∧ isaFood(?Food)
  Then [ likes(?Person1, ?Food) ↔ likes(?Person2, ?Food) ]
"People in the same city generally like the same sorts of food"
So if we predicted the likes of N people, the MLN would be encouraged to give consistent answers
Slide 11: A Famous MLN Example (first rule modified to use ↔)
"Smoking frequently causes cancer."
  wgt = 3   ∀x smokes(x) ↔ cancer(x)   // Assume it's the ONLY cause
"Friends of smokers are likely to smoke."
  wgt = 2   ∀x,y friends(x, y) ∧ smokes(x) → smokes(y)
Don't buy these →
Assume the facts below, and we want to know the probs of the four world states involving smoking or not of John and Mary
A simple collective classification example (try it yourself with THREE people!)
  friends(Mary, Mary), friends(Mary, John), ¬cancer(Mary)
  friends(John, Mary), friends(John, John), cancer(John)
Slide 12: A Famous MLN Example (2)
"Smoking frequently causes cancer." GROUNDED
  wgt = 3   smokes(J) ↔ cancer(J)
  wgt = 3   smokes(M) ↔ cancer(M)
"Friends of smokers are likely to smoke." GROUNDED
  wgt = 2   friends(M, M) ∧ smokes(M) → smokes(M)
  wgt = 2   friends(M, J) ∧ smokes(M) → smokes(J)
  wgt = 2   friends(J, M) ∧ smokes(J) → smokes(M)
  wgt = 2   friends(J, J) ∧ smokes(J) → smokes(J)
FACTS
  friends(M, M), friends(M, J), ¬cancer(M)
  friends(J, M), friends(J, J), cancer(J)
Slide 13: A Famous MLN Example (3)
Possible Complete World States
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), smokes(J), ¬cancer(M), cancer(J)
Slide 14: A Famous MLN Example (4)
Possible Complete World State (1): friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
"Smoking frequently causes cancer." GROUNDED
  wgt = 3   smokes(J) ↔ cancer(J)                      violated  (0)
  wgt = 3   smokes(M) ↔ cancer(M)                      satisfied (3)
"Friends of smokers are likely to smoke." GROUNDED (clausal form)
  wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     satisfied (2)
  wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     satisfied (2)
Sum of Wgts = 0 + 3 + 2 + 2 + 2 + 2 = 11
Slide 15: A Famous MLN Example (5)
Possible Complete World State (2): friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)
"Smoking frequently causes cancer." GROUNDED
  wgt = 3   smokes(J) ↔ cancer(J)                      satisfied (3)
  wgt = 3   smokes(M) ↔ cancer(M)                      satisfied (3)
"Friends of smokers are likely to smoke." GROUNDED (clausal form)
  wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     satisfied (2)
  wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     violated  (0)
  wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     satisfied (2)
Sum of Wgts = 3 + 3 + 2 + 2 + 0 + 2 = 12
Slide 16: A Famous MLN Example (6)
Possible Complete World State (3): friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)
"Smoking frequently causes cancer." GROUNDED
  wgt = 3   smokes(J) ↔ cancer(J)                      violated  (0)
  wgt = 3   smokes(M) ↔ cancer(M)                      violated  (0)
"Friends of smokers are likely to smoke." GROUNDED (clausal form)
  wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     violated  (0)
  wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     satisfied (2)
Sum of Wgts = 0 + 0 + 2 + 0 + 2 + 2 = 6
Slide 17: A Famous MLN Example (7)
Possible Complete World State (4): friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), smokes(J), ¬cancer(M), cancer(J)
"Smoking frequently causes cancer." GROUNDED
  wgt = 3   smokes(J) ↔ cancer(J)                      satisfied (3)
  wgt = 3   smokes(M) ↔ cancer(M)                      violated  (0)
"Friends of smokers are likely to smoke." GROUNDED (clausal form)
  wgt = 2   ¬friends(M,M) ∨ ¬smokes(M) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(M,J) ∨ ¬smokes(M) ∨ smokes(J)     satisfied (2)
  wgt = 2   ¬friends(J,M) ∨ ¬smokes(J) ∨ smokes(M)     satisfied (2)
  wgt = 2   ¬friends(J,J) ∨ ¬smokes(J) ∨ smokes(J)     satisfied (2)
Sum of Wgts = 3 + 0 + 2 + 2 + 2 + 2 = 11
Slide 18: A Famous MLN Example (8)
Possible Complete World States                                                                     Sum of Wgts of Satisfied Rules
(1) friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)    11
(2) friends(M,M), friends(M,J), friends(J,M), friends(J,J), ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)     12
(3) friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)      6
(4) friends(M,M), friends(M,J), friends(J,M), friends(J,J), smokes(M), smokes(J), ¬cancer(M), cancer(J)      11
Slide 19: A Famous MLN Example (8)
Possible Complete World States (all four friends facts hold in every state)   Sum of Wgts of Satisfied Rules
(1) ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)    11    Prob = e^11 / Z ≈ 1 / 4.7 ≈ 0.21
(2) ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)     12    Prob = e^12 / Z ≈ e / 4.7 ≈ 0.58
(3) smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)      6    Prob = e^6 / Z ≈ (1 / 4.7) e^-5 ≈ 0.001
(4) smokes(M), smokes(J), ¬cancer(M), cancer(J)      11    Prob = e^11 / Z ≈ 1 / 4.7 ≈ 0.21
Z = e^11 + e^12 + e^6 + e^11 = e^11 (1 + e + e^-5 + 1) ≈ 4.7 e^11
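As a check, the world_probabilities sketch from Slide 5 reproduces these numbers if we plug in the six grounded rules with the friends and cancer facts folded in as fixed evidence; the atom names (smokes_M, smokes_J) are illustrative:

rules = [
    (3, lambda w: w['smokes_J']),                          # smokes(J) <-> cancer(J), with cancer(J) = True
    (3, lambda w: not w['smokes_M']),                      # smokes(M) <-> cancer(M), with cancer(M) = False
    (2, lambda w: True),                                   # friends(M,M) ^ smokes(M) -> smokes(M): tautology
    (2, lambda w: (not w['smokes_M']) or w['smokes_J']),   # friends(M,J) ^ smokes(M) -> smokes(J)
    (2, lambda w: (not w['smokes_J']) or w['smokes_M']),   # friends(J,M) ^ smokes(J) -> smokes(M)
    (2, lambda w: True),                                   # friends(J,J) ^ smokes(J) -> smokes(J): tautology
]
for world, score, prob in world_probabilities(['smokes_M', 'smokes_J'], rules):
    print(world, score, round(prob, 2))
# Scores come out 11, 12, 6, 11 and the probabilities about 0.21, 0.58, 0.00, 0.21,
# matching the slide (the 0.001 state rounds to 0.00 at two decimal places).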
Slide 20: A Famous MLN Example (9)
(1) ¬smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)   lost 3 pts because John had cancer yet wasn't a smoker
(2) ¬smokes(M), smokes(J), ¬cancer(M), cancer(J)    lost 2 pts because friends J and M had different smoking habits
(3) smokes(M), ¬smokes(J), ¬cancer(M), cancer(J)    lost 8 pts, for three reasons
(4) smokes(M), smokes(J), ¬cancer(M), cancer(J)     lost 3 pts because Mary smoked but didn't have cancer
If instead we had  wgt = 3  smoking → cancer  and  wgt = 2  cancer → smoking,
then (1) and (4) would have scored differently (but the slides are already too crowded!)
If we had more people, we would have seen the influence of collective classification more clearly - try it yourself!
Slide 21: Handling Probabilistic Evidence - What if the Givens are Uncertain?
Assume we know Prob( q(1,2) ) = 0.85
We can represent this as
  Prob( observedQ(1,2) ) = 1.0   // ie, absolute evidence
  wgt = 2   observedQ(1,2) → q(1,2)
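A note on where a weight like 2 might come from (this reasoning is not spelled out on the slide): with observedQ(1,2) fixed true, the rule contributes its weight w exactly when q(1,2) is true, so taken in isolation the MLN semantics from Slide 6 give

\[
P\big(q(1,2)\big) \;=\; \frac{e^{w}}{1 + e^{w}},
\qquad
w \;=\; \ln\frac{0.85}{0.15} \;\approx\; 1.7,
\]

which the slide rounds up to 2.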
Slide 22: Grounded Networks can be Very LARGE!
Given  wgt = 2   ∀x,y,z Friends(x, y) ∧ Friends(y, z) → Friends(x, z)
and a world with 10^9 people
How big is the grounded network?
  10^18 nodes, since we need all groundings of Friends(?X, ?Y)
  (and the number of world states is 2^(10^18))
So SAMPLING methods are needed (and have been published)
Slide 23: Knowledge-Base Population (http://www.nist.gov/tac/2015/KBP/)
Given: Text Corpus (ie, ordinary English)
Do: Extract Facts about People
  Born(Person, Date)
  AttendedCollege(Person, College, DateRange)
  EmployedBy(Person, Company, DateRange)
  SpouseOf(PersonA, PersonB, DateRange)
  ParentOf(PersonA, PersonB, DateRange)
  Died(Person, Date)
Slides 24-25: Sample Advice for Collective Classification?
What might we say to an ML system working on KBP?
Think about constraints across the relations:
  People are only married to one person at a time.
  People usually have fewer than five children and rarely more than ten.
  Typically one graduates from college in their 20's.
  Most people only have one job at a time.
  One cannot go to college before they were born or after they died.
  Almost always your children are born after you were.
  People tend to marry people of about the same age.
  People rarely live to be over 100 years and never over 125.
  People don't marry their children.
  …
When converted to MLN notation, these sentences of common-sense knowledge improve the results of information-extraction algorithms that simply extract each relation independently (and noisily)
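As one hedged illustration of what "converted to MLN notation" might look like (not from the lecture; the weight is illustrative and the predicate is the SpouseOf relation from Slide 23), the first constraint could be written as

\[
wgt = 6:\quad \forall\, x, y, z, t \;\;
\mathit{SpouseOf}(x, y, t) \wedge \mathit{SpouseOf}(x, z, t) \rightarrow y = z
\]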
Slide 26: Scaling Up MLN Inference
(see the ICDM '12 paper by Niu et al. titled "Scaling Inference for Markov Logic via Dual Decomposition")
We successfully ran in 1 day on the Knowledge Base Population task with
- 240 million facts (from 500 million web articles)
- 64 billion logic sentences in the ground MLN
- 5 terabyte database (from Greenplum, Inc)
- 256 GB RAM, 40 cores on 4 machines
- See 'DeepDive/Wisci' at www.youtube.com/user/HazyResearch/videos
Slide 27: Learning MLNs
Like with Bayes Nets, we need to learn
- Structure (ie, a rule set; could be given by the user)
- Weights (can use gradient descent; see the gradient sketched below)
- There is a small literature on these tasks (some by my group)
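For the weight-learning step, the gradient of the log-likelihood with respect to weight w_i has the standard form from Richardson & Domingos (2006), where n_i(x) is the number of true groundings of formula i in the training data x:

\[
\frac{\partial}{\partial w_i}\,\log P_w(X = x) \;=\; n_i(x) \;-\; \mathbb{E}_w\!\big[\, n_i(X) \,\big]
\]

The expected count on the right requires inference in the current model, which is one reason weight learning is itself computationally demanding.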
Slide 28: MLN Challenges
Estimating probabilities ('inference') can be cpu-intensive
  usually need to use clever sampling methods, since the # of world states is O(2^N)
Interesting direction: lifted inference (reason at the first-order level, rather than on the grounded network)
Structure learning and refinement is a major challenge
Slide 29: MLN Wrapup
Appealing combo of first-order logic and prob/stats (the two primary math underpinnings of AI)
Impressive results on real-world tasks
Appealing approach to 'knowledge refinement'
  1. Humans write (buggy) common-sense rules
  2. MLN algo learns weights (and maybe 'edits' the rules)
Computationally demanding (both learning MLNs and using them to answer queries)
Other approaches to probabilistic logic exist; a vibrant/exciting research area