Thirty-Three Years of Knowledge-Based Machine Learning
Jude Shavlik, University of Wisconsin, USA
Key Question of AI: How to Get Knowledge into Computers?
- Hand coding
- Supervised ML
How can we mix these two extremes? A small ML subcommunity has looked at ways to do so.
The Standard Task for Supervised Machine Learning
Given training examples, produce a 'model' that classifies future examples.

  Age  Weight  Height  BP      Sex  ...  Outcome
  38   142     64in    150/88  F    ...
  27   189     70in    125/72  M    ...
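The setup above can be sketched in a few lines of Python. The features and labels here are invented for illustration, and a simple 1-nearest-neighbor classifier stands in for whatever supervised learner one actually uses:

```python
# Supervised learning in miniature: examples are feature vectors
# (age, weight, height in inches, systolic BP, sex) with an outcome label,
# and the learned "model" classifies new cases. Data is invented;
# 1-nearest-neighbor stands in for any supervised learner.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbor_model(train):
    """Return a classifier closed over the training examples."""
    def classify(features):
        _, label = min(train, key=lambda ex: distance(ex[0], features))
        return label
    return classify

# (age, weight, height, systolic BP, sex encoded 0=F/1=M) -> outcome
train = [
    ((38, 142, 64, 150, 0), "positive"),
    ((27, 189, 70, 125, 1), "negative"),
]
model = nearest_neighbor_model(train)
print(model((40, 150, 65, 145, 0)))  # closest to the first example
```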
Learning with Data in Multiple Tables (Relational ML)
Tables: Patients, Previous Blood Tests, Previous Rx, Previous Mammograms
Key challenge: a different amount of data for each patient
How Does the 'Domain Expert' Impart His/Her Expertise?
- Choose features
- Choose and label examples
Can AI'ers allow more?
- Knowledge-Based Artificial Neural Networks
- Knowledge-Based Support Vector Machines
- Markov Logic Networks (Domingos' group)
Two Underexplored Questions in ML
- How can we go beyond teaching machines solely via I/O pairs? ('advice giving')
- How can we understand what an ML algorithm has discovered? ('rule extraction')
Outline
- Explanation-Based Learning (1980s)
- Knowledge-Based Neural Nets (1990s)
- Knowledge-Based SVMs (2000s)
- Markov Logic Networks (2010s)
Explanation-Based Learning (EBL) – My PhD Years, 1983-1987
The EBL Hypothesis: by understanding why an example is a member of a concept, one can learn the essential properties of the concept.
Trade-off: the need to collect many examples is exchanged for the ability to 'explain' single examples (via a 'domain theory'). I.e., assume a smarter learner.
Knowledge-Based Artificial Neural Networks, KBANN (1988-2001)
Insert an initial symbolic domain theory into a neural network, refine the network with training examples, then extract a final symbolic domain theory (related work by Mooney, Pazzani, Cohen, etc.).
What Inspired KBANN?
Geoff Hinton was an invited speaker at ICML-88. I recall him saying something like "one can backprop through any function," and I thought, "what would it mean to backprop through a domain theory?"
Inserting Prior Knowledge into a Neural Network
Domain theory → neural network:
- Final conclusions → output units
- Intermediate conclusions → hidden units
- Supporting facts → input units
Jumping Ahead a Bit
Notice that symbolic knowledge induces a graphical model, which is then numerically optimized. A similar perspective was later followed in Markov Logic Networks (MLNs); however, in MLNs the symbolic knowledge is expressed in first-order logic.
Mapping Rules to Nets
If A and B then Z
If B and ¬C then Z
Maps propositional rule sets to neural networks. [Diagram: a network over inputs A, B, C, D with a hidden unit per rule feeding Z; example weights of 4 for a positive antecedent and -4 for a negated one, with biases chosen so each unit fires only when its rule is satisfied.]
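The mapping can be made concrete with threshold units. This is a hedged sketch in the spirit of the slide's diagram (the exact weights there are illustrative): positive antecedents get weight +4, negated antecedents -4, and each rule unit's threshold is set so it fires exactly when its rule is satisfied; Z then ORs its rules together.

```python
# KBANN-style rule-to-network mapping for:
#   If A and B then Z
#   If B and not C then Z
# Weights +4 / -4 and the thresholds below are illustrative choices.

def unit(weights, threshold, inputs):
    """A threshold unit: fires iff the weighted sum exceeds the threshold."""
    return sum(w * x for w, x in zip(weights, inputs)) > threshold

def z(a, b, c):
    # Rule 1: If A and B then Z  -> weights (+4, +4), threshold 6
    rule1 = unit([4, 4], 6, [a, b])
    # Rule 2: If B and not C then Z -> weights (+4, -4), threshold 2
    rule2 = unit([4, -4], 2, [b, c])
    # Z is a disjunction of its rules (an OR unit: one firing rule suffices)
    return unit([4, 4], 2, [rule1, rule2])

print(z(1, 1, 1))  # True: rule 1 fires
print(z(0, 1, 0))  # True: rule 2 fires
print(z(0, 1, 1))  # False: neither rule fires
```

Because the units are differentiable in the real KBANN (sigmoids rather than hard thresholds), these hand-set weights become the starting point that backprop then refines.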
Case Study: Learning to Recognize Genes (Towell, Shavlik & Noordewier, AAAI-90)
promoter :- contact, conformation.
contact :- minus_35, minus_10.
<4 rules for conformation>
<4 rules for minus_35>
<4 rules for minus_10>
Input: a DNA sequence. We compile rules to a more basic language, but here we compile for refinability. (Halved the error rate of standard backprop.)
Learning Curves (similar results on many tasks and with other advice-taking algorithms)
[Figure: test-set errors vs. amount of training data for KBANN, a standard ANN, and the domain theory alone. For a fixed amount of data KBANN makes fewer errors; for a given error-rate spec it needs less training data.]
From Prior Knowledge to Advice (Maclin PhD, 1995)
- Originally the 'theory refinement' community assumed domain knowledge was available before learning starts (prior knowledge)
- When applying KBANN to reinforcement learning, we began to realize you should be able to provide domain knowledge to a machine learner whenever you think of something to say
- Changing the metaphor: commanding vs. advising computers
- Continual (i.e., lifelong) human teacher – machine learner cooperation
What Would You Like to Say to This Penguin?
IF a Bee is (Near and West) and
   an Ice is (Near and North)
THEN BEGIN
   Move East
   Move North
END
Some Advice for Soccer-Playing Robots
if distanceToGoal ≤ 10 and shotAngle ≥ 30
then prefer shoot over all other actions
Some Sample Results
[Figure: learning curves with advice vs. without advice.]
Overcoming Bad Advice
[Figure: learning curves with no advice vs. bad advice; the learner eventually overcomes the bad advice.]
Rule Extraction
Initially Geoff Towell (PhD, 1991) viewed this as simplifying the trained neural network (M-of-N rules).
Mark Craven (PhD, 1996) realized this is simply another learning task! I.e., learn what the neural network computes:
- Collect I/O pairs from the trained neural network
- Give them to a decision-tree learner
Applies to SVMs, decision forests, etc.
Understanding What a Trained ANN Has Learned
Human 'readability' of trained ANNs is challenging.
Rule Extraction (Craven & Shavlik, 1996): roughly speaking, train ID3 to learn the I/O behavior of the neural network (which could be an ENSEMBLE of models). Note we can generate as many labeled training examples as desired by forward-propagating through the trained ANN!
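The "just another learning task" view can be sketched directly: treat the trained model as a black box, generate as many labeled I/O pairs as we like by querying it, and fit an interpretable learner to those pairs. The black-box function and the stump learner below are invented stand-ins (the talk used ID3 on a trained neural network):

```python
# Rule extraction as learning: query a black-box model to create labeled
# examples, then fit a simple symbolic learner to them. The black box here
# is a stand-in computing (A and B) or (B and not C).
import itertools

def black_box(a, b, c):
    """Stand-in for a trained network."""
    return bool((a and b) or (b and not c))

# Step 1: collect I/O pairs by forward-propagating inputs through the model.
pairs = [((a, b, c), black_box(a, b, c))
         for a, b, c in itertools.product([0, 1], repeat=3)]

# Step 2: give the pairs to a simple symbolic learner; here, a decision
# stump that picks the single feature best predicting the output.
def best_stump(pairs, n_features):
    def accuracy(i):
        return sum((x[i] == 1) == y for x, y in pairs) / len(pairs)
    return max(range(n_features), key=accuracy)

feature = best_stump(pairs, 3)
print("most predictive feature index:", feature)  # B appears in both rules
```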
KBANN Recap
- Use symbolic knowledge to make an initial guess at the concept description (standard neural-net approaches make a random guess)
- Use training examples to refine the initial guess ('early stopping' reduces overfitting)
- Nicely maps to incremental (aka online) machine learning
- Valuable to show the user the learned model expressed in symbols rather than numbers
Knowledge-Based Support Vector Machines (2001-2011)
The question arose during the PhD defense of Tina Eliassi-Rad: how would you apply the KBANN idea using SVMs? This led to a collaboration with Olvi Mangasarian (who has worked on SVMs for over 50 years!).
Generalizing the Idea of a Training Example for SVMs
In standard supervised learning, training data can be viewed as points in a 'feature space.' In the work at Wisconsin, the idea of a 'training example' has been extended to REGIONS in feature space (with points as a limiting case). Notice that an IF-THEN rule defines a region in feature space (e.g., "if age is between 10 and 20 and height is between 50 and 60 inches, then category is POSITIVE"). This provides a mechanism for domain knowledge to be used in combination with traditional training examples: the SVM linear program can be extended to handle such 'regions as training examples.'
Knowledge-Based Support Vector Regression
Add soft constraints to the linear program (so the learner need only follow advice approximately):

  minimize  ||w||1 + C ||s||1 + penalty for violating advice
  such that f(x) = y ± s, plus constraints that represent advice

The notation ||.||1 is the one-norm, simply the sum of absolute values. The first term being minimized is the model complexity and the second is the data misfit. Example advice: "in this region, y should exceed 4." Note that with advice we get a different line than one would get by simply curve-fitting the data points. (A simple line is used here, but f(x) could be a non-linear function as well.)
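A hedged reconstruction of the objective, with our own notation for the pieces the slide names informally (slack vector s for data misfit, z for advice violation with penalty weight mu, and a region Bx <= d over which the advice says f(x) should exceed a level h):

```latex
\begin{aligned}
\min_{w,\,b,\,s,\,z}\quad & \|w\|_1 \;+\; C\,\|s\|_1 \;+\; \mu\,\|z\|_1 \\
\text{such that}\quad & -s \;\le\; (Xw + b\mathbf{1}) - y \;\le\; s
  && \text{(fit the training points)} \\
& Bx \le d \;\Rightarrow\; w^{\top}x + b \;\ge\; h - z
  && \text{(advice over a region)}
\end{aligned}
```

In the Mangasarian line of work, the implication constraint is converted into ordinary linear constraints via theorems of the alternative, so the whole problem remains a linear program; the sketch above only names the pieces.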
Sample Advice-Taking Results
advice:
  if distanceToGoal ≤ 10 and shotAngle ≥ 30
  then prefer shoot over all other actions
  (i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move))
Task: 2-vs-1 BreakAway, rewards +1/-1, standard RL.
Maclin et al.: AAAI '05, '06, '07
Automatically Creating Advice
An interesting approach to transfer learning (Lisa Torrey, PhD 2009):
- Learn in task A
- Perform 'rule extraction'
- Give the result as advice for related task B
Since advice is not assumed 100% correct, differences between tasks A and B are handled by training examples for task B. So the advice giving is done by MACHINE!
Close-Transfer Scenarios
2-on-1 BreakAway, 3-on-2 BreakAway, 4-on-3 BreakAway
Distant-Transfer Scenarios
3-on-2 KeepAway, 3-on-2 MoveDownfield, 3-on-2 BreakAway
Some Results: Transfer to 3-on-2 BreakAway
Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006
KBSVM Recap
- Can view symbolic knowledge as a way to label regions of feature space (rather than solely labeling points)
- Maximize: model simplicity + fit to advice + fit to training examples
- Note: does not fit the view of "guess initial model, then refine using training examples"
Markov Logic Networks, 2009+ (and statistical-relational learning in general)
My current favorite for combining symbolic knowledge and numeric learning. An MLN is a set of weighted FOPC sentences, e.g.

  wgt = 3.2   ∀x,y,z  parent(x, y) ∧ parent(z, y) → married(x, z)

We have worked on speeding up MLN inference (via RDBMSs) plus learning MLN rule sets.
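The semantics of a weighted sentence can be sketched in a few lines: a possible world's unnormalized weight is exp(weight × number of satisfied groundings), so worlds that satisfy more groundings of the rule are exponentially more probable. The tiny domain and facts below are invented for illustration:

```python
# MLN semantics in miniature for the slide's rule:
#   wgt=3.2  forall x,y,z: parent(x,y) ^ parent(z,y) -> married(x,z)
# A world's unnormalized weight is exp(weight * #satisfied groundings).
import itertools, math

people = ["ann", "bob", "cal"]
weight = 3.2

def satisfied_groundings(parent, married):
    """Count groundings of the clause that are true in this world."""
    count = 0
    for x, y, z in itertools.product(people, repeat=3):
        body = (x, y) in parent and (z, y) in parent
        head = (x, z) in married
        if (not body) or head:   # the implication holds
            count += 1
    return count

def world_weight(parent, married):
    return math.exp(weight * satisfied_groundings(parent, married))

# World A: ann and bob are cal's parents and are married (reflexive pairs
# included because groundings with x = z require married(x, x)).
w_consistent = world_weight({("ann", "cal"), ("bob", "cal")},
                            {("ann", "bob"), ("bob", "ann"),
                             ("ann", "ann"), ("bob", "bob")})
# World B: same parents, but nobody is married.
w_violating = world_weight({("ann", "cal"), ("bob", "cal")}, set())
print(w_consistent > w_violating)  # True: satisfying worlds weigh more
```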
Logic & Probability: Two Major Math Underpinnings of AI
Start from logic and add probabilities, or start from probabilistic models and add relations: either way you arrive at probabilistic logic learning, also called statistical-relational learning (SRL). MLNs are a popular approach. (Note: inference sits in the inner loop of learning.)
Learning a Set of First-Order Regression Trees (each path to a leaf is an MLN rule) – ICDM '11
[Diagram of the boosting loop: compare predicted probabilities to the data to get gradients, learn a tree that fits the gradients, add it to the current rules, and iterate; the final ruleset is the sum of the learned trees.]
Relational Probability Trees
Blockeel & De Raedt, AIJ 1998; Neville et al., KDD 2003
[Example tree computing prob(getFine(?X) | evidence): internal tests include speed(?X) > 75 mph, job(?X, politician), knows(?X, ?Y), and job(?Y, politician); leaves hold probabilities ranging from 0.01 to 0.99.]
Basic Idea of Boosted Relational Dependency Networks
1. Learn a relational regression tree
2. For each training example, collect the error in its predicted probability
3. Learn another first-order regression tree that fixes the errors collected in Step 2 (e.g., for negative example 9, predict -0.13)
4. Go to Step 2
Learns 'structure' and weights simultaneously.
[Example tree with internal tests a(X,Y), b(Y,Z), c(X), d(Y), e(Y,Z) and regression values at the leaves.]
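The loop above is functional-gradient boosting. A hedged sketch for plain (non-relational) data: fit a regression stump to the gradient (true label minus predicted probability), add it to the model, repeat. All data and the stump learner are invented stand-ins for the relational regression trees in the talk:

```python
# Functional-gradient boosting in miniature: each round fits a "tree"
# (here a depth-0 stump) to the residual y - P(y=1|x) and adds it in.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_stump(xs, residuals):
    """Predict the mean residual (real RDN-Boost trees split on
    relational features instead)."""
    mean = sum(residuals) / len(residuals)
    return lambda x: mean

def boost(xs, ys, rounds=50):
    trees = []
    for _ in range(rounds):
        scores = [sum(t(x) for t in trees) for x in xs]
        # Functional gradient of the log-likelihood: y - P(y=1 | x)
        grads = [y - sigmoid(s) for y, s in zip(ys, scores)]
        trees.append(fit_stump(xs, grads))
    return trees

xs = [0, 1, 2, 3]
ys = [1, 1, 1, 0]          # mostly positive examples
trees = boost(xs, ys)
prob = sigmoid(sum(t(xs[0]) for t in trees))
print(round(prob, 2))      # converges to the base rate, 0.75
```

Because the stump ignores its input, the model can only learn the base rate; the point of the relational trees in RDN-Boost is precisely that they can split on first-order features and so fix per-example errors.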
Some Results: advisedBy(x, y)

  System   AUC-PR       Training Time
  MLN-BT   0.94 ± 0.06  18.4 sec
  MLN-BC   0.95 ± 0.05  33.3 sec
  Alch-D   0.31 ± 0.10  7.1 hrs
  Motif    0.43 ± 0.03  1.8 hrs
  LHL      0.42 ± 0.10  37.2 sec
Illustrative Task: Given a Text Corpus of Sports Articles, Determine Who Won/Lost Each Week
Targets: winner(TeamName, GameID), loser(TeamName, GameID)
Evidence: we ran the Stanford NLP Toolkit on the articles and produced lots of facts (nextWord, partOfSpeech, parseTree, etc.)
DARPA gave us a few hundred positive examples.
Some Expert-Provided 'Advice'
- X 'beat' → winner(X, ArticleDate)
- X 'was beaten' → loser(X, ArticleDate)
- Home teams are more likely to win.
- If you found a likely game winner and you see a different team mentioned nearby in the text, it is likely the game loser.
- If a team didn't score any touchdowns, it probably lost.
- If 'predict*' is in the sentence, ignore it.
One Great Rule (very high precision, but moderate recall)
Look, in short phrases, for
  TeamName1 * Number1 * TeamName2 * Number2
where Number1 > Number2. Infer with high confidence:
  winningTeam(TeamName1, ArticleDate)
  losingTeam(TeamName2, ArticleDate)
  winnerScore(TeamName1, Number1, ArticleDate)
  loserScore(TeamName2, Number2, ArticleDate)
This rule matches none of the DARPA-provided training examples!
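The pattern can be sketched as a matcher over raw text. The team list and sentence below are invented, and the talk's version ran over parsed NLP facts in an MLN rather than a regex; this just shows the shape of the rule:

```python
# The "one great rule" as a pattern matcher: in a short phrase, find
# TeamName1 ... Number1 ... TeamName2 ... Number2 with Number1 > Number2.
import re

TEAMS = ["Packers", "Bears"]   # invented stand-in vocabulary

def game_facts(phrase):
    team = "|".join(TEAMS)
    pattern = rf"({team})\D*(\d+)\D*({team})\D*(\d+)"
    m = re.search(pattern, phrase)
    if not m:
        return None
    t1, n1, t2, n2 = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    if n1 > n2:     # the rule only fires when the first score is larger
        return {"winningTeam": t1, "losingTeam": t2,
                "winnerScore": n1, "loserScore": n2}
    return None

print(game_facts("Packers 31, Bears 17"))
# {'winningTeam': 'Packers', 'losingTeam': 'Bears',
#  'winnerScore': 31, 'loserScore': 17}
```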
Differences from KBANN
- Rules involve logical variables
- During learning, we create new rules to correct errors in the initial rules
- Recent follow-up: also refine the initial rules (note that KBSVMs also do NOT refine rules, though we had one AAAI paper on that)
Wrapping Up
Symbolic knowledge refined/extended by:
- Neural networks
- Support-vector machines
- MLN rule and weight learning
Variety of views taken:
- Make an initial guess at the concept, then refine weights
- Use advice to label a region in feature space
- Make an initial guess at the concept, then add weighted rules
Seeing what was learned: rule extraction.
Applications in genetics, cancer, machine reading, robot learning, etc.
Some Suggestions
- Allow humans to continually observe learning and provide symbolic knowledge at any time
- Never assume symbolic knowledge is 100% correct
- Allow the user to see what was learned, in a symbolic representation, to facilitate additional advice
- Put a graphic on every slide
Thanks for Your Time. Questions?
Papers, PowerPoint presentations, and some software available online:
  pages.cs.wisc.edu/~shavlik/mlrg/publications.html
  hazy.cs.wisc.edu/hazy/tuffy/
  pages.cs.wisc.edu/~tushar/rdnboost/