Published by Malcolm Lester. Modified over 9 years ago
1
Practical Probabilistic Relational Learning
Sriraam Natarajan
2
Take-Away Message: Learn from rich, highly structured data!
3
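Traditional Learning: Data + Attributes (Features); data is i.i.d.
B E A M J
1 0 1 1 0
0 0 0 0 1
...
0 1 1 0 1
(B = Burglary, E = Earthquake, A = Alarm, M = MaryCalls, J = JohnCalls)
[Figure: Bayesian network over Earthquake, Alarm, Burglary, MaryCalls, JohnCalls]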
4
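Learning
[Figure: the same network (Earthquake, Alarm, Burglary, MaryCalls, JohnCalls) with a learned conditional probability table at each node; recoverable entries: 0.08/0.92, 0.01/0.99, 0.1/0.9, 0.55/0.45, 0.6/0.4, 0.95/0.05, 0.3/0.7, 0.8/0.2, 0.1/0.9, 0.1 ...]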
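With i.i.d. data and a fixed network structure, maximum-likelihood parameter learning reduces to counting. A minimal sketch, assuming the standard textbook edges for this network (Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls) and using only the three data rows visible above as toy input; with so few rows the estimates are naturally extreme.

```python
from collections import Counter

# Assumed structure: the standard textbook burglary-alarm network.
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "M": ("A",), "J": ("A",)}
COLUMNS = ("B", "E", "A", "M", "J")  # column order of the data table


def learn_cpts(rows):
    """Maximum-likelihood estimate of P(node = 1 | parent values) by counting."""
    cpts = {}
    for node, parents in PARENTS.items():
        col = COLUMNS.index(node)
        parent_cols = [COLUMNS.index(p) for p in parents]
        ones, totals = Counter(), Counter()
        for row in rows:
            key = tuple(row[c] for c in parent_cols)  # this row's parent configuration
            totals[key] += 1
            ones[key] += row[col]
        # Only parent configurations that occur in the data get an estimate.
        cpts[node] = {key: ones[key] / totals[key] for key in totals}
    return cpts


# The three rows visible on the slide (the rest of the table is elided).
ROWS = [(1, 0, 1, 1, 0), (0, 0, 0, 0, 1), (0, 1, 1, 0, 1)]
cpts = learn_cpts(ROWS)
print(cpts["A"])  # {(1, 0): 1.0, (0, 0): 0.0, (0, 1): 1.0}
```

Counting works here precisely because every row is an independent, fixed-length sample; the relational data on the next slide breaks that assumption.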
5
Real-World Problem: Predicting Adverse Drug Reactions
Patient Table
PatientID | Gender | Birthdate
P1 | M | 3/22/63
Visit Table
PatientID | Date | Physician | Symptoms | Diagnosis
P1 | 1/1/01 | Smith | palpitations | hypoglycemic
P1 | 2/1/03 | Jones | fever, aches | influenza
Lab Tests
PatientID | Date | Lab Test | Result
P1 | 1/1/01 | blood glucose | 42
P1 | 1/9/01 | blood glucose | 45
SNP Table
PatientID | SNP1 | SNP2 | … | SNP500K
P1 | AA | AB | … | BB
P2 | AB | BB | … | AA
Prescriptions
PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months
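One way to see why these tables are not i.i.d. feature vectors is to rewrite each row as a ground logical fact and group the facts by patient. A minimal sketch: the predicate names (`patient`, `visit`, `lab`, `prescribed`, `snp`) are hypothetical, only the rows shown above are used, and the wide SNP table is turned into one `snp(patient, id, genotype)` fact per visible column.

```python
from collections import defaultdict

# Rows copied from the example tables on the slide.
patients = [("P1", "M", "3/22/63")]
visits = [("P1", "1/1/01", "Smith", "palpitations", "hypoglycemic"),
          ("P1", "2/1/03", "Jones", "fever, aches", "influenza")]
labs = [("P1", "1/1/01", "blood glucose", 42),
        ("P1", "1/9/01", "blood glucose", 45)]
prescriptions = [("P1", "5/17/98", "5/18/98", "Jones", "prilosec", "10mg", "3 months")]
# Only the SNP columns shown on the slide; the remaining ones are elided.
snps = [("P1", "SNP1", "AA"), ("P1", "SNP2", "AB"), ("P1", "SNP500K", "BB"),
        ("P2", "SNP1", "AB"), ("P2", "SNP2", "BB"), ("P2", "SNP500K", "AA")]


def to_facts(predicate, rows):
    """Turn each table row into a ground atom: (predicate, arg1, ..., argN)."""
    return [(predicate, *row) for row in rows]


facts = (to_facts("patient", patients) + to_facts("visit", visits)
         + to_facts("lab", labs) + to_facts("prescribed", prescriptions)
         + to_facts("snp", snps))

# Group ground atoms by patient (the first argument of every predicate here).
by_patient = defaultdict(list)
for fact in facts:
    by_patient[fact[1]].append(fact)

for patient, atoms in sorted(by_patient.items()):
    print(patient, len(atoms))  # P1 9, then P2 3
```

Each patient owns a different number of facts (P1 has several visits and lab results, P2 only genotypes), so there is no single fixed-length row per patient for a propositional learner to consume.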
6
Logic + Probability = Probabilistic Logic, a.k.a. Statistical Relational Learning Models
[Figure: Logic → add probabilities → Statistical Relational Learning (SRL) ← add relations ← Probabilities]
Several SRL workshops have been held in the past decade; this year: StaRAI @ AAAI 2013
7
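[Figure: approaches arranged along three axes: propositional vs. first-order (Prop/FO), deterministic vs. stochastic, no learning vs. learning]
Setting | Propositional | First-Order
Deterministic, no learning | Propositional Logic | First-Order Logic
Deterministic, learning | Prop. Rule Learning | Inductive Logic Programming
Stochastic, no learning | Probability Theory | Probabilistic Logic
Stochastic, learning | Classical Machine Learning | Statistical Relational Learning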
8
Costs and Benefits of the SRL Soup
Benefits
- Rich pool of different languages
- Very likely that there is a language that fits your task at hand well
- A lot of research remains to be done ;-)
Costs
- "Learning" SRL is much harder
- Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?
9
Why is this problem hard?
- Non-convex problem
- Repeated search of parameters for every step in induction of the model
- First-order logic allows for different levels of generalization
- Repeated inference for every step of parameter learning
- Inference is #P-complete
How can we scale this?
10
Relational Probability Trees [Blockeel & De Raedt '98]
- Each conditional probability distribution can be learned as a tree
- Leaves are probabilities
- The final model is the set of the RPTs
[Figure: example RPT to predict heartAttack(X), with yes/no tests male(X); chol(X,Y,L), Y>40, L>200; diag(X,Hypertension,Z), Z>55; bmi(X,W,55), W>30; leaf probabilities 0.8, 0.77, 0.05, 0.3]
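To make "leaves are probabilities" concrete, here is a minimal sketch of evaluating a relational probability tree for the query heartAttack(X). Each internal node is an existentially quantified first-order test over ground facts; the tests reuse literals from the slide, but the tree shape, the leaf values, and the patient facts are hypothetical placeholders, not the tree in the figure.

```python
# Hypothetical ground facts (predicate, args...) for three made-up patients.
FACTS = {
    ("male", "p1"), ("chol", "p1", 45, 230),
    ("chol", "p2", 50, 250),
    ("male", "p3"), ("chol", "p3", 30, 180), ("diag", "p3", "Hypertension", 60),
}


# Each test asks whether some grounding of its literals holds for patient x.
def male(facts, x):
    return ("male", x) in facts


def high_chol(facts, x):  # exists Y, L: chol(X, Y, L), Y > 40, L > 200
    return any(f[0] == "chol" and f[1] == x and f[2] > 40 and f[3] > 200 for f in facts)


def late_hypertension(facts, x):  # exists Z: diag(X, Hypertension, Z), Z > 55
    return any(f[0] == "diag" and f[1] == x and f[2] == "Hypertension" and f[3] > 55
               for f in facts)


# Internal node = (test, yes_subtree, no_subtree); leaf = probability.
# Shape and leaf values are placeholders, not the slide's learned tree.
TREE = (male, (high_chol, 0.7, (late_hypertension, 0.5, 0.2)), 0.1)


def predict(tree, facts, x):
    """Follow the yes/no branches from the root and return P(heartAttack(x))."""
    while isinstance(tree, tuple):
        test, yes, no = tree
        tree = yes if test(facts, x) else no
    return tree


for patient in ("p1", "p2", "p3"):
    print(patient, predict(TREE, FACTS, patient))  # p1 0.7, p2 0.1, p3 0.5
```

Because each test quantifies over all matching facts, the same tree applies to patients with any number of cholesterol readings or diagnoses.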
11
Learning Problem #1: Parameter Learning
- Probability of an example (logistic regression)
- Weight learning: the gradient of the log-likelihood w.r.t. $w$ for each example, $\Delta_i$
- Sum all gradients to get the final $w$
- Several gradient-based approaches in SRL: Singla & Domingos AAAI '05, Jaeger ICML '07, Natarajan et al. ICML '05, AMAI '08
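As a minimal illustration of the "sum all gradients" recipe, assuming the propositional logistic-regression case named on the slide, here is batch gradient ascent in which the per-example gradient has the standard form Δ_i = (I(y_i = 1) - P(y_i = 1 | x_i; w)) x_i. The toy data, step size, and iteration count are made up; the cited SRL methods compute analogous gradients over relational features rather than a fixed vector.

```python
import math


def prob(w, x):
    """P(y = 1 | x; w) = 1 / (1 + exp(-w . x))."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))


def example_gradient(w, x, y):
    """Delta_i = (I(y_i = 1) - P(y_i = 1 | x_i; w)) * x_i."""
    err = (1.0 if y == 1 else 0.0) - prob(w, x)
    return [err * xj for xj in x]


def fit(data, dims, lr=0.2, steps=500):
    """Batch gradient ascent on the log-likelihood."""
    w = [0.0] * dims
    for _ in range(steps):
        total = [0.0] * dims
        for x, y in data:  # sum all per-example gradients
            for j, g in enumerate(example_gradient(w, x, y)):
                total[j] += g
        w = [wj + lr * gj for wj, gj in zip(w, total)]
    return w


# Toy data: a bias feature (1.0) plus one real-valued feature; not linearly separable.
DATA = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -0.5], 1),
        ([1.0, 1.5], 0), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
w = fit(DATA, dims=2)
print(w, prob(w, [1.0, 2.0]), prob(w, [1.0, -2.0]))
```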
12
Learning Problem #2: Structure Learning
- Large space of possible structures
- Typical approaches:
  - Use ILP techniques to learn the structure, followed by parameter learning (Kersting and De Raedt '02)
  - Learn parameters for every candidate structure; parameter learning may not have a closed-form solution (Kok and Domingos ICML '05)
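To show why "learn parameters for every candidate structure" is expensive, here is a propositional toy analogue; it is not the cited ILP-based or MLN structure learners. A greedy forward search adds one candidate feature at a time, and every candidate structure triggers a full iterative parameter fit, because logistic weights have no closed form. The candidate names, the data, and the size penalty are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(X, y, lr=0.1, steps=300):
    """Parameter learning for one candidate structure (iterative; no closed form)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(w, xi)))
            for j, xij in enumerate(xi):
                grad[j] += (yi - p) * xij
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


def log_likelihood(w, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        # Clamp to avoid log(0) if a fitted probability saturates.
        p = min(max(sigmoid(sum(a * b for a, b in zip(w, xi))), 1e-12), 1 - 1e-12)
        ll += math.log(p if yi == 1 else 1.0 - p)
    return ll


def score(structure, examples, y, penalty):
    """Fit parameters for this structure, then penalize its size."""
    X = [[1.0] + [float(e[f]) for f in structure] for e in examples]  # bias + features
    return log_likelihood(fit_weights(X, y), X, y) - penalty * len(structure)


def greedy_search(examples, y, candidates, penalty=0.5):
    structure = []
    best = score(structure, examples, y, penalty)
    while True:
        scored = [(score(structure + [c], examples, y, penalty), c)
                  for c in candidates if c not in structure]
        if not scored:
            return structure
        new_best, choice = max(scored)
        if new_best <= best:  # no candidate improves the penalized score
            return structure
        structure, best = structure + [choice], new_best


# Hypothetical binary attributes; only high_chol carries signal about the label.
ROWS = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0)]
EXAMPLES = [dict(male=m, high_chol=h, smoker=s) for m, h, s in ROWS]
Y = [1, 1, 1, 0, 0, 0, 0, 1]
print(greedy_search(EXAMPLES, Y, ["male", "high_chol", "smoker"]))  # ['high_chol']
```

Even on eight examples and three candidates, the search runs six full parameter fits; in the relational setting each fit also needs repeated inference, which is the cost the slide points at.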
13
Functional Gradients [J. Friedman, Annals of Statistics '01]
- Probability of an example
- Functional gradient: the gradient of the log-likelihood w.r.t. $\psi(x_i)$ for each example, $\Delta_i$
- Sum all gradients to get the final $\psi(x)$
x | Δ
a1, a2, a3 | 0.7
b1, b2, b3 | -0.2
c1, c2, c3 | -0.9
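For a binary target with P(y_i = 1 | x_i) = sigmoid(ψ(x_i)), the gradient of log P(y_i | x_i) with respect to the function value ψ(x_i) is I(y_i = 1) - P(y_i = 1 | x_i), the standard result for the logistic link. A minimal sketch under that assumption; the ground examples and labels are hypothetical, and the output pairs play the role of the x/Δ table on this slide.

```python
import math


def functional_gradients(examples, psi):
    """For each example i, the pointwise gradient of the log-likelihood
    with respect to psi(x_i): Delta_i = I(y_i = 1) - sigmoid(psi(x_i))."""
    grads = []
    for x, y in examples:
        p = 1.0 / (1.0 + math.exp(-psi(x)))  # P(y_i = 1 | x_i)
        grads.append((x, (1.0 if y == 1 else 0.0) - p))
    return grads


# Hypothetical ground examples; with psi = 0 every P(y = 1 | x) is 0.5.
EXAMPLES = [(("a1", "a2", "a3"), 1), (("b1", "b2", "b3"), 0), (("c1", "c2", "c3"), 0)]
print(functional_gradients(EXAMPLES, lambda x: 0.0))
# [(('a1', 'a2', 'a3'), 0.5), (('b1', 'b2', 'b3'), -0.5), (('c1', 'c2', 'c3'), -0.5)]
```

The gradient is taken per example rather than per parameter, so the next step is to fit a regression model (a tree) to these (x, Δ) pairs.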
14
Gradient (Tree) Boosting [Friedman, Annals of Statistics 29(5):1189-1232, 2001]
- Models = weighted combination of a large number of small trees (models)
- Intuition: generate an additive model by sequentially fitting small trees to pseudo-residuals from a regression at each iteration
[Figure: Data and loss function; Initial Model → Predictions; Data - Predictions = Residuals; induce a tree on the residuals, add it, iterate; Final Model = sum of the induced trees (+ + + + …)]
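The loop described above can be sketched for a binary target with one numeric input, assuming depth-1 regression trees (stumps) as the "small trees" and a fixed step size as the weight of each tree. Leaf values are plain means of the pseudo-residuals, a simplification of Friedman's line-search/Newton leaf updates; the data are toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs: one threshold, two leaf means."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, rounds=20, step=0.5):
    """Additive model psi(x) = sum of step * tree_m(x), starting from psi = 0."""
    trees = []

    def psi(x):
        return sum(step * tree(x) for tree in trees)

    for _ in range(rounds):
        # Pseudo-residuals: functional gradient of the log-likelihood per example.
        residuals = [y - sigmoid(psi(x)) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))  # induce a small tree, add it, iterate
    return psi


XS, YS = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
psi = boost(XS, YS)
print([round(sigmoid(psi(x)), 2) for x in XS])
```

In the relational version, the stump is replaced by a relational regression tree over first-order tests, but the residual-fit-add loop has the same shape.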
15
Boosting Results (MLJ '11): Predicting the advisor for a student
Algo | Likelihood | AUC-ROC | AUC-PR | Time
Boosting | 0.810 | 0.961 | 0.930 | 9 s
MLN | 0.730 | 0.535 | 0.621 | 93 hrs
[Figure panels: Movie Recommendation, Citation Analysis, Machine Reading]
16
Other Applications
Similar results in several other problems:
- Imitation learning: learning how to act from demonstrations (Natarajan et al. IJCAI '11), in RoboCup, a grid-world domain, a traffic-signal domain, and blocksworld
- Prediction of CAC levels: predicting cardiovascular risks in young adults (Natarajan et al. IAAI '13)
- Prediction of heart attacks (Weiss et al. IAAI '12, AI Magazine '12)
- Prediction of the onset of Alzheimer's (Natarajan et al. ICMLA '12, Natarajan et al. IJMLC 2013)
17
Parallel Lifted Learning
18
Stochastic ML: scales well, stochastic gradients, online learning, …
Statistical Relational: symmetries, compact models, lifted inference, …
Parallel
19
Symmetry-Based Inference
20
[Figure: numbered graphs (nodes 1-5) and a ground network over P(Anna), HI(Bob), P(Bob), HI(Anna)]
Root clause: P(Anna) !P(Bob)
Neighboring clauses: P(Anna) => !HI(Bob); P(Anna) => HI(Anna); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Tree (set of clauses): P(Anna) !P(Bob); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Variabilized tree: P(X) !P(Y); P(Y) => HI(Y); P(Y) => !HI(X)
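The last step above (tree to variabilized tree) replaces each constant with a logical variable, consistently across the whole set of clauses. A minimal sketch, assuming the clause syntax used on the slide (`!` for negation, `=>` for implication), that every argument is a constant, and that variables are assigned in order of first appearance; the function name `variabilize` is hypothetical.

```python
import re

# predicate(arg1, ..., argN); a leading "!" is not a word character and is left untouched.
ATOM = re.compile(r"(\w+)\(([^()]*)\)")


def variabilize(clauses, variables="XYZUVW"):
    """Replace constants by variables consistently, in order of first appearance.
    Assumes every argument is a constant; raises IndexError if variables run out."""
    mapping = {}

    def rename(match):
        args = []
        for const in (a.strip() for a in match.group(2).split(",")):
            if const not in mapping:
                mapping[const] = variables[len(mapping)]
            args.append(mapping[const])
        return f"{match.group(1)}({', '.join(args)})"

    return [ATOM.sub(rename, clause) for clause in clauses], mapping


tree = ["P(Anna) !P(Bob)", "P(Bob) => HI(Bob)", "P(Bob) => !HI(Anna)"]
print(variabilize(tree))
# (['P(X) !P(Y)', 'P(Y) => HI(Y)', 'P(Y) => !HI(X)'], {'Anna': 'X', 'Bob': 'Y'})
```

Because the mapping is shared across clauses, Anna becomes X and Bob becomes Y everywhere, which reproduces the variabilized tree on the slide; any other pair of constants with the same clause pattern maps to the same template.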
21
Lifted Training
1. Generate initial tree pieces and variabilize their arguments
2. Randomly draw mini-batches
3. Generate tree pieces from the corresponding patterns
4. Compute the gradient using lifted BP
5. Update the covariance matrix C, or some low-rank variant
6. Update the parameter vector and the corresponding equations
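The mini-batch part of this loop (steps 2, 4, 5 and 6) can be sketched as a generic stochastic gradient method. This is a stand-in, not the lifted algorithm: no tree pieces are generated, the lifted-BP gradient is replaced by the exact gradient of a plain logistic model, and C is approximated by a diagonal running average of squared gradients (an RMSProp-style preconditioner) in the spirit of the slide's cheaper variant of C. All names, hyperparameters, and data are hypothetical.

```python
import math
import random


def minibatch_gradient(w, batch):
    """Average log-likelihood gradient of a logistic model over one mini-batch
    (a stand-in for the gradient that lifted BP would compute)."""
    g = [0.0] * len(w)
    for x, y in batch:
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
        for j, xj in enumerate(x):
            g[j] += (y - p) * xj / len(batch)
    return g


def train(data, dims, epochs=200, batch_size=2, lr=0.05, decay=0.9, eps=1e-8, seed=0):
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w = [0.0] * dims
    c = [0.0] * dims   # diagonal stand-in for the matrix C
    for _ in range(epochs):
        rng.shuffle(data)  # randomly draw mini-batches
        for i in range(0, len(data), batch_size):
            g = minibatch_gradient(w, data[i:i + batch_size])
            # Update C: running average of squared gradients (diagonal only).
            c = [decay * cj + (1 - decay) * gj * gj for cj, gj in zip(c, g)]
            # Update the parameter vector with the C-scaled gradient.
            w = [wj + lr * gj / (math.sqrt(cj) + eps) for wj, gj, cj in zip(w, g, c)]
    return w


DATA = [([1.0, 2.0], 1), ([1.0, 1.0], 1), ([1.0, -0.5], 1),
        ([1.0, 1.5], 0), ([1.0, -1.0], 0), ([1.0, -2.0], 0)]
w = train(DATA, dims=2)
print(w)
```

A full-matrix C would capture correlations between parameters at quadratic memory cost; the diagonal version keeps memory linear, which is the usual motivation for low-rank or diagonal approximations.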
22
Challenges
- Message schedules
- Iterative Map-Reduce?
- How do we take this idea to learning the models?
- How can we parallelize symmetry identification more efficiently?
- What are the compelling problems? Vision, NLP, …
23
Conclusion
- The world is inherently relational and uncertain
- SRL has developed into an exciting field in the past decade
  - Several previous SRL workshops
- Boosting relational models has promising initial results
  - Applied to several different problems
  - First scalable relational learning algorithm
- How can we parallelize/scale this algorithm?
- Can this benefit from an inference algorithm like Belief Propagation that can be parallelized easily?
24
Future Work
- Develop lifted online structure learning
- Integrate ideas from DB
  - Exploit relational logic on DB and implement lifted inference techniques on DB
- Real-world applications of FGB (functional-gradient boosting)
  - Activity recognition, localization, natural language processing, bio-medical applications
- Predictive personalized medicine
  - Mining information from large-scale medical databases
  - Use text from the web (blogs) and combine the learned models with the clinical data
- Learning from experts
  - Evaluate in several domains such as Wargus, RoboCup