1 Statistical Predicate Invention
Stanley Kok, Dept. of Computer Science and Eng., University of Washington
Joint work with Pedro Domingos

2 Overview
- Motivation
- Background
- Multiple Relational Clusterings
- Experiments
- Future Work

3 Motivation
- Statistical Learning: able to handle noisy data
- Relational Learning (ILP): able to handle non-i.i.d. data
- Their combination: Statistical Relational Learning

4 Motivation
- Statistical Learning (able to handle noisy data) → Latent Variable Discovery [Elidan & Friedman, 2005; Elidan et al., 2001; etc.]
- Relational Learning / ILP (able to handle non-i.i.d. data) → Predicate Invention [Wogulis & Langley, 1989; Muggleton & Buntine, 1988; etc.]
- Statistical Relational Learning → Statistical Predicate Invention: the discovery of new concepts, properties, and relations from data

5 SPI Benefits
- More compact and comprehensible models
- Improved accuracy, by representing unobserved aspects of the domain
- Ability to model more complex phenomena

6 State of the Art
- Few approaches combine statistical and relational learning
- Most only cluster objects [Roy et al., 2006; Long et al., 2005; Xu et al., 2005; Neville & Jensen, 2005; Popescul & Ungar, 2004; etc.]
- Or only predict a single target predicate [Davis et al., 2007; Craven & Slattery, 2001]
- Infinite Relational Model [Kemp et al., 2006; Xu et al., 2006]:
  - Clusters objects and relations simultaneously
  - Multiple types of objects
  - Relations can be of any arity
  - #Clusters need not be specified in advance

7 Multiple Relational Clusterings
- Clusters objects and relations simultaneously
- Multiple types of objects
- Relations can be of any arity
- #Clusters need not be specified in advance
- Learns multiple cross-cutting clusterings
- Finite second-order Markov logic
- A first step towards a general framework for SPI

8 Overview
- Motivation
- Background
- Multiple Relational Clusterings
- Experiments
- Future Work

9 Markov Logic Networks (MLNs)
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight ⇒ stronger constraint)

10 Markov Logic Networks (MLNs)
P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
where
- x is the vector of truth assignments to the ground atoms,
- Z is the partition function, summing over all possible truth assignments to the ground atoms,
- wᵢ is the weight of the i-th formula, and
- nᵢ(x) is the number of true groundings of the i-th formula.
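To make the formula concrete, here is a minimal sketch (not from the talk; the formula, its weight, and the toy domain are made up) that computes P(X = x) by brute-force enumeration over a four-atom world:

```python
import itertools
import math

# Toy MLN: one formula, Smokes(x) => Cancer(x), over constants A and B.
# A state is the truth values of the 4 ground atoms:
# (Smokes(A), Smokes(B), Cancer(A), Cancer(B))
w = 1.5  # hypothetical weight of the formula

def n_true_groundings(state):
    """n_i(x): count the true groundings of Smokes(x) => Cancer(x)."""
    smokes_a, smokes_b, cancer_a, cancer_b = state
    return int((not smokes_a) or cancer_a) + int((not smokes_b) or cancer_b)

def unnormalized(state):
    return math.exp(w * n_true_groundings(state))

# Partition function Z: sum over all 2^4 truth assignments.
Z = sum(unnormalized(s) for s in itertools.product([False, True], repeat=4))

def probability(state):
    return unnormalized(state) / Z

print(probability((True, False, True, False)))   # A smokes and has cancer
print(probability((True, False, False, False)))  # A smokes, no cancer
```

The second world violates the formula for one grounding, so it is less probable than the first, but not impossible: exactly the soft-constraint behavior described on the previous slide.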

11 Overview
- Motivation
- Background
- Multiple Relational Clusterings
- Experiments
- Future Work

12 Multiple Relational Clusterings
- Invented unary predicate = cluster
- Multiple cross-cutting clusterings
- Cluster relations by the objects they relate, and vice versa
- Cluster objects of the same type
- Cluster relations with the same arity and argument types
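One way to read "invented unary predicate = cluster" is that each learned cluster behaves like a new unary predicate over constants. A minimal sketch (the cluster names and memberships are hypothetical):

```python
# Cluster membership stored as invented unary predicates: learning
# assigns each constant to a cluster, which then acts like a new
# predicate such as Cluster1(x).
clusters = {
    "Cluster1": {"Leopard", "Lion", "Tiger"},  # e.g., fast predators
    "Cluster2": {"Cow", "Sheep", "Goat"},      # e.g., farm herbivores
}

def holds(cluster, constant):
    """Truth value of the invented ground atom cluster(constant)."""
    return constant in clusters.get(cluster, set())

print(holds("Cluster1", "Leopard"))  # True
print(holds("Cluster2", "Leopard"))  # False
```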

13 Example of Multiple Clusterings
[Figure: a set of people (Bob, Bill, Alice, Anna, Carol, Cathy, Eddie, Elise, David, Darren, Felix, Faye, Hal, Hebe, Gerald, Gigi, Ida, Iris) clustered in two cross-cutting ways. Some are friends, some are co-workers: the friendship clusters are predictive of hobbies, while the co-worker clusters are predictive of skills.]

14 Second-Order Markov Logic
- Finite, function-free
- Variables range over relations (predicates) and objects (constants)
- Ground atoms formed with all possible predicate symbols and constant symbols
- Can represent some models more compactly than first-order Markov logic
- Used to specify how predicate symbols are clustered

15 Symbols
- Cluster: γ
- Clustering: Γ
- Atoms: r(x), x ∈ γ
- Cluster combination: (γ_r, γ_x)

16 MRC Rules
Three hard rules:
- Each symbol belongs to at least one cluster
- A symbol cannot belong to more than one cluster in the same clustering
- Each atom appears in exactly one combination of clusters
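A literal reading of the three hard rules as a validity check on a candidate assignment (the data structures here are hypothetical, for illustration only):

```python
import itertools

def satisfies_hard_rules(clusterings, atom_combinations, symbols):
    """Check a candidate assignment against MRC's three hard rules.

    clusterings: list of clusterings; each clustering is a list of
        clusters, and each cluster is a set of symbols.
    atom_combinations: dict mapping each atom to the set of cluster
        combinations it has been assigned to.
    symbols: all predicate and constant symbols.
    """
    # Rule 1: each symbol belongs to at least one cluster.
    clustered = set().union(
        *(cluster for clustering in clusterings for cluster in clustering))
    if not all(s in clustered for s in symbols):
        return False
    # Rule 2: no symbol is in more than one cluster of the same clustering.
    for clustering in clusterings:
        for c1, c2 in itertools.combinations(clustering, 2):
            if c1 & c2:
                return False
    # Rule 3: each atom appears in exactly one combination of clusters.
    return all(len(combos) == 1 for combos in atom_combinations.values())
```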

17 MRC Rules
- Atom prediction rule: the truth value of an atom is determined by the cluster combination it belongs to
  ∀r, x: r ∈ γ_r ∧ x ∈ γ_x ⇒ r(x)
- Exponential prior on the number of clusters

18 Learning MRC Model
Learning consists of finding:
- the cluster assignment (i.e., the truth values of all cluster-membership atoms), and
- the weights of the atom prediction rules
that maximize the log-posterior probability, given the vector of truth assignments to all observed ground atoms.

19 Learning MRC Model Three hard rules + Exponential prior rule

20 Learning MRC Model
- Atom prediction rules: the weight of a rule is the log-odds of an atom in its cluster combination being true
- Can be computed in closed form:
  w_k = log( (t_k + β) / (f_k + β) )
  where t_k and f_k are the numbers of true and false atoms in cluster combination k, and β is a smoothing parameter.
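A minimal sketch of this closed form (the function name is mine; β is the smoothing parameter from the slide):

```python
import math

def rule_weight(n_true, n_false, beta=1.0):
    """Weight of an atom prediction rule: the smoothed log-odds of an
    atom in the rule's cluster combination being true."""
    return math.log((n_true + beta) / (n_false + beta))

# Example: a cluster combination containing 30 true and 10 false atoms.
print(rule_weight(30, 10))  # log(31/11) ≈ 1.04
```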

21 Search Algorithm
- Approximation: hard assignment of symbols to clusters
- Greedy search with restarts
- Top-down divisive refinement algorithm
- Two levels: the top level finds clusterings, the bottom level finds clusters
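A schematic of the greedy search with restarts (heavily simplified; `score` and `refine` stand in for the MAP score and the cluster-refinement moves, and are assumptions, not the talk's exact procedure):

```python
import random

def greedy_search(symbols, score, refine, restarts=10):
    """Top-down divisive refinement with greedy search and restarts.

    score(clustering)  -> MAP score of a candidate clustering
    refine(clustering) -> list of candidate refinements (cluster splits)
    """
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        # Start from a single cluster containing all symbols.
        current = [set(symbols)]
        current_score = score(current)
        improved = True
        while improved:
            improved = False
            candidates = refine(current)
            random.shuffle(candidates)  # restarts differ via move order
            for candidate in candidates:
                s = score(candidate)
                if s > current_score:  # greedy: take an improving move
                    current, current_score = candidate, s
                    improved = True
                    break
        if current_score > best_score:
            best, best_score = current, current_score
    return best
```

In the full algorithm this search runs at the top level to find clusterings and then recurses into every cluster combination, terminating when no refinement improves the MAP score, as the next slides illustrate.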

22 Search Algorithm
- Inputs: sets of predicate symbols and constant symbols
- Greedy search with restarts
- Outputs: a clustering of each set of symbols
[Figure: predicate symbols P, Q, R, S, T, U, V, W and constant symbols a–h being grouped into clusters.]

23 Search Algorithm
- Recurse for every cluster combination
[Figure: the clusterings from the previous slide, with the search recursing into each cluster combination.]

24 Search Algorithm
- Recurse for every cluster combination
- Terminate when no refinement improves the MAP score
[Figure: successive refinements of the predicate and constant clusterings.]

25 Search Algorithm
- Leaf ≡ atom prediction rule
- Return the leaves: ∀r, x: r ∈ γ_r ∧ x ∈ γ_x ⇒ r(x)
[Figure: the refinement tree, with each leaf corresponding to one atom prediction rule.]

26 Search Algorithm
- Produces multiple clusterings
- Search enforces the hard rules
- Limitation: high-level clusters constrain lower ones
[Figure: the full refinement tree over the predicate and constant symbols.]

27 Overview
- Motivation
- Background
- Multiple Relational Clusterings
- Experiments
- Future Work

28 Datasets
- Animals: sets of animals and their features, e.g., Fast(Leopard)
  - 50 animals, 85 features
  - 4250 ground atoms; 1562 true ones
- Unified Medical Language System (UMLS): biomedical ontology
  - Binary predicates, e.g., Treats(Antibiotic, Disease)
  - 49 relations, 135 concepts
  - 893,025 ground atoms; 6529 true ones

29 Datasets
- Kinship: kinship relations between members of an Australian tribe, e.g., Kinship(Person, Person)
  - 26 kinship terms, 104 persons
  - 281,216 ground atoms; 10,686 true ones
- Nations: relations among nations, e.g., ExportsTo(USA, Canada), and nation features, e.g., Monarchy(UK)
  - 14 nations, 56 relations, 111 features
  - 12,530 ground atoms; 2565 true ones

30 Methodology
- Randomly divided the ground atoms into ten folds; 10-fold cross-validation
- Evaluation measures:
  - Average conditional log-likelihood of test ground atoms (CLL)
  - Area under the precision-recall curve of test ground atoms (AUC)
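Both measures can be computed from per-atom predicted probabilities. A sketch of how they might be computed (assuming scikit-learn is available; average precision is used here as the area under the precision-recall curve):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def cll(y_true, p_pred, eps=1e-6):
    """Average conditional log-likelihood of test ground atoms."""
    p = np.clip(p_pred, eps, 1 - eps)
    return np.mean(np.where(y_true == 1, np.log(p), np.log(1 - p)))

y_true = np.array([1, 0, 1, 1, 0])            # gold truth values
p_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # predicted P(atom = true)
print("CLL:", cll(y_true, p_pred))
print("AUC-PR:", average_precision_score(y_true, p_pred))
```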

31 Methodology
- Compared with IRM [Kemp et al., 2006] and MLN structure learning (MSL) [Kok & Domingos, 2005]
- IRM: default parameters; run for 10 hrs
- MRC: parameters λ and β both set to 1 (no tuning); run for 10 hrs for the first level of clustering; subsequent levels permitted 100 steps (3-10 mins)
- MSL: run for 24 hours; parameter settings in the online appendix

32 Results
[Charts: CLL and AUC on the Animals, UMLS, Kinship, and Nations datasets, comparing IRM, MRC, MSL, and Init.]

33 Multiple Clusterings Learned
[Figure: clusters learned on UMLS. Concept clusters {Virus, Fungus, Bacterium, Rickettsia, Invertebrate, Alga, Plant, Archaeon} and {Amphibian, Bird, Fish, Human, Mammal, Reptile, Vertebrate, Animal} are related via the relation cluster {Found In, ¬Found In} to {Bioactive Substance, Biogenic Amine, Immunologic Factor, Receptor}, and via {Causes, ¬Causes} to {Disease, Cell Dysfunction, Neoplastic Process}.]

34 Multiple Clusterings Learned
[Figure: the same organism concepts, clustered differently: related via {Is A, ¬Is A} and via {Causes, ¬Causes} to {Disease, Cell Dysfunction, Neoplastic Process}.]

35 Multiple Clusterings Learned
[Figure: the clusterings combined: the organism clusters participate simultaneously in the relation clusters {Found In, ¬Found In}, {Causes, ¬Causes}, and {Is A, ¬Is A}, linking them to {Bioactive Substance, Biogenic Amine, Immunologic Factor, Receptor} and {Disease, Cell Dysfunction, Neoplastic Process}.]

36 Overview
- Motivation
- Background
- Multiple Relational Clusterings
- Experiments
- Future Work

37 Future Work
- Experiment on larger datasets, e.g., ontology induction from web text
- Use the learned clusters as primitives in structure learning
- Learn a hierarchy of multiple clusterings and perform shrinkage
- Cluster predicates with different arities and argument types
- Speculation: all relational structure learning can be accomplished with SPI alone

38 Conclusion
- Statistical Predicate Invention: a key problem for statistical relational learning
- Multiple Relational Clusterings:
  - A first step towards a general framework for SPI
  - Based on finite second-order Markov logic
  - Creates multiple relational clusterings of the symbols in the data
- Empirical comparison with MLN structure learning and IRM shows promise