Enriching Structured Knowledge with Open Information
Outline: Introduction, Related Work, Overview, Clustering & Mapping, Experiments
Introduction. State-of-the-art IE systems: NELL, REVERB, OLLIE, … (OIE); YAGO, DBPEDIA, FREEBASE, … (KB). Differences: OIE has no fixed schemata and its output is unstructured; a KB has assertions, URIs and an ontology. Focus: mapping (OIE -> ontology). Results: precise and unambiguous assertions.
Introduction. Scenario: unlimited domain; applicable to any natural-language fact of the form rel(s, o). Example: REVERB fact input: is a town in (Croydon, London). Target KB: DBPEDIA. Mappings: is a town in -> dbo:county, Croydon -> db:Croydon, London -> db:London. Assertion output: dbo:county(db:Croydon, db:London).
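The example reduces to three table look-ups: one property mapping and two instance mappings. A minimal sketch, where `INSTANCE_MAP`, `PROPERTY_MAP` and `translate` are hypothetical names; the property is written as dbo:county so that it agrees with the assertion output dbo:county(db:Croydon, db:London).

```python
from typing import Optional

# Hypothetical mapping tables built from the Croydon/London example.
INSTANCE_MAP = {"Croydon": "db:Croydon", "London": "db:London"}
PROPERTY_MAP = {"is a town in": "dbo:county"}


def translate(rel: str, subj: str, obj: str) -> Optional[str]:
    """Rewrite an OIE fact rel(subj, obj) as a KB assertion, or return None
    if the relation phrase or either term has no mapping."""
    prop = PROPERTY_MAP.get(rel)
    s = INSTANCE_MAP.get(subj)
    o = INSTANCE_MAP.get(obj)
    if prop is None or s is None or o is None:
        return None
    return f"{prop}({s}, {o})"


print(translate("is a town in", "Croydon", "London"))
# dbo:county(db:Croydon, db:London)
```

Facts whose phrase or terms are unmapped return None, which is where the IM and PM modules come in.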
Introduction. Problems: polysemy and ambiguity; multiple references (multi-references). Strategy: clustering of relational phrases. Contributions: a modularised mapping workflow OIE -> KB; Markov clustering; feedback to improve the overall quality of the results.
Related Work: Matching Instances; Knowledge Base Construction & Debugging; Distant Supervision based Approaches; Semantifying Open Information
Overview Framework
Overview. Modules: instance matching (IM), look-up (LU), clustering (CL), property mapping (PM)
Module Description: IM. Input: OIE facts. Output: mappings from subject and object terms to DBpedia entities. Working mechanism (disambiguation): for each OIE instance, collect the entities it may refer to in the KB (DBpedia); rank the candidate matches probabilistically, based on KB relation patterns (domain/range); filter by the MAP (maximum a posteriori) state.
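To illustrate how a domain/range pattern can override a raw popularity prior, here is a deliberately simplified stand-in: it scores every (subject, object) candidate pair by the product of hypothetical priors, down-weights pairs whose types violate the relation's (domain, range), and keeps the best pair. This is a per-fact argmax, not the authors' probabilistic model or its MAP inference; all candidates, priors, types and the penalty value are assumptions for illustration.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical candidate entities with prior probabilities for each OIE term.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Gerrard": {"db:Steven_Gerrard": 1.0},
    "Liverpool": {"db:Liverpool": 0.7, "db:Liverpool_F.C.": 0.3},
}
# Hypothetical entity types and (domain, range) pattern of the relation phrase.
TYPES = {"db:Steven_Gerrard": "Athlete", "db:Liverpool": "City",
         "db:Liverpool_F.C.": "SportsTeam"}
PATTERN = {"plays for": ("Athlete", "SportsTeam")}
MISMATCH_PENALTY = 0.1  # hypothetical down-weighting of a domain/range violation


def best_pair(rel: str, subj: str, obj: str) -> Tuple[str, str, float]:
    """Return the (subject entity, object entity, score) with the highest joint score."""
    domain, rng = PATTERN[rel]
    best = ("", "", -1.0)
    for (s, ps), (o, po) in product(CANDIDATES[subj].items(), CANDIDATES[obj].items()):
        score = ps * po  # joint prior of the two candidate entities
        if TYPES[s] != domain:
            score *= MISMATCH_PENALTY
        if TYPES[o] != rng:
            score *= MISMATCH_PENALTY
        if score > best[2]:
            best = (s, o, score)
    return best


print(best_pair("plays for", "Gerrard", "Liverpool"))
```

Although db:Liverpool has the higher prior, the range constraint makes db:Liverpool_F.C. win for "plays for".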
Module Description: LU. Function: search for facts in the target KB. Input: the set of instance mappings from IM. Process: for an OIE fact f with subject x and object y, search for KB assertions relating x and y. Judgement: f+ are facts with KB assertions (used as evidence for PM mappings); f- are facts without KB assertions (translated into the DBpedia vocabulary to extend the KB).
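The f+/f- split can be sketched as a partition over the mapped facts. Assumptions: KB assertions are held as (property, subject, object) triples in a Python set, only assertions with x as subject and y as object are checked, and `look_up` is a hypothetical name; a real system would query an indexed KB rather than scan it.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation phrase, subject term, object term)


def look_up(facts: List[Fact], instance_map: Dict[str, str],
            kb: Set[Tuple[str, str, str]]) -> Tuple[List[Tuple[Fact, List[str]]], List[Fact]]:
    """Split mapped OIE facts into f+ (KB assertions link x to y)
    and f- (no such assertion exists)."""
    f_plus, f_minus = [], []
    for fact in facts:
        _, s, o = fact
        x, y = instance_map.get(s), instance_map.get(o)
        if x is None or y is None:
            continue  # IM produced no mapping, so the fact cannot be looked up
        props = sorted(p for (p, a, b) in kb if a == x and b == y)
        if props:
            f_plus.append((fact, props))  # evidence for property mapping (PM)
        else:
            f_minus.append(fact)  # candidate for KB extension
    return f_plus, f_minus


# Toy data, for illustration only.
imap = {"Croydon": "db:Croydon", "London": "db:London", "Sutton": "db:Sutton"}
kb = {("dbo:county", "db:Croydon", "db:London")}
facts = [("is a town in", "Croydon", "London"), ("is near", "Croydon", "Sutton")]
print(look_up(facts, imap, kb))
```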
Module Description: CL. Input: OIE facts. Three clustering workflows: wf1, trivial: each relational phrase forms a one-element cluster; wf2, non-trivial, without DBpedia seeds (properties); wf3, non-trivial, with DBpedia seeds. Output: wf1: clusters of similar relational phrases; wf2, wf3: clusters (forwarded to IM).
Module Description: PM. Aim: map relational phrases (OIE properties) to object properties (KB properties). Mechanism: association rule mining (frequent rule patterns) of the form rel -> (domain, range). Input: f+, which supplies the evidence both for forming association rules and for possible mappings. Output: a set of property mappings.
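A small support/confidence miner shows the idea. It is applied twice to hypothetical f+ evidence: once for rules rel -> (domain, range) and once for candidate mappings rel -> KB property. The thresholds, the confidence definition count(a, c) / count(a), the evidence tuples and the name `mine_rules` are assumptions, not the authors' exact rule format.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def mine_rules(pairs: Iterable[Tuple[Hashable, Hashable]], min_support: int = 2,
               min_conf: float = 0.5) -> Dict[Hashable, List[Tuple[Hashable, float]]]:
    """Keep rules antecedent -> consequent whose co-occurrence count reaches
    min_support and whose confidence count(a, c) / count(a) reaches min_conf."""
    pairs = list(pairs)
    joint = Counter(pairs)
    single = Counter(a for a, _ in pairs)
    rules: Dict[Hashable, List[Tuple[Hashable, float]]] = {}
    for (a, c), n in joint.items():
        conf = n / single[a]
        if n >= min_support and conf >= min_conf:
            rules.setdefault(a, []).append((c, conf))
    return rules


# Hypothetical f+ evidence: (relation phrase, KB property, subject type, object type).
EVIDENCE = [
    ("is a town in", "dbo:county", "Settlement", "AdministrativeRegion"),
    ("is a town in", "dbo:county", "Settlement", "AdministrativeRegion"),
    ("is a town in", "dbo:country", "Settlement", "Country"),
    ("was born in", "dbo:birthPlace", "Person", "Settlement"),
]

type_rules = mine_rules((r, (dom, rng)) for r, _, dom, rng in EVIDENCE)
prop_rules = mine_rules((r, p) for r, p, _, _ in EVIDENCE)
print(type_rules)
print(prop_rules)
```

With these thresholds "was born in" is dropped for lack of support, and only the majority reading of "is a town in" survives.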
Clustering & Mapping. Similarity Metrics: jac(), Jaccard similarity; wn(), WordNet similarity; β, weighting factor, β ∈ [0, 1].
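The slide names the ingredients but not the formula. Assuming jac() and wn() are combined as a convex combination, sim(r_i, r_j) = β·jac(r_i, r_j) + (1 − β)·wn(r_i, r_j), a sketch could look as follows. Jaccard is taken over lower-cased word tokens (an assumption); the WordNet score is passed in as a function because it needs a lexical database, and `fake_wn` is a hypothetical stub, not real WordNet output.

```python
from typing import Callable


def jac(r1: str, r2: str) -> float:
    """Jaccard similarity of the word-token sets of two relational phrases."""
    a, b = set(r1.lower().split()), set(r2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def sim(r1: str, r2: str, wn: Callable[[str, str], float], beta: float = 0.5) -> float:
    """Assumed affinity: beta * jac + (1 - beta) * wn, with beta in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * jac(r1, r2) + (1.0 - beta) * wn(r1, r2)


def fake_wn(r1: str, r2: str) -> float:
    """Hypothetical stand-in for a WordNet-based phrase similarity."""
    return 1.0 if {r1, r2} == {"is a town in", "is a city in"} else 0.0


print(jac("is a town in", "is a city in"))           # 0.6
print(sim("is a town in", "is a city in", fake_wn))  # 0.8
```

β = 1 reduces the score to pure token overlap, β = 0 to pure lexical-semantic similarity.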
Clustering & Mapping. Markov Clustering: nodes are relational phrases; edges carry affinity scores, turned into transition probabilities. Mechanism: Markov random walk, iterated to a steady-state probability distribution; strong links become stronger and weak links weaker.
Clustering & Mapping. Markov Clustering, Inflation. Parameter: inflation factor I. Choice of I: if I is set too small, the clusters are coarse, and vice versa. Even with a reasonable I, some final clusters contained sub-clusters.
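The random walk and the inflation step can be made concrete with a compact, dependency-free version of the standard Markov clustering loop: add self-loops, make columns stochastic, then alternate expansion (squaring the matrix) and inflation (entry-wise power I, renormalised) until the matrix stops changing. The default values for `inflation`, `tol`, `prune` and `max_iter` are assumptions, and reading clusters from the non-zero rows is one common convention.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column to sum to 1 (column-stochastic transition matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(affinity: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-6, prune: float = 1e-5) -> List[List[int]]:
    """Markov clustering of a symmetric affinity matrix over relational phrases."""
    n = len(affinity)
    # Self-loops (standard MCL practice), then edge weights become probabilities.
    m = normalize_columns([[affinity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two random-walk steps
        # Inflation: entry-wise power I, so strong links grow stronger.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        pruned = normalize_columns([[x if x > prune else 0.0 for x in row]
                                    for row in inflated])
        delta = max(abs(pruned[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = pruned
        if delta < tol:  # (near) steady state reached
            break
    # Read clusters off the rows that still hold probability mass (attractors).
    clusters, seen = [], set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 0)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


# Two pairs of phrases with no affinity between the pairs.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(mcl(A))  # [[0, 1], [2, 3]]
```

Raising `inflation` sharpens the contrast between strong and weak links and tends to produce more, smaller clusters; lowering it merges them.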
Clustering & Mapping. PM Strategies (based on pairwise similarity): wf1: each REVERB relational phrase is a one-element cluster that is mapped to a DBpedia property. wf2: extension of wf1 that clusters the REVERB relational phrases. wf3: adds DBpedia properties as seeds, clustered together with the REVERB relational phrases using Markov clustering.
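In wf3, one plausible way to exploit the seeds is to read candidate mappings directly off the clusters: a relational phrase that lands in the same cluster as a DBpedia property becomes a candidate for that property. This is an assumption about how seeds are used, shown as a hypothetical helper `seed_mappings`; seeds are recognised here by an assumed "dbo:" prefix.

```python
from typing import Dict, List


def seed_mappings(clusters: List[List[str]], seed_prefix: str = "dbo:") -> Dict[str, List[str]]:
    """Map each relational phrase to the DBpedia seed properties that ended up
    in the same cluster; clusters without any seed yield no mapping."""
    mapping: Dict[str, List[str]] = {}
    for cluster in clusters:
        seeds = sorted(m for m in cluster if m.startswith(seed_prefix))
        if not seeds:
            continue
        for member in cluster:
            if not member.startswith(seed_prefix):
                mapping[member] = seeds
    return mapping


clusters = [["is a town in", "is a city in", "dbo:county"],
            ["was born in", "is a native of"]]
print(seed_mappings(clusters))
# {'is a town in': ['dbo:county'], 'is a city in': ['dbo:county']}
```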
Experiments. Dataset: REVERB ClueWeb Extractions, keeping facts with confidence score >= 95% and removing facts with numeric expressions: 3.5 million triples with 474,325 relational phrases. Properties considered: the 500 most frequent REVERB properties and the 100 most frequent DBpedia properties.
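The preprocessing steps can be sketched as a filter followed by a frequency count. Assumptions: confidence scores lie in [0, 1] (so 95% becomes 0.95), a "numeric expression" is any field containing a digit, and the rows below are toy data, not taken from the corpus; `preprocess` is a hypothetical name.

```python
import re
from collections import Counter
from typing import List, Tuple

Extraction = Tuple[str, str, str, float]  # (subject, relation phrase, object, confidence)
DIGIT = re.compile(r"\d")  # assumed test for a "numeric expression"


def preprocess(rows: List[Extraction], min_conf: float = 0.95,
               top_k: int = 500) -> Tuple[List[Extraction], List[str]]:
    """Keep confident, number-free extractions and return them together with
    the top_k most frequent relational phrases."""
    kept = [r for r in rows
            if r[3] >= min_conf and not any(DIGIT.search(t) for t in r[:3])]
    freq = Counter(rel for _, rel, _, _ in kept)
    return kept, [rel for rel, _ in freq.most_common(top_k)]


rows = [("Croydon", "is a town in", "London", 0.97),
        ("Croydon", "has a population of", "300,000", 0.99),
        ("Paris", "is the capital of", "France", 0.80),
        ("Bromley", "is a town in", "London", 0.96)]
kept, top = preprocess(rows)
print(len(kept), top)  # 2 ['is a town in']
```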
Experiments-Evaluation. Metric: cluster score S, built from comp(c_i), the compactness of cluster c_i (intra-cluster similarity), and iso(C), the isolation of the clustering C (inter-cluster similarity). The higher comp(c_i) and the lower iso(C), the better.
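The slide does not give the formulas, so the following only sketches one common reading: comp(c_i) as the mean pairwise similarity inside a cluster and iso(C) as the mean similarity between members of different clusters. Both definitions, the treatment of singletons, and the use of token Jaccard as the similarity are assumptions; how S combines them is not reconstructed here.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List

Sim = Callable[[str, str], float]


def jac(r1: str, r2: str) -> float:
    """Jaccard similarity of word-token sets (assumed similarity function)."""
    a, b = set(r1.split()), set(r2.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def comp(cluster: List[str], sim: Sim) -> float:
    """Assumed compactness: mean pairwise similarity inside one cluster
    (a singleton is treated as perfectly compact)."""
    pairs = list(combinations(cluster, 2))
    return mean(sim(a, b) for a, b in pairs) if pairs else 1.0


def iso(clusters: List[List[str]], sim: Sim) -> float:
    """Assumed isolation: mean similarity between members of different clusters
    (lower means better separated)."""
    cross = [sim(a, b) for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2]
    return mean(cross) if cross else 0.0


C = [["is a town in", "is a city in"], ["was born in", "was raised in"]]
print([comp(c, jac) for c in C])  # [0.6, 0.5]
print(iso(C, jac))                # about 0.1667
```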
Experiments-Evaluation Analysis Control parameter
Experiments-Evaluation
THANKS!