Published by Alexandra Ware. Modified over 11 years ago.
1
Fabian M. Suchanek SOFIE: A Self-Organizing Framework for Information Extraction 1
SOFIE: A Self-Organizing Framework for Information Extraction
Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum (Max-Planck-Institute for Informatics, Saarbrücken, Germany)
2
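Ontologies
[Figure: an ontology fragment with the nodes Entity, Singer, Country, and USA, connected by the relations type, subclassOf, and bornInPlace.]
Ontologies such as DBpedia, YAGO, KYLIN, ... are built from Wikipedia, e.g. from infobox entries such as "birth-place: USA". Can the rest of the Internet, with sentences such as "Elvis died in England", be used as a source as well?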
3
Information Extraction
Goal: Extract ontological information from natural-language documents, e.g. diedInPlace(Elvis, England) from "Elvis died in England".
Previous approaches: Espresso, DIPRE, LEILA, Snowball, TextRunner, Alice, and many more. These approaches
- may deliver non-canonical relations (died in, perished in, was killed in, ...)
- may deliver non-canonical entities (England, UK, Great Britain, ...)
- may deliver inconsistent facts (diedInPlace(Elvis, England) and diedInPlace(Elvis, Germany))
4
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
Ontology: diedInPlace(Louis XIV, France)
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
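The first rule can be sketched in a few lines of Python. This is a toy illustration, not SOFIE's implementation: the triple format, the hypothetical `learn_pattern_mappings` helper, and the sample data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical ontology: facts as (relation, subject, object) triples
ontology = {("diedInPlace", "Louis XIV", "France")}

# Hypothetical pattern occurrences found on a web page: (pattern, entity1, entity2)
occurrences = [
    ("died in", "Louis XIV", "France"),
    ("died in", "Elvis", "England"),
]


def learn_pattern_mappings(occurrences, ontology):
    """If a pattern occurs with two entities that stand in a relation,
    count that as evidence that the pattern maps to the relation."""
    support = Counter()
    for pattern, x, y in occurrences:
        for relation, subject, obj in ontology:
            if (subject, obj) == (x, y):
                support[(pattern, relation)] += 1
    return dict(support)


mappings = learn_pattern_mappings(occurrences, ontology)
print(mappings)  # {('died in', 'diedInPlace'): 1}
```

Only the Louis XIV occurrence matches a known fact, so "died in" receives one unit of support for diedInPlace.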
5
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").
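The second rule, applied naively, looks like the following sketch. The mapping, the occurrence tuples, and the hypothetical `extract_facts` helper are illustrative assumptions. Note that the rule accepts diedInPlace(Elvis, England) without any check, which is exactly the weakness the following slides examine.

```python
# Hypothetical pattern-to-relation mapping, e.g. learned by the first rule
pattern_to_relation = {"died in": "diedInPlace"}

# Hypothetical pattern occurrences: (pattern, entity1, entity2)
occurrences = [
    ("died in", "Louis XIV", "France"),
    ("died in", "Elvis", "England"),
    ("was born in", "Elvis", "USA"),  # pattern without a known mapping: ignored
]


def extract_facts(occurrences, pattern_to_relation):
    """If a meaningful (already mapped) pattern occurs with two entities,
    conclude that the entities stand in the mapped relation."""
    facts = set()
    for pattern, x, y in occurrences:
        relation = pattern_to_relation.get(pattern)
        if relation is not None:
            facts.add((relation, x, y))
    return facts


facts = extract_facts(occurrences, pattern_to_relation)
```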
6
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").
But what if Elvis is a Taxidophobist?
7
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
If a meaningful pattern occurs with two entities, then the entities stand in the relation: diedInPlace("Elvis", "England").
Taxidophobist: Reasoning Problem
8
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
If a pattern occurs with two entities that stand in a relation, then the pattern maps to the relation: "died in" = diedInPlace.
If a meaningful pattern occurs with two entities, then the entities stand in the relation.
Taxidophobist: Reasoning Problem
Which entity does "Elvis" denote?: Disambiguation Problem
9
Pitfalls of Information Extraction
Web page: "Elvis died in England. Louis XIV died in France."
Taxidophobist: Reasoning Problem
"Elvis": Disambiguation Problem
"died in" = diedInPlace?: Pattern Matching Problem
10
Information Extraction as Formulas
Reasoning Problem (Taxidophobist):
type(Elvis, Taxidophobist).
type(X, Taxidophobist) & bornInPlace(X, Y) => diedInPlace(X, Y) [0.8]
11
Information Extraction as Formulas
Web page: "Elvis died in England. Louis XIV died in France."
Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X, Y) => diedInPlace(X, Y)
Disambiguation Problem: ?
Pattern Matching Problem: "died in" = diedInPlace?
12
Information Extraction as Formulas: Disambiguation Problem
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
13
Information Extraction as Formulas: Disambiguation Problem
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
- Elvis@D15 is a word in context (wic), here the word "Elvis" in document D15.
- ElvisPresley is one possible meaning of "Elvis", as given by the ontology.
- 0.7 is a prior estimate for the likelihood of this meaning:
$$\frac{|\mathit{words}(D15) \cap \mathit{rel}(\mathit{ElvisPresley})|}{|\mathit{words}(D15)|}$$
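The prior can be computed directly from the ratio above. In this sketch, words(D) and rel(E) are treated as plain sets of words; that choice, the hypothetical `disambiguation_prior` helper, and the toy document and related-word sets are assumptions, not taken from SOFIE.

```python
def disambiguation_prior(doc_words, related_words):
    """Prior for one meaning: |words(D) ∩ rel(E)| / |words(D)|.
    Both arguments are treated as sets of words (an assumption)."""
    doc = set(doc_words)
    if not doc:
        return 0.0
    return len(doc & set(related_words)) / len(doc)


# Hypothetical document D15 and hypothetical related-word sets for two meanings
d15 = "Elvis sang rock and roll in Memphis".split()
rel = {
    "ElvisPresley": {"rock", "roll", "Memphis", "Graceland", "singer"},
    "ElvisCostello": {"London", "punk", "singer"},
}
priors = {meaning: disambiguation_prior(d15, words) for meaning, words in rel.items()}
```

Here 3 of the 7 distinct document words are related to ElvisPresley, giving a prior of 3/7, while ElvisCostello gets 0.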
14
Information Extraction as Formulas: Disambiguation Problem
Assumptions:
- In one document, the same word always has the same meaning.
- The ontology already knows all important meanings of proper names.
possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
possibleMeaning(X, Y) => means(X, Y)
means(X, Y) & Y ≠ Z => ¬means(X, Z)
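The two disambiguation formulas can be grounded into weighted clauses for a single word in context, as in the sketch below. Assumptions: the weight of a possibleMeaning fact is carried over to the resulting unit clause, the exclusivity clauses get a large hypothetical weight (`exclusion_weight`), and the ElvisCostello candidate with prior 0.2 is invented for illustration.

```python
from itertools import combinations


def meaning_clauses(wic, priors, exclusion_weight=100.0):
    """Ground the two disambiguation formulas for one word in context (wic).

    priors: meaning -> weight of the fact possibleMeaning(wic, meaning).
    A clause is (literals, weight); a literal is (atom, polarity).
    exclusion_weight is a hypothetical large weight for the exclusivity rule.
    """
    clauses = []
    for meaning, prior in priors.items():
        # possibleMeaning(wic, meaning) [prior] => means(wic, meaning)
        clauses.append(([(("means", wic, meaning), True)], prior))
    # means(X,Y) & Y != Z => ¬means(X,Z). The pairs (Y,Z) and (Z,Y) yield the
    # same clause {¬means(X,Y), ¬means(X,Z)}, so each unordered pair is emitted once.
    for y, z in combinations(sorted(priors), 2):
        clauses.append(([(("means", wic, y), False), (("means", wic, z), False)],
                        exclusion_weight))
    return clauses


clauses = meaning_clauses("Elvis@D15", {"ElvisPresley": 0.7, "ElvisCostello": 0.2})
```

With two candidate meanings, this yields two unit clauses and one binary clause forbidding both meanings at once.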
15
Information Extraction as Formulas
Web page: "Elvis died in England. Louis XIV died in France."
Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X, Y) => diedInPlace(X, Y)
Disambiguation Problem: possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
Pattern Matching Problem: "died in" = diedInPlace?
16
Information Extraction as Formulas: Pattern Matching Problem
Web page: "Elvis died in England. Louis XIV died in France."
"died in" = diedInPlace?
occurs("died in", Elvis@D15, England@D15). [14]
occurs(P, Wic1, Wic2) & means(Wic1, X) & means(Wic2, Y) & mapsTo(P, R) => R(X, Y)
occurs(P, Wic1, Wic2) & means(Wic1, X) & means(Wic2, Y) & R(X, Y) => mapsTo(P, R)
17
Information Extraction as Formulas
Reasoning Problem: type(Elvis, Taxidophobist). type(X, Taxidophobist) & bornInPlace(X, Y) => diedInPlace(X, Y)
Disambiguation Problem: possibleMeaning(Elvis@D15, ElvisPresley). [0.7]
Pattern Matching Problem: occurs("died in", Elvis@D15, England@D15). [14]
Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized:
means(Elvis@D15, ElvisPresley)?
mapsTo("died in", diedInPlace)?
diedInPlace(ElvisPresley, England)?
18
Weighted MAX SAT Problem
Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized.
Problems:
- The Weighted MAX SAT problem is NP-hard.
- Our instance of the problem is huge.
- The most popular linear-time approximation algorithm (Johnson's) does not work well with our type of formulas: Johnson's algorithm cannot approximate better than 2/3. A typical formula of this type is the functional constraint bornInPlace(X, Y) & Y ≠ Z => ¬bornInPlace(X, Z), which grounds to clauses such as ¬A ∨ ¬B, ¬A ∨ ¬C, ¬B ∨ ¬C.
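To make the problem statement concrete, here is an exhaustive Weighted MAX SAT solver. It is only a reference sketch for tiny inputs (it tries all 2^n assignments, which is exactly why the real instance needs an approximation). The instance is hypothetical: A, B, C stand for bornInPlace facts with three candidate places, each with weak evidence, plus the functional constraint as heavily weighted clauses.

```python
from itertools import product


def weighted_max_sat(variables, clauses):
    """Exhaustive weighted MAX SAT: tries all 2^n assignments, so toy sizes only.
    A clause is (literals, weight); a literal is (variable, polarity)."""
    best_weight, best = float("-inf"), None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(w for literals, w in clauses
                     if any(assignment[v] == polarity for v, polarity in literals))
        if weight > best_weight:
            best_weight, best = weight, assignment
    return best, best_weight


# Hypothetical instance: weak evidence for each of three candidate places
# plus the functional constraint ¬A ∨ ¬B, ¬A ∨ ¬C, ¬B ∨ ¬C with high weight
clauses = [
    ([("A", True)], 3), ([("B", True)], 2), ([("C", True)], 1),
    ([("A", False), ("B", False)], 10),
    ([("A", False), ("C", False)], 10),
    ([("B", False), ("C", False)], 10),
]
assignment, weight = weighted_max_sat(["A", "B", "C"], clauses)
```

The optimum keeps only the best-supported candidate: A is true, B and C are false, for a total weight of 3 + 30 = 33.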
19
FMS Algorithm
Formulas: A ∨ B [w1], A ∨ B [w2], B ∨ C [w3], C [w4]
Hypotheses: A, B, C (each set to true or false)
The Functional MAX SAT (FMS) Algorithm considers only unit clauses.
The FMS Algorithm propagates dominating unit clauses: given ¬A ∨ B [10], ¬A [10], A [30], it sets A = true, because 30 > 10 + 10.
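Propagating dominating unit clauses can be sketched as follows. This is our reading of the slide, not the published FMS algorithm. We assume a unit clause {l} dominates when its weight is at least the total weight of all clauses containing the opposite literal ¬l, and we read the example as ¬A ∨ B [10], ¬A [10], A [30]. Under this assumption, fixing l to true never lowers the best achievable weight. After each fix, the sketch simplifies the clause set, which can create new unit clauses. The helper name `propagate_dominating_units` is hypothetical.

```python
def propagate_dominating_units(clauses):
    """clauses: list of (frozenset of (variable, polarity) literals, weight).
    Repeatedly fixes a literal l whose unit clause {l} weighs at least as much
    as all clauses containing ¬l, then simplifies. Returns (fixed, remaining)."""
    fixed = {}
    clauses = list(clauses)
    changed = True
    while changed:
        changed = False
        for literals, weight in clauses:
            if len(literals) != 1:
                continue
            (var, pol), = literals
            opposite = (var, not pol)
            against = sum(w for lits, w in clauses if opposite in lits)
            if weight >= against:
                fixed[var] = pol
                simplified = []
                for lits, w in clauses:
                    if (var, pol) in lits:
                        continue  # satisfied by the new assignment
                    rest = lits - {opposite}  # the literal ¬l is now false
                    if rest:
                        simplified.append((rest, w))
                clauses = simplified
                changed = True
                break  # restart the scan on the simplified clause set
    return fixed, clauses


example = [
    (frozenset({("A", False), ("B", True)}), 10),  # ¬A ∨ B [10]
    (frozenset({("A", False)}), 10),               # ¬A [10]
    (frozenset({("A", True)}), 30),                # A [30]
]
fixed, remaining = propagate_dominating_units(example)
```

Fixing A = true (30 ≥ 10 + 10) reduces ¬A ∨ B to the unit clause B [10], which is then dominating as well, so B is fixed to true and no clauses remain.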
20
FMS Algorithm
FOR i=1 TO 42 ... NEXT i
- Approximation guarantee
- Polynomial time
- Experiments show better performance in practice than Johnson's algorithm in our setting.
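For comparison, here is a sketch of Johnson's algorithm as it is commonly described. Each unresolved clause with k unassigned literals contributes w/2^k to the side (true or false) of every variable it mentions, and each variable in turn takes the side with the larger score. The helper names and the instance are hypothetical. The instance is the toy functional constraint over three candidates A, B, C, each with weak evidence, whose optimum (A alone true) has weight 33.

```python
def johnson_max_sat(variables, clauses):
    """Johnson's greedy algorithm for weighted MAX SAT (sketch).
    A clause is (literals, weight); a literal is (variable, polarity).
    Each live clause contributes w / 2^k, with k its unassigned literals."""
    live = [(set(literals), w) for literals, w in clauses]
    assignment = {}
    for var in variables:
        score = {True: 0.0, False: 0.0}
        for literals, w in live:
            for v, polarity in literals:
                if v == var:
                    score[polarity] += w / 2 ** len(literals)
        value = score[True] >= score[False]
        assignment[var] = value
        remaining = []
        for literals, w in live:
            if (var, value) in literals:
                continue  # clause is now satisfied
            literals = literals - {(var, not value)}  # drop the falsified literal
            if literals:
                remaining.append((literals, w))
        live = remaining
    return assignment


def satisfied_weight(assignment, clauses):
    """Total weight of the clauses satisfied by a complete assignment."""
    return sum(w for literals, w in clauses
               if any(assignment[v] == polarity for v, polarity in literals))


# Hypothetical instance: weak evidence for A, B, C plus the functional constraint
clauses = [
    ([("A", True)], 3), ([("B", True)], 2), ([("C", True)], 1),
    ([("A", False), ("B", False)], 10),
    ([("A", False), ("C", False)], 10),
    ([("B", False), ("C", False)], 10),
]
result = johnson_max_sat(["A", "B", "C"], clauses)
```

The greedy choice is pulled toward "false" by the heavy binary clauses and ends with only C true, reaching a weight of 31 instead of the optimal 33.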
21
FMS Algorithm
FOR i=1 TO 42 ... NEXT i
Elvis died in England
r(X,Y) & s(Y) => t(X,Y)
22
FMS Algorithm
FOR i=1 TO 42 ... NEXT i
[Figure labels: England, diedIn, St. Elvis]
Elvis died in England
type(Elvis,Taxidophobist) = 1
diedIn(Elvis,England) = 0
means(Elvis@D15,Elvis) = 0
means(Elvis@D15,...) = 1
r(X,Y) & s(Y) => t(X,Y)
23
FMS Algorithm
FOR i=1 TO 42 ... NEXT i
[Figure labels: England, diedIn, St. Elvis]
r(X,Y) & s(Y) => t(X,Y)
24
Other Experiments
Corpus | Type | # Docs | Relations | Time | Precision
Wikipedia toy corpus | structured | 100 | 3 | 2 min | 100%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 94%
News article toy corpus | unstructured | 150 | 1 | 24 min | 91%
Biographies from Web | unstructured | 3440 | 5 | 15 h | 90%
25
Conclusion
SOFIE unifies the tasks of
- entity disambiguation
- pattern extraction
- semantic constraint reasoning
in a single framework, delivering
- canonicalized facts
- of high precision (experiments show 90% precision)
[Image caption: "died in England... but is alive!"]
26
SOFIE rules!
occurs(P,WX,WY) & refersTo(WX,X) & refersTo(WY,Y) & R(X,Y) => expresses(P,R)
occurs(P,WX,WY) & expresses(P,R) & refersTo(WX,X) & refersTo(WY,Y) & domain(R,D1) & range(R,D2) & type(X,D1) & type(Y,D2) => R(X,Y)
R(X,Y) & R(X,Z) & type(R,function) => Y = Z
disambiguationPrior(W,X) => refersTo(W,X)
bornInYear(X,B) & diedInYear(X,D) => B < D
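The last two rule types are semantic constraints. The sketch below only detects facts that violate them; in SOFIE, such rules are weighted formulas handed to the MAX SAT solver rather than hard checks. The hypothetical `find_violations` helper and the fact set are assumptions. The fact set reuses the inconsistent diedInPlace pair from slide 3 and adds a hypothetical PersonA with a death year before the birth year.

```python
from collections import defaultdict


def find_violations(facts, functional_relations):
    """facts: set of (relation, x, y) triples. Reports groundings that violate
      R(X,Y) & R(X,Z) & type(R,function) => Y = Z
      bornInYear(X,B) & diedInYear(X,D) => B < D"""
    values = defaultdict(set)
    for r, x, y in facts:
        if r in functional_relations:
            values[(r, x)].add(y)
    found = [("functional", r, x, tuple(sorted(ys)))
             for (r, x), ys in sorted(values.items()) if len(ys) > 1]
    born = {x: y for r, x, y in facts if r == "bornInYear"}
    died = {x: y for r, x, y in facts if r == "diedInYear"}
    for x in sorted(born.keys() & died.keys()):
        if not born[x] < died[x]:
            found.append(("birthBeforeDeath", x, born[x], died[x]))
    return found


facts = {
    ("diedInPlace", "Elvis", "England"), ("diedInPlace", "Elvis", "Germany"),
    ("bornInYear", "Elvis", 1935), ("diedInYear", "Elvis", 1977),
    ("bornInYear", "PersonA", 1990), ("diedInYear", "PersonA", 1980),
}
violations = find_violations(facts, functional_relations={"diedInPlace"})
```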
27
SOFIE: Experiments
Corpus | Type | # Docs | Relations | Time | Precision | Recall
Wikipedia toy corpus | structured | 100 | 3 | 8 min | 100% | 80%
Wikipedia toy corpus (50% of infoboxes removed) | semi-structured | 100 | 3 | 8 min | 100% | 57%
Wikipedia subcorpus | semi-structured | 2000 | 15 | 15 h | 94% | ?
News article toy corpus | unstructured | 150 | 1 | 24 min | 91% | 24%, 31%
Snowball (for comparison) | | | | | 56% | 31%
Biographies from Web | unstructured | 3440 | 5 | 15 h | 90% | ?
28
SOFIE: Large-Scale Experiment
Goal: Extract bornIn, bornOnDate, diedIn, diedOnDate, politicianOf
Corpus: 3700 biography documents downloaded from the Web
Runtime (summed over 5 batches):
Parsing | 7:05 h
Hypothesis Generation | 6:15 h
Solving | 2:30 h
Total | 15:50 h
Results (precision in %) for bornIn, bornOnDate, diedIn, diedOnDate, politicianOf: 87 87 13 98 95 90
29
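SOFIE: Relation to Markov Logic
[Figure: probability P of bornIn(Nicholas, Patras) being false or true.]
The probability of a truth assignment X to the hypotheses is
$$P(X) \propto e^{\sum_i \mathrm{sat}(i,X)\, w_i}$$
where sat(i,X) is the number of satisfied instances of the i-th formula (e.g. r(x,y) & s(x,z) => t(x,z) [w]) and w_i is the weight of the i-th formula. Since the logarithm is monotonic, the most likely assignment is
$$\arg\max_X e^{\sum_i \mathrm{sat}(i,X)\, w_i} = \arg\max_X \log\left(e^{\sum_i \mathrm{sat}(i,X)\, w_i}\right) = \arg\max_X \sum_i \mathrm{sat}(i,X)\, w_i,$$
which is a Weighted MAX SAT problem.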
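A small numeric check of this equivalence: enumerate all worlds, compute the Markov-logic probability P(X) ∝ exp(Σ_i sat(i,X) w_i), and confirm that the most probable world is also the one with the largest satisfied weight. The second atom bornIn(Nicholas, Athens), the three ground formulas, and their weights are hypothetical. Each formula has a single ground instance here, so sat(i,X) is 0 or 1.

```python
import math
from itertools import product

atoms = ["bornIn(Nicholas,Patras)", "bornIn(Nicholas,Athens)"]

# Hypothetical weighted ground formulas: (weight w_i, test whether satisfied in a world)
formulas = [
    (2.0, lambda a: a["bornIn(Nicholas,Patras)"]),
    (1.0, lambda a: a["bornIn(Nicholas,Athens)"]),
    # functional constraint: not both birth places at once
    (5.0, lambda a: not (a["bornIn(Nicholas,Patras)"] and a["bornIn(Nicholas,Athens)"])),
]


def score(world):
    """sum_i sat(i, X) * w_i, with one ground instance per formula."""
    return sum(w for w, satisfied in formulas if satisfied(world))


worlds = [dict(zip(atoms, values)) for values in product([False, True], repeat=len(atoms))]
z = sum(math.exp(score(world)) for world in worlds)  # normalization constant
probability = {tuple(world.values()): math.exp(score(world)) / z for world in worlds}

most_probable = max(worlds, key=lambda world: math.exp(score(world)))
max_weight = max(worlds, key=score)
```

Both searches select the world in which Nicholas is born in Patras only (satisfied weight 2 + 5 = 7), because exp is strictly increasing.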
30
Grounding
Formula: r(X,Y) & s(Y) => t(X,Y), as a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }
Groundings for Entities = {a, b}:
{ ¬r(a,a), ¬s(a), t(a,a) }
{ ¬r(a,b), ¬s(b), t(a,b) }
{ ¬r(b,a), ¬s(a), t(b,a) }
{ ¬r(b,b), ¬s(b), t(b,b) }
r(a,a), r(a,b), r(b,a), r(b,b): immutable, complete facts (e.g. pattern occurrences)
31
Grounding
Formula: r(X,Y) & s(Y) => t(X,Y), as a clause: { ¬r(X,Y), ¬s(Y), t(X,Y) }
The r-facts are immutable, complete facts (e.g. pattern occurrences). If r(a,a) holds and r(a,b), r(b,a), r(b,b) do not, only one grounding remains, simplified to
{ ¬s(a), t(a,a) } [w]
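The simplification can be sketched as a grounding step that consults the immutable, complete r-facts directly. If r(x,y) is not a fact, the literal ¬r(x,y) is true and the whole grounding is already satisfied, so it is skipped. If r(x,y) is a fact, the false literal ¬r(x,y) is dropped. The hypothetical `ground_rule` helper and the weight 1.0 (standing in for w) are assumptions for the example.

```python
from itertools import product


def ground_rule(entities, r_facts, weight):
    """Ground r(X,Y) & s(Y) => t(X,Y), i.e. the clause {¬r(X,Y), ¬s(Y), t(X,Y)}.
    r is immutable and complete: r(x,y) is true iff ("r", x, y) is in r_facts.
    A literal is (atom, polarity)."""
    clauses = []
    for x, y in product(entities, repeat=2):
        if ("r", x, y) not in r_facts:
            continue  # r(x,y) is false, so ¬r(x,y) already satisfies this grounding
        # r(x,y) is true, so the literal ¬r(x,y) is false and is dropped
        clauses.append(([(("s", y), False), (("t", x, y), True)], weight))
    return clauses


clauses = ground_rule(["a", "b"], {("r", "a", "a")}, weight=1.0)
```

Only the grounding { ¬s(a), t(a,a) } survives, so the solver never sees the three clauses that the facts already settle.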
32
Grounding
{ ¬s(a), t(a,a) } [w1]
{ p(c,d), q(e), ... } [w2]
Find truth assignments to the hypotheses so that the total weight of the satisfied formulas is maximized:
means(Elvis@D15, ElvisPresley) = true?
mapsTo("died in", diedInPlace) = true?
diedInPlace(ElvisPresley, England) = true?