Published by Jason Thompson. Modified over 9 years ago.
1
LEILA - Learning to Extract Information by Linguistic Analysis
Presented at the 2nd Workshop on Ontology Learning and Population (OLP2)
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany)
2
Overview
- Motivation
- The LEILA System
- Plan of Attack
- System Architecture
- Experiments
- Conclusion
3
Motivation
Search query: "Meat dish" (buttons: Google Search, I'm feeling hungry)
Web page: "This page has been created to enlighten the public about the Wiener Schnitzel. [...]"
?
4
Motivation
To know that a Schnitzel is a meat dish, we need an ontology.
- Use hand-crafted ontologies (like WordNet) (but: low coverage, high cost, fast aging)
- Or: Gather ontological data from Web documents
5
Goal
Given
- a binary target relation (e.g. subclassOf)
- a set of Web documents
extract all pairs of entities that are in the target relation
6
Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Pattern: X is a Y
Sentence: A Schnitzel is a meat dish from Austria.
7
Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Pattern: X is a Y
Sentence: A Schnitzel, also called Wiener Schnitzel, is a meat dish.
8
Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
┌──────Subject───────────┐┌Obj─┐
A Schnitzel, also called Wiener Schnitzel, is a meat dish.
Idea: Learn linguistic patterns!
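The Related Work slides contrast a surface pattern "X is a Y" with a sentence it cannot handle. A minimal sketch of that failure, using a regular expression as a stand-in for a learned text pattern; the regex and the function name `extract_is_a` are illustrative assumptions, not taken from any of the cited systems:

```python
import re

# Hypothetical surface pattern for "X is a Y": a single-word X directly
# followed by "is a"/"is an" and a Y of at most two words (a crude limit
# chosen only for this illustration).
IS_A = re.compile(r"^An? (\w+) is an? (\w+(?: \w+)?)")

def extract_is_a(sentence):
    """Return (X, Y) if the surface pattern matches, else None."""
    match = IS_A.match(sentence)
    return (match.group(1), match.group(2)) if match else None

# The plain sentence matches the surface pattern.
print(extract_is_a("A Schnitzel is a meat dish from Austria."))
# The apposition between subject and verb breaks the surface pattern.
print(extract_is_a("A Schnitzel, also called Wiener Schnitzel, is a meat dish."))
```

The first call yields the pair (Schnitzel, meat dish); the second yields nothing, although the subject and object of the sentence are the same. This is the gap that patterns over linguistic structure are meant to close.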
9
Plan of Attack
Input: Web documents; target relation: subclassOf
Output pairs:
  Schnitzel   meat dish
  Koala       mammal
  …
10
Preprocessing
Sentence from a Web document: The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.
11
Preprocessing
Units normalized: The Schnitzel (200g) is best enjoyed with Ösibräu.
12
Preprocessing
Special characters transliterated: The Schnitzel (200g) is best enjoyed with Oesibraeu.
13
Preprocessing
Parenthetical split off and tokenized:
  The Schnitzel is best enjoyed with Oesibraeu.
  The Schnitzel ( 200 g )
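The preprocessing steps shown on the slides so far (unit normalization, transliteration, splitting off the parenthetical) can be sketched as follows. This is only an illustration under assumptions: the function names, the regular expressions and the restriction to stones and German umlauts are hypothetical and not taken from LEILA.

```python
import re

GRAMS_PER_STONE = 6350.29318  # 1 stone = 14 lb

# Transliteration table for German umlauts and sharp s (assumed scope).
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})

def normalize_units(text):
    """Rewrite '<number> stones' as a rounded gram value such as '200g'."""
    def to_grams(match):
        return f"{round(float(match.group(1)) * GRAMS_PER_STONE)}g"
    return re.sub(r"(\d+(?:\.\d+)?) stones", to_grams, text)

def transliterate(text):
    """Replace umlauts with their ASCII spellings."""
    return text.translate(TRANSLIT)

def split_parenthetical(text):
    """Return (main sentence, fragment) for the first '(...)' in the text."""
    match = re.search(r"\s*\(([^)]*)\)", text)
    if match is None:
        return text, None
    main = text[:match.start()] + text[match.end():]
    # Separate numbers from unit letters: "200g" -> "200 g".
    inside = re.sub(r"(\d)([A-Za-z])", r"\1 \2", match.group(1))
    fragment = f"{text[:match.start()]} ( {inside} )"
    return main, fragment

def preprocess(text):
    return split_parenthetical(transliterate(normalize_units(text)))

main, fragment = preprocess(
    "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.")
print(main)      # The Schnitzel is best enjoyed with Oesibraeu.
print(fragment)  # The Schnitzel ( 200 g )
```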
14
Preprocessing
Parsed sentences with link labels:
  The Schnitzel is best enjoyed with Oesibraeu.
    links: det, subj, participle, adv, mod, comp
  The Schnitzel ( 200 g )
    links: adj, adj, adj, adj, adj
15
Preprocessing
Web documents; target relation: subclassOf; output pairs: (Schnitzel, meat dish), (Koala, mammal), …
16
Algorithm
Seed pairs: + (dog, mammal), ...; - (dog, nag), ...
Web documents: A dog is a mammal.
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
17
Algorithm
Seed pairs: + (dog, mammal), ...; - (dog, nag), ...
Web documents: This dog is a nag.
Positive patterns: A X is a Y.
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
18
Algorithm
Seed pairs: + (dog, mammal), ...; - (dog, nag), ...
Positive patterns: A X is a Y.
Negative patterns: This X is a Y.
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
19
Algorithm
Seed pairs: + (dog, mammal), ...; - (dog, nag), ...
Generalized positive patterns: A X is a Y.
Web documents: A Schnitzel is a meat dish.
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
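The Algorithm slides can be read as three steps: collect patterns around seed pairs, keep positive and negative patterns apart, and apply the positive ones to new sentences. The sketch below illustrates that flow on the slide's own examples. It is a simplification under stated assumptions: patterns are flat token strings rather than linguistic structures, there is no generalization step, and the names `discover` and `apply_patterns` are hypothetical.

```python
import re

def tokens(sentence):
    """Split a sentence into words, dropping the final period."""
    return sentence.rstrip(".").split()

def discover(sentences, examples, counterexamples):
    """Turn sentences containing a seed pair into X/Y patterns."""
    positive, negative = set(), set()
    for sentence in sentences:
        words = tokens(sentence)
        for pairs, patterns in ((examples, positive), (counterexamples, negative)):
            for x, y in pairs:
                if x in words and y in words:
                    patterns.add(" ".join(
                        "X" if w == x else "Y" if w == y else w for w in words))
    return positive, negative

def apply_patterns(patterns, sentences):
    """Match each pattern against each sentence and collect (X, Y) pairs."""
    pairs = set()
    for pattern in patterns:
        parts = [r"(?P<X>\w+)" if w == "X" else r"(?P<Y>.+)" if w == "Y"
                 else re.escape(w) for w in pattern.split()]
        regex = re.compile(" ".join(parts))
        for sentence in sentences:
            match = regex.fullmatch(sentence.rstrip("."))
            if match:
                pairs.add((match.group("X"), match.group("Y")))
    return pairs

positive, negative = discover(
    ["A dog is a mammal.", "This dog is a nag."],
    examples={("dog", "mammal")},
    counterexamples={("dog", "nag")},
)
print(positive)  # {'A X is a Y'}
print(negative)  # {'This X is a Y'}
print(apply_patterns(positive - negative, ["A Schnitzel is a meat dish."]))
# {('Schnitzel', 'meat dish')}
```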
20
LEILA: System Architecture
Inputs: Web documents; seed pairs from seed pair data sets, e.g. + (dog, mammal), - (dog, nag)
Components:
- Preprocessing, stemming
- LinkParser (Sleator, CMU)
- LEILA
- Learner: kNN, SVMLight (Joachims, Cornell U)
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
21
Gold Standard for Evaluation
Web document: A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe.
Ideal pairs: (Schnitzel, meat dish)
Output pairs: (Schnitzel, meat dish), (Koala, mammal), …
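Comparing output pairs with ideal pairs leads to precision and recall. A minimal sketch, assuming the standard definitions (correct outputs over all outputs, and correct outputs over all ideal pairs); the pairs in the usage example are made up for illustration and are not experimental data:

```python
def precision_recall(output_pairs, ideal_pairs):
    """Compute precision and recall of extracted pairs against a gold standard."""
    output_pairs, ideal_pairs = set(output_pairs), set(ideal_pairs)
    correct = output_pairs & ideal_pairs
    precision = len(correct) / len(output_pairs) if output_pairs else 0.0
    recall = len(correct) / len(ideal_pairs) if ideal_pairs else 0.0
    return precision, recall

# Hypothetical toy data: 2 of 3 outputs are correct, 2 of 4 ideal pairs found.
output = {("Schnitzel", "meat dish"), ("Koala", "mammal"), ("dog", "nag")}
ideal = {("Schnitzel", "meat dish"), ("Koala", "mammal"),
         ("cat", "mammal"), ("oak", "tree")}
precision, recall = precision_recall(output, ideal)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=0.50
```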
22
Results with different relations
Target relation: birthDate
Seed pairs are given by a function that decides whether a word pair is
- an example (here: list of birth dates from www.famousbirthdays.com)
- a counterexample (here: can be deduced from examples)
- a candidate (here: all pairs of a name and a date)
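Such a seed function for birthDate might look like the sketch below. Everything in it is an assumption made for illustration: the two-entry dictionary stands in for the list of birth dates, dates are reduced to four-digit years, and a counterexample is deduced as a known person paired with a different date.

```python
import re

# Hypothetical stand-in for the list of known birth dates.
BIRTH_DATES = {"Frédéric Chopin": "1810", "Clara Schumann": "1819"}

YEAR = re.compile(r"\d{4}")

def classify_pair(name, value, known=BIRTH_DATES):
    """Classify a (name, value) pair for the birthDate relation."""
    if not YEAR.fullmatch(value):
        return "none"  # not a name/date pair at all
    if name in known:
        # A known person with another date cannot be a birth date pair.
        return "example" if known[name] == value else "counterexample"
    return "candidate"  # unknown person: left for the system to decide

print(classify_pair("Frédéric Chopin", "1810"))  # example
print(classify_pair("Frédéric Chopin", "1849"))  # counterexample
print(classify_pair("Erik Satie", "1866"))       # candidate
```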
23
Results with different relations
Target relation: birthDate
Patterns: X (born in Y), X was born in Y, ...

Target Relation  Corpus           Precision  Recall
birthDate        Wikip composers  79% ± 8%   70% ± 9%

(see paper for details on the experiments)
24
Results with different relations
Target relation: synonymy
Examples: all WordNet synsets
Counterexamples: all words that are not in a synset
Candidates: all pairs of proper names
Patterns: X or Y, X (or Y), ...

Target Relation  Corpus           Precision  Recall
birthDate        Wikip composers  79% ± 8%   70% ± 9%
synonymy         Wikip geography  73% ± 7%   64% ± 7%
25
Results with different relations
Target relation: instanceOf
Examples: all direct WordNet hyponyms
Counterexamples: all words that are not hyponyms of each other
Candidates: all pairs of a proper name and a WordNet concept
Patterns: an X is a Y, X is unusual among the Y, ...

Target Relation  Corpus           Precision  Recall
birthDate        Wikip composers  79% ± 8%   70% ± 9%
synonymy         Wikip geography  73% ± 7%   64% ± 7%
instanceOf       Wikip composers  58% ± 3%   41% ± 3%
26
Results with different relations

Target Relation  Corpus           Precision  Recall
birthDate        Wikip composers  79% ± 8%   70% ± 9%
synonymy         Wikip geography  73% ± 7%   64% ± 7%
instanceOf       Wikip composers  58% ± 3%   41% ± 3%

Additional corpora: Wikip random, Google composers. Values shown: 28% 3%, 17% 2%, 33% 3%.
(see paper for details on the experiments)
27
Results with different competitors
[Bar chart: precision and recall in %, LEILA in red, for four comparisons: Snowball (headquarters, Snowball's corpus); TextToOnto, Text2Onto (instanceOf, Wikip composers); CV-System (instanceOf, CV's corpus); CV-System (instanceOf, Wikip composers)]
(see paper for explanations, conditions and details!)
28
Conclusion
Our system LEILA
- can learn arbitrary binary relations from Web documents
- uses a deep linguistic analysis
- compares favorably with other systems
See http://www.mpi-inf.de/~suchanek
29
Results with different competitors
(see paper for explanations, conditions and details!)

System      Relation    Corpus           Precision  Recall
TextToOnto  instanceOf  Wikip composers  39% ± 9%   4% ± 1%
LEILA       instanceOf  CV's             26% ± 7%   15% ± 4%
LEILA       instanceOf  Wikip composers  58% ± 3%   41% ± 3%

Remaining rows, with values in the order they appear on the slide:
Snowball and LEILA, headquarters, Snowball's: 34% 8% 30% 7%; 90% 6% 50% 7%
Text2Onto, instanceOf, Wikip composers: 50% 2% 1%
CV-System, instanceOf, CV's: 32% 5%
CV-System, instanceOf, Wikip composers: 22% 4% 2%
30
Pattern Generalization - kNN
Labeled patterns: A X is a Y. (+), X such as Y (+), This X is a Y. (-)
New pattern: A X is a big Y
(See our paper at KDD for details)
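The kNN idea on this slide is to label a new pattern by its most similar known patterns. The sketch below uses Jaccard overlap of tokens as the similarity, which is an assumption made purely for illustration; LEILA's actual pattern representation and similarity are described in the paper. The names `jaccard` and `knn_label` are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two patterns (assumed similarity measure)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def knn_label(pattern, labeled, k=1):
    """Label a pattern by majority vote among its k most similar patterns."""
    nearest = sorted(labeled, key=lambda item: jaccard(pattern, item[0]),
                     reverse=True)[:k]
    votes = sum(1 if label == "+" else -1 for _, label in nearest)
    return "+" if votes > 0 else "-"

LABELED = [("A X is a Y", "+"), ("X such as Y", "+"), ("This X is a Y", "-")]

# "A X is a big Y" shares 5 of 6 distinct tokens with "A X is a Y",
# so its nearest neighbour is the positive pattern.
print(knn_label("A X is a big Y", LABELED))  # +
```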
31
Pattern Generalization - SVM
Labeled patterns: A X is a Y. (+), X such as Y (+), This X is a Y. (-)
New pattern: A X is a big Y
(See our paper at KDD for details)