Published by Donald Barnett. Modified over 9 years ago.
1
© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004
Human Language Technology in Ontology Engineering
Ontology Learning from Text
Paul Buitelaar
DFKI GmbH, Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany
2
Overview
  HLT and Ontology Engineering
  Automated Linguistic Analysis
  Ontology Learning from Text
  Further Issues: Evaluation
  Conclusions
3
Ontology Lifecycle
  Creating
  Populating
  Validating
  Evolving
  Maintaining
  Deploying
4
HLT in the Ontology Lifecycle
  HLT for Ontology Learning and Population from Text
  Human Language Technology = Automated Linguistic Analysis
  Ontology Learning (Development & Evolution)
    Linguistic Analysis to Extract Classes / Relations
    Documents (Text) -> Classes, Relations/Properties -> Ontology (Knowledge)
  Ontology Population (Knowledge Base Generation)
    Linguistic Analysis to Extract Instances
    Documents (Text) -> Instances -> Ontology (Knowledge)
5
Automated Linguistic Analysis
6
Linguistic Analysis: Example
  The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
  [Figure: semantic graph over the nodes Dell, computer, flat screen, motherboard, failure, reject and animate-entity, with edges labeled has-a and location-of]
7
Levels of Linguistic Analysis
  Lexical Analysis
    Word Class: Part-of-Speech (also Semantic Class)
    Word Structure: Morphology
  Phrase Analysis
    Sentence Structure: Phrases (if ‘shallow’: Chunks)
    Semantic Units
  Dependency Structure Analysis
    Sentence Meaning: Predicate Argument Structure (Clause)
    Semantic Structure
8
Part-of-Speech, Morphology
  Part-of-Speech
    e.g.: noun, verb, adjective, preposition, …
    PoS tag sets may have between 10 and 50 (or more) tags
  Morphology
    Most languages have inflection and declension, e.g.:
      Singular/Plural: computer, computers
      Present/Past: reject, rejected
    Many languages also have complex (de)composition, e.g.:
      Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
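The (de)composition example on this slide, Flachbildschirm > flach + Bild + Schirm, can be sketched as a recursive lexicon lookup. This is only a minimal illustration, not the analyzer used in the presentation: the function name split_compound, the toy lexicon and the min_len cutoff are assumptions, and real German decomposition also has to handle linking elements (such as -s- or -n-) that this sketch ignores.

```python
def split_compound(word, lexicon, min_len=3):
    """Return one decomposition of `word` into lexicon entries.

    Falls back to [word] (lowercased) when no full decomposition is found.
    `lexicon` is a hypothetical set of lowercase stems.
    """
    lower = word.lower()
    if lower in lexicon:
        return [lower]
    # Scan split points left to right; each part must have at least min_len characters.
    for i in range(min_len, len(lower) - min_len + 1):
        left, right = lower[:i], lower[i:]
        if left in lexicon:
            rest = split_compound(right, lexicon, min_len)
            # Accept the split only if the remainder was fully decomposed.
            if all(part in lexicon for part in rest):
                return [left] + rest
    return [lower]


# Toy lexicons (assumed for illustration only).
fine_lexicon = {"flach", "bild", "schirm"}
coarse_lexicon = {"flach", "bildschirm"}

fine_split = split_compound("Flachbildschirm", fine_lexicon)
coarse_split = split_compound("Flachbildschirm", coarse_lexicon)
```

With the finer lexicon the word is split into three parts, with the coarser one into two, which mirrors the two decomposition levels shown on the slide.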
9
Phrases, Terms, Named Entities
  Semantic Units
  Phrases (e.g. nominal - NP, prepositional - PP)
    NP: a flat screen
    PP: with a flat screen
    NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
  Terms (domain-specific phrases)
    Dell computer
    Dell computer with a flat screen
  Named Entities (phrases corresponding to dates, names, …)
    COMPANY: Dell
    COMPANY: Dell Computer Corporation
    PERSON: Michael Dell
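Shallow phrase analysis (chunking) of the kind listed on this slide can be approximated with a pattern over part-of-speech tags. The sketch below groups runs of the form determiner? adjective* noun+ into flat NP chunks; the tag names DET, ADJ, NOUN, PREP and the function chunk_nps are assumptions made for illustration, and recursive NPs such as "the Dell computer with a flat screen" would need a further step that attaches PPs.

```python
def chunk_nps(tagged):
    """Group DET? ADJ* NOUN+ runs of (word, tag) pairs into flat NP chunks."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        # Optional determiner.
        if tagged[j][1] == "DET":
            j += 1
        # Any number of adjectives.
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        # One or more nouns.
        k = j
        while k < len(tagged) and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1  # No NP starts here; move on by one token.
    return chunks


# Hand-tagged input (the tags are assumed, not produced by a tagger).
tagged_phrase = [
    ("the", "DET"), ("Dell", "NOUN"), ("computer", "NOUN"),
    ("with", "PREP"),
    ("a", "DET"), ("flat", "ADJ"), ("screen", "NOUN"),
]
np_chunks = chunk_nps(tagged_phrase)
```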
10
Dependency Structure (I)
  Semantic Structure
  Dependencies between Predicates and Arguments
    the Dell computer with a flat screen had to be rejected
    PRED: reject
    ARG1: ENTITY
    ARG2: ‘the Dell computer with a flat screen’
    ‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …
  Dependency Structure Analysis is based on:
    Sub-categorization Frames
      reject :: Subj:NP, Obj:NP
    Selection Restrictions
      reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
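The selection restriction for reject on this slide (Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY) can be read as a type check against a semantic class hierarchy. The following is a hedged sketch of that reading: the ISA hierarchy, the classes PERSON and COMPUTER, and the helper names is_a and satisfies are invented for illustration and do not come from the presentation.

```python
# Hypothetical mini class hierarchy: child -> parent.
ISA = {
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "COMPUTER": "ENTITY",
}

# Selection restrictions per predicate, taken from the slide's frame for "reject".
FRAMES = {"reject": {"Subj": "ANIMATE-ENTITY", "Obj": "ENTITY"}}


def is_a(sem_class, target):
    """True if sem_class equals target or inherits from it in ISA."""
    while sem_class is not None:
        if sem_class == target:
            return True
        sem_class = ISA.get(sem_class)
    return False


def satisfies(pred, args):
    """Check that every role in the frame is filled by a compatible class."""
    frame = FRAMES[pred]
    return all(
        role in args and is_a(args[role], restriction)
        for role, restriction in frame.items()
    )


person_rejects_computer = satisfies("reject", {"Subj": "PERSON", "Obj": "COMPUTER"})
computer_rejects_computer = satisfies("reject", {"Subj": "COMPUTER", "Obj": "COMPUTER"})
```

Under these assumptions a PERSON subject passes the restriction while a COMPUTER subject does not, which is how the analysis on the slide infers an unexpressed animate agent for the passive "had to be rejected".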
11
Dependency Structure (II)
  The Dell computer that has been rejected was claimed to have suffered from handling.
  reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
  [Figure: Lexical Functional Grammar (LFG) f-structure and dependency graph for the sentence: PRED claim, SUBJ y1, XCOMP (PRED suffer, SUBJ y1, OBL-from handling); y1: computer (PRED computer, MOD Dell, ADJUNCT PRED reject)]
12
Ontology Learning from Text
13
Some History
  Lexical Knowledge Extraction
    Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries - 1970s/1980s
    Extraction of semantic lexicons from corpora for Information Extraction systems - 1980s/1990s, e.g. CRYSTAL (Soderland)
    Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
  Thesaurus Extraction
    Similar work, (complex, multilingual) term extraction, e.g. Sextant (Grefenstette); DR-Link (Liddy)
  Ontology Learning from Text
    Similar work, (domain-specific) term / relation extraction, e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
    Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)
14
TextToOnto
  Association Rules
15
OntoLearn
  Domain-Specific WordNet Tuning and Extension
16
OntoLT: Some Background
  Ontology Learning from Text
    Taxonomy Extraction, Document Clustering
      String-based, Document Level
    “Unnamed” Relation Extraction, Word Clustering
      Stemming & Part-of-Speech, Token Level
    Extraction of Terms, “Named” Relations
      Pred-Arg & Head-Mod Structure, Term Level
  Text in Ontology Engineering
    Textual Grounding of Concepts
      Retain Linguistic Contexts and Realizations
    Text-based Ontology Monitoring
      Compare Language Use over Time
  Systems labeled on this slide: TextToOnto, OntoLearn
17
OntoLT: Some Background
  Ontology Learning from Text
    Taxonomy Extraction, Document Clustering
      String-based, Document Level
    “Unnamed” Relation Extraction, Word Clustering
      Stemming & Part-of-Speech, Token Level
    Extraction of Terms, “Named” Relations
      Pred-Arg & Head-Mod Structure, Term Level
  Text in Ontology Engineering
    Textual Grounding of Concepts
      Retain Linguistic Contexts and Realizations
    Text-based Ontology Monitoring
      Compare Language Use over Time
  System labeled on this slide: OntoLT
18
OntoLT
  What is it?
    OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
  How does it work?
    1. automatic linguistic annotation
    2. automatic statistical preprocessing
    3. interactive definition of mapping rules
    4. interactive user validation of candidates
    5. automatic integration into an ontology
19
OntoLT: Architecture
20
Linguistic Annotation
  An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
  (In 40 knee joint specimens, middle patellar tendon thirds were fixed femorally in a two-step drill tunnel using a new bone-block technique.)
  mittlere Patellarsehnendrittel (mid patellar ligament third)
  [Figure: XML linguistic annotation of this phrase; the markup did not survive extraction, only the lemmas mittler, patellar, Sehne, Drittel]
21
Mapping Rules
  Precondition Language
    Var (Y, XPath (Y)): get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
    Concat
    ConcatList
    combined through AND, OR, NOT, EQUAL
  Operators
    CreateCls: create a new class with super-class
    AddSlot: add a slot with range to a new or existing class
    CreateInst: introduce an instance for a new or existing class
    FillSlot: set the value of a slot of an instance
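To make the precondition/operator split concrete, the sketch below applies one illustrative rule: for every NP found through an XPath-style query, create a class for the head noun and a subclass for each modifier plus head noun. The XML element and attribute names, the dictionary-based ontology and the functions create_cls and head_noun_rule are all assumptions made for this example; they stand in for OntoLT's actual annotation format and its CreateCls operator, which the slide names but does not specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation format, invented for this sketch.
ANNOTATED = """
<sentence>
  <phrase type="NP">
    <modifier lemma="flat"/>
    <head lemma="screen"/>
  </phrase>
  <phrase type="NP">
    <head lemma="motherboard"/>
  </phrase>
</sentence>
""".strip()


def create_cls(ontology, name, superclass):
    """Stand-in for the CreateCls operator: add a class unless it already exists."""
    ontology.setdefault(name, {"super": superclass, "slots": {}})


def head_noun_rule(root, ontology, top="Thing"):
    """Precondition: every NP with a head. Operator: CreateCls for head and modifier+head."""
    for np in root.findall(".//phrase[@type='NP']"):
        head = np.find("head")
        if head is None:
            continue
        head_cls = head.get("lemma")
        create_cls(ontology, head_cls, top)
        for mod in np.findall("modifier"):
            create_cls(ontology, f"{mod.get('lemma')}_{head_cls}", head_cls)


ontology = {}
head_noun_rule(ET.fromstring(ANNOTATED), ontology)
```

In the workflow described in the presentation, the classes produced this way are only candidates; the ontology engineer validates them interactively before they are integrated.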
23
Example Experiment
  Ontology Extraction for Neurology
  Neurology Section of a Medical Corpus
    Medical Scientific Journal Abstracts - MuchMore Project
  XML-based Linguistic Annotation
    PoS, Lemmatization, Phrases, Pred-Arg Structure
  Statistical Preprocessing (chi-square)
    Select Domain-Relevant Linguistic Entities
  Definition of Mapping Rules
    Define Operators for Selected Linguistic Entities
  Generate & Validate Class/Slot Candidates
    Select Candidates for Integration in Neurology Ontology
  Generate “Ontology Fragments” for Neurology
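The slide names chi-square as the statistic for selecting domain-relevant entities but gives no formula. One common formulation, shown here as an assumption rather than as OntoLT's exact computation, compares a term's frequency in the domain corpus with its frequency in a reference corpus using a 2x2 contingency table. The function name chi_square, its parameters and the counts below are illustrative.

```python
def chi_square(term_domain, total_domain, term_ref, total_ref):
    """Pearson chi-square for a 2x2 table: term vs. other tokens, domain vs. reference."""
    a, b = term_domain, term_ref                  # occurrences of the term
    c = total_domain - term_domain                # other tokens in the domain corpus
    d = total_ref - term_ref                      # other tokens in the reference corpus
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Made-up counts: a term seen equally often in both corpora, and one skewed to the domain.
same_rate = chi_square(10, 100, 10, 100)
skewed = chi_square(30, 100, 5, 100)
```

The statistic is two-sided, so a high score only says that the distributions differ; a selection step would additionally check that the term's relative frequency is higher in the domain corpus than in the reference corpus.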
36
Further Issues
  Future Development
    Organization of Class/Slot Candidate List
      Inference & Clustering - “Graph Restructuring”
    Extend Statistical Preprocessing
      Multiple Reference Corpora
      Extended Frequency Information
    Include Machine Learning Approach
      Semi-Automatic Definition of Mapping Rules
  Performance Evaluation
    Guidelines
      ECAI04 Workshop on OLP
    Benchmark
      Challenge within PASCAL NoE
37
Evaluation: What? -- Subtasks
  Classes
    (Multilingual) Term Extraction
    Named-Entity Recognition
    Similarity Thesaurus
    Term, Document Clustering
  Class-Hierarchy (Taxonomy)
    Thesaurus Extraction
    Term, Document Clustering
  Class-Properties (Relations)
    Relation Extraction
    ? Formal Properties of Relations (Properties)
  Class-Instances (Individuals)
    (Multilingual) Term Extraction
    Named-Entity Recognition
    Term, Document Classification
38
Evaluation: How?
  By Sub-Task - Evaluation of:
    Classes - Term, NE Extraction, Clustering
    Class-Hierarchy - Thesaurus Extraction
    Class-Properties - Relation Extraction
    Class-Instances - Term, NE Extraction, Classification
  By Application - Evaluation of:
    Ontology Learning and Population - Gold Standard
    IR, QA - Precision/Recall Increase with Ontology?
    Interactive QA - Increased User Satisfaction?
    Information Access - Increased User Performance?
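Evaluation against a gold standard, as listed on this slide, usually comes down to precision and recall over sets of extracted items. The sketch below computes both for extracted class candidates; the function name precision_recall and the example sets are assumptions for illustration, and it uses exact string matching, which is stricter than what a real ontology evaluation may want.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up candidate classes and gold standard.
extracted_classes = {"screen", "motherboard", "failure"}
gold_classes = {"screen", "motherboard", "computer", "processor"}

precision, recall = precision_recall(extracted_classes, gold_classes)
```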
39
Conclusions
  Stay Tuned
  OntoLT Release
    To be Announced on Protégé-Discussion List
    http://protege.stanford.edu/mailing-lists
  Evaluation
    Ontology Learning & Population (OLP) Challenge
      Within PASCAL NoE - First Task Spring 2005
    ECAI04 Workshop: Evaluation of Text-based OLP
      http://olp.dfki.de/ECAI04/cfp.htm