© Paul Buitelaar: eJustice Presentation, July 15th, 2004 Ontologies Contributions from Language Technology Paul Buitelaar DFKI GmbH Language Technology.

1 © Paul Buitelaar: eJustice Presentation, July 15th, 2004
Ontologies
Contributions from Language Technology
Paul Buitelaar
DFKI GmbH, Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany

2 Overview
Ontologies and the Semantic Web
  • Semantic Web Intro
  • Ontologies and Knowledge Markup
  • Ontology Development
  • Ontology Lifecycle & Language Technology
Language Technology
  • Levels of Automatic Linguistic Analysis
Ontologies in Multilingual Information Access
  • A Medical Example: MuchMore Project
  • Semantic Resources in the Medical Domain
  • Demo MuchMore System
  • Language Technology in Annotation and Indexing
Conclusions
  • MuchMore for the Legal Domain…

3 Semantic Web
[Diagram labels: Intelligent Man-Machine Interface; Knowledge Markup; Ontologies; Semantic Web Services]

4 Ontology-based Knowledge Markup
Semantic Metadata
  • Metadata, e.g. Dublin Core: Title, Author, etc.
  • Semantic: Formal Properties of Objects of Class Author
[Figure labels: John Smith; Knowledge Markup]

5 Semantic Web Architecture
Layered Architecture (Tim Berners-Lee)

6 Knowledge Markup Languages
Syntax: XML
  • XML Schema: Data Types
  • Namespaces: Interpretation Context
Semantics: RDF
  • RDF Schema: Formalization: Classes (Inheritance), Properties
  • OWL (DAML+OIL): Formalization: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity)

7 Ontologies: Basic Idea
Definition
  • "… Explicit, Formal Specification of a Shared Conceptualization of a Domain of Interest" (T. Gruber, Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human and Computer Studies, 1994)
Purpose
  • Knowledge Sharing (e.g. between Agents)
  • Inference (over Sets of Instances)
Related Areas, e.g.
  • Terminologies, Controlled Vocabulary, Thesauri, Taxonomies, Semantic Lexicons, Wordnets, etc.
  • Conceptual Models, Schemas, etc.

8 Ontologies: Applications, e.g.
Semantic Web Services
  • Interoperability for (Semantic) Web Services
Intelligent Agents
  • Domain Models for Intelligent Agents
Text Interpretation
  • Ontology-aware Information Extraction
Multimedia Integration
  • Ontology-based Alignment of Extracted Objects in Text, Audio, Video
Intelligent Search/Navigation
  • Ontology-based Indexing in Web-Retrieval

9 Ontologies: Development
Ontology Editor / KB Management
  • Most Widely Used: Protégé (Stanford University, Medical Informatics, USA)
      Originally for Development and Maintenance of Medical Expert Systems
  • Other, e.g.
      KAON: University of Karlsruhe - AIFB, Germany
      WebOde: UPM - Ontology Group, Madrid, Spain
      WebOnto: Open University - KMI, UK
  • Overview at XML.com by Michael Denny: Ontology Building: A Survey of Editing Tools

10 [Screenshot labels: Class Hierarchy; Slot Descriptions]
http://dmag.upf.es/ontologies/2003/12/ipronto.owl

11 Ontology Lifecycle
[Diagram labels: Creating; Populating; Validating; Evolving; Maintaining; Deploying]

12 LT in the Ontology Lifecycle
Language Technology (LT) for Ontology: Language Technology = Automated Linguistic Analysis
  • Creating & Evolving: Linguistic Analysis to Extract Classes / Relations
  • Populating (Knowledge Base Generation): Linguistic Analysis to Extract Instances
[Diagram labels: Documents (Text); Ontology (Knowledge); Classes, Relations/Properties]

13 Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
[Diagram labels: Dell computer; flat screen; motherboard; has-a; reject; failure; location-of; animate-entity]

14 Part-of-Speech, Morphology
Part-of-Speech
  • e.g.: noun, verb, adjective, preposition, …
  • PoS tag sets may have between 10 and 50 (or more) tags
Morphology
  • Most languages have inflection and declension, e.g.:
      Singular/Plural: computer, computers
      Present/Past: reject, rejected
  • Many languages also have complex (de)composition, e.g.:
      Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
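The decomposition example on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_compound` and the four-word toy lexicon are hypothetical, and a real analyzer would also need to handle linking elements, inflection, and ranking of competing splits. The slide does not say which method was actually used.

```python
def split_compound(word, lexicon):
    """Return every way to cut `word` into a sequence of lexicon entries.

    Assumption: plain concatenation only, no linking elements or inflection.
    """
    word = word.lower()
    results = []
    if word in lexicon:
        results.append([word])
    # Try every split point; keep a split when the left part is a known word
    # and the right part can itself be decomposed.
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            for rest in split_compound(tail, lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon covering only the slide's example.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

splits = split_compound("Flachbildschirm", LEXICON)
for parts in splits:
    print(" + ".join(parts))
```

With this lexicon the sketch yields both analyses shown on the slide, the coarse one (flach + bildschirm) and the fully decomposed one (flach + bild + schirm).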

15 Phrases, Terms, Named Entities
Semantic Units
  • Phrases (e.g. nominal - NP, prepositional - PP)
      NP: a flat screen
      PP: with a flat screen
      NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
  • Terms (domain-specific phrases)
      Dell computer
      Dell computer with a flat screen
  • Named Entities (phrases corresponding to dates, names, …)
      COMPANY: Dell
      COMPANY: Dell Computer Corporation
      PERSON: Michael Dell

16 Dependency Structure
Semantic Structure: Dependencies between Predicates and Arguments
the Dell computer with a flat screen had to be rejected
  PRED: reject
  ARG1: ENTITY
  ARG2: 'the Dell computer with a flat screen'
'Logical Form': reject(x,y) & animate-entity(x) & computer(y) & …
The Dell computer that has been rejected was claimed to have suffered from handling.
  reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
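One way to make the second logical form concrete is to store it as a list of predicate/argument atoms and query it by variable. The `Atom` type and the helper names below are hypothetical and chosen only for illustration; the slide gives the formula itself but no data structure.

```python
from collections import namedtuple

# Hypothetical representation: one atom per conjunct of the logical form.
Atom = namedtuple("Atom", ["predicate", "args"])

logical_form = [
    Atom("reject", ("e1", "x1", "y1")),
    Atom("animate-entity", ("x1",)),
    Atom("Dell_computer", ("y1",)),
    Atom("claim", ("e2", "x2", "e3")),
    Atom("animate-entity", ("x2",)),
    Atom("suffer_from", ("e3", "y1", "y2")),
    Atom("handling", ("y2",)),
]


def predicates_about(variable, atoms):
    """List the predicates that mention `variable`, in formula order."""
    return [atom.predicate for atom in atoms if variable in atom.args]


def render(atoms):
    """Print the atoms back in the slide's conjunctive notation."""
    return " & ".join(f"{a.predicate}({','.join(a.args)})" for a in atoms)


print(render(logical_form))
print(predicates_about("y1", logical_form))
```

Querying `y1` returns `reject`, `Dell_computer`, and `suffer_from`, which shows how the shared variable ties the relative clause and the main clause to the same computer.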

17 MuchMore Project
Demonstration Prototype
  • Real-Life Medical Scenario for Cross-Lingual Information Retrieval
Research & Development
  • Combined Data- and Knowledge-Driven
Performance Evaluation
  • Performance Comparison of Existing and Novel Methods
http://muchmore.dfki.de

18 Semantic Resources
General
  • WordNet (EN), GermaNet (DE), EuroWordNet ("linked")
Medical Domain
  • UMLS: Unified Medical Language System
      Medical MetaThesaurus (only MeSH2001 is used)
        English, German, Spanish, …
        730,000 Concepts
        9 Relations (Broader, Narrower, …)
      Semantic Network
        134 Semantic Types
        54 Semantic Relations

19 MetaThesaurus, SemNet
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
Concept Names: 1,734,706 (English: 1,462,202; German: 66,381; remainder: other languages)
Each CUI (Concept Unique Identifier) is mapped to one out of 134 Semantic Types or TUI (Type Unique Identifier)
  Clozapine: C0009079 → Pharmacologic Substance: T121
Semantic Types are organized in a Network through 54 Relations
  T121|T154|T047
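The pipe-delimited rows on this slide are easy to read programmatically. The sketch below parses a few of them and groups the concept names by language. The field names are an assumption based on the legacy UMLS concept-name file layout (concept ID, language, term status, term ID, string type, string ID, string, restriction level); the slide itself does not label the columns.

```python
from collections import defaultdict

# Assumed column names; the slide shows the rows but not the header.
FIELDS = ("cui", "language", "term_status", "lui",
          "string_type", "sui", "string", "restriction_level")

ROWS = """\
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
"""


def parse_row(line):
    """Turn one pipe-delimited row into a dict keyed by the assumed field names."""
    values = line.strip().rstrip("|").split("|")
    return dict(zip(FIELDS, values))


records = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Group all names of the concept by language code.
names_by_language = defaultdict(list)
for record in records:
    names_by_language[record["language"]].append(record["string"])

# Rows whose third column is "P" are read here as the preferred name
# for that language (an assumption about the column's meaning).
preferred = {r["language"]: r["string"] for r in records if r["term_status"] == "P"}

print(dict(names_by_language))
print(preferred)
```

Grouping by the shared concept ID and language is what lets a cross-lingual system treat "HIV" and "Humanes T-Zell-lymphotropes Virus Typ III" as names of the same concept.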

31 Annotation & Indexing
Token (with Part-of-Speech)
  German: Kreuzbandes
  English: ligaments
Lemma (or Sequence of Lemmas - Decomposition)
  German: Faserknorpel → Faser + Knorpel
  English: ligament
UMLS Concept Code and Semantic Type
  ligament: C0022745_T030
MeSH Code
  A2.513
Semantic Relation (over a Pair of UMLS Concepts)
  C0022745_T030 interconnects C0047693_T065

32 Relations
UMLS Semantic Network specifies 54 types of relations between 134 semantic types
  Pharmacologic Substance affects Cell Function
Relations are generic and potentially false
  Therapeutic Procedure method_of Occupation,Discipline
  *discectomy method_of history
Relations are ambiguous
  Therapeutic Procedure prevents Neoplastic Process
  Therapeutic Procedure complicates Neoplastic Process
  Therapeutic Procedure affects Neoplastic Process
  Therapeutic Procedure treats Neoplastic Process
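The ambiguity point can be illustrated with a small lookup over type-level triples. This is a minimal sketch, assuming the semantic network is held as a set of (source type, relation, target type) tuples; the name `relations_between` is hypothetical, and only triples quoted on this slide are included.

```python
# Type-level triples copied from the slide; the real network is far larger.
SEMANTIC_NETWORK = {
    ("Pharmacologic Substance", "affects", "Cell Function"),
    ("Therapeutic Procedure", "prevents", "Neoplastic Process"),
    ("Therapeutic Procedure", "complicates", "Neoplastic Process"),
    ("Therapeutic Procedure", "affects", "Neoplastic Process"),
    ("Therapeutic Procedure", "treats", "Neoplastic Process"),
}


def relations_between(source_type, target_type, network=SEMANTIC_NETWORK):
    """Return every relation the network licenses between two semantic types."""
    return sorted(
        relation
        for (source, relation, target) in network
        if source == source_type and target == target_type
    )


candidates = relations_between("Therapeutic Procedure", "Neoplastic Process")
print(candidates)
```

Because the lookup works on semantic types only, a pair of concepts with these types receives all four candidate relations, including the contradictory `prevents` and `complicates`. Choosing among them requires evidence from the text itself.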

33 Example
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.

34 Example: Terms/Concepts
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.
Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations

35 Example: Relations
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.
Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations
Relations:
  C0019134 interacts_with C0013227
  C0005790 analyses C0019134
  C0005790 analyses C0013227

36 Conclusions
MuchMore for the Legal Domain…
  • Resources
      Legal Domain Ontology with…
      …Large-scale Terminology for Multiple Languages, or if not available…
      …Large Legal Domain Corpora in Multiple Languages for Term Extraction…
      …and for Relation Extraction if Ontology Needs to be Constructed/Adapted
  • Tools
      Linguistic Analysis (PoS, Morphology, Term Grammars, etc.)…
      …for Multiple Languages…
      …Tuned to the Legal Domain…
      Information Retrieval Infrastructure, Interface Design, etc.

