Building a Multilingual Lexicon, by Robert Baud. SemanticMining WP20, Freiburg, 29 March 2004.

Presentation transcript:

1 Building a Multilingual Lexicon, by Robert Baud. SemanticMining WP20, Freiburg, 29 March 2004

2 Natural Language Processing Group. Building a multilingual lexicon
- Should we start from a model of medicine, or from a pragmatic observation of the languages?
- What representation of knowledge should be added to a lexicon? The question is what makes a lexicon multilingual.
- From signals to understanding: the different levels of granularity of language information.
- Defining the Lexicon Ontology (LO) in order to start on a sound basis.

3 Modeling or not?
- In the last decade, the idea of a model of medicine was prevalent: Snomed, Galen, UMLS, etc.
- NLP was necessary as a way to help communicate the content of the model.
- The principle of guidance by the model was accepted.
- But a general model of medicine is far from being a reality, and this will certainly remain true for a few decades.
- Therefore, it is not a good idea to make NLP depend on the existence of a model.
- Make NLP free from any model!

4 Modeling the medical domain
[Figure: "Light model". Link types: has_parent, has_child, linked_to. Nodes: top, object, event, process, path., normal, surgery, trauma, disease, arm, hand, finger, palm, foot.]

5 Local model
- Words are at different levels of detail:
  - burn of the finger and burn of the thumb
  - digestive disorder and post-prandial disorder
  - vertebra and atlas
- Attributes or properties are generalized to classes of concepts.
- Local inferences between close levels in a hierarchy of concepts are necessary before chunking information.
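
A local inference between close levels of a hierarchy can be sketched as a bounded walk up parent links. This is only an illustration, not the group's implementation: the PARENT table, the function names and the max_steps limit are assumptions, and the table mixes is-a and part-of links for brevity. Only the concept names come from the slides.

```python
# Hypothetical parent links; only the concept names come from the slides.
PARENT = {
    "thumb": "finger",
    "finger": "hand",
    "atlas": "vertebra",
    "post-prandial disorder": "digestive disorder",
}

def ancestors(concept, max_steps=2):
    """Return the ancestors of `concept`, nearest first, at most `max_steps` levels up."""
    found = []
    current = concept
    for _ in range(max_steps):
        current = PARENT.get(current)
        if current is None:
            break
        found.append(current)
    return found

def is_local_specialization(specific, general, max_steps=2):
    """True when `general` is reachable from `specific` within `max_steps` levels."""
    return general in ancestors(specific, max_steps)

print(ancestors("thumb"))                           # ['finger', 'hand']
print(is_local_specialization("thumb", "finger"))   # True
print(is_local_specialization("atlas", "finger"))   # False
```

With such a check, "burn of the thumb" could be recognised as a more detailed form of "burn of the finger" before the text is chunked.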

6 Semantic lexicons
- A semantic lexicon is a lexicon with attachments to existing terminologies and ontologies.
- But what do we attach to what, and how?
  - Grouping of words representing the same object?
  - What are the semantics of this association?
  - What about multilingual aspects?
- Problem of coherence of multiple attachments.

7 From words to objects
[Figure: lexical representation (lemma level, Abstract Lexical Identifier) linked to ontological representation (ontological level, Universal Object Identifier).]
- Lemma level to Abstract Lexical Identifier:
  - « paupière », « eyelid », « Augenlid »: _eyelid
  - « bléphar », « blephar »: _blephar
  - « blépharo », « blepharo »: _blepharo
  - « palpébral », « palpebral », « ? »: _palpebral
- Universal Object Identifier: cl_Eyelid
- Linked resources: Galen, UMLS Semantic net, MeSH, Snomed, ICD10, other
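
The two-step path from a lemma to an object can be sketched with two lookup tables. The identifiers are those shown on the slide; the language codes, the table names and the resolve function are assumptions made for illustration, and the grouping of lemmas under each identifier is read off the figure.

```python
# Lemma-to-LID table transcribed from the slide; the language codes are assumed.
LEMMA_TO_LID = {
    ("fr", "paupière"): "_eyelid",
    ("en", "eyelid"): "_eyelid",
    ("de", "Augenlid"): "_eyelid",
    ("fr", "bléphar"): "_blephar",
    ("en", "blephar"): "_blephar",
    ("fr", "blépharo"): "_blepharo",
    ("en", "blepharo"): "_blepharo",
    ("fr", "palpébral"): "_palpebral",
    ("en", "palpebral"): "_palpebral",
}

# Every LID on the slide points to the same universal object identifier.
LID_TO_CID = {lid: "cl_Eyelid" for lid in set(LEMMA_TO_LID.values())}

def resolve(language, lemma):
    """Return (LID, CID) for a lemma, or None when the lexicon has no entry."""
    lid = LEMMA_TO_LID.get((language, lemma))
    if lid is None:
        return None
    return lid, LID_TO_CID[lid]

print(resolve("fr", "paupière"))   # ('_eyelid', 'cl_Eyelid')
print(resolve("en", "palpebral"))  # ('_palpebral', 'cl_Eyelid')
print(resolve("de", "palpebral"))  # None
```

Keeping the lexical and ontological tables separate means a new language only adds rows to the first table.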

8 Dealing with proximity of words
[Figure: lexical representation (lemma level, Abstract Lexical Identifier) linked to ontological representation (ontological level, Universal Object Identifier).]
- Lemma level to Abstract Lexical Identifier to Universal Object Identifier:
  - « corps », « body », « Körper »: _BodyAsWhole, cl_Body
  - « corps », « body », « Körper »: _BodyAsTrunck, cl_Trunck
  - « corps », « body », « Körper »: _BodyAsDeadPerson, cl_DeadBody
  - « corps étranger », « foreign body », « Fremdkörper »: _BodyAsForeign, cl_ForeignBody
- Linked resources: MeSH, Semantic net, etc.
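
One way to handle such neighbouring entries is to try the longest lexicon entry first and return every candidate identifier when a word remains ambiguous. The sketch below assumes a simple dictionary lexicon; the ENTRIES table and the candidate_lids function are hypothetical, and the five-token limit follows the "2 to 5 words" bound given for expressions in this deck.

```python
# Hypothetical English entries; the identifiers are copied from the slide as spelled.
ENTRIES = {
    "body": ["_BodyAsWhole", "_BodyAsTrunck", "_BodyAsDeadPerson"],
    "foreign body": ["_BodyAsForeign"],
}

MAX_EXPRESSION_LENGTH = 5  # expressions are made of 2 to 5 words

def candidate_lids(tokens, start):
    """Return (tokens consumed, candidate LIDs), preferring the longest entry."""
    longest = min(MAX_EXPRESSION_LENGTH, len(tokens) - start)
    for length in range(longest, 0, -1):
        key = " ".join(tokens[start:start + length]).lower()
        if key in ENTRIES:
            return length, ENTRIES[key]
    return 1, []  # unknown token: consume it and report no candidate

print(candidate_lids(["a", "foreign", "body"], 1))  # (2, ['_BodyAsForeign'])
print(candidate_lids(["the", "body"], 1))
```

The second call returns all three senses of "body"; choosing among them would need context that this sketch does not model.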

9 From signals to understanding
[Figure: levels of language information: utterances, lexicon entries, language words, abstract lexical identifier, universal object identifier, object, link between objects.]

10 Utterances
- A speech, a sentence, a sign, a signal, generally issued by a human being
- An expression of something to be communicated
- Well-formed or ill-formed
- It is difficult to delimit what constitutes a unit of communication, or a kind of atomic message
- Utterances are expected to be converted to written sentences for subsequent processing.

11 Lexicon entries
- All 3 kinds of lexicon entries point to well-defined objects of the world.
- Single-word entries contain no blank character and are not decomposable.
- Word components, or morphosemantems, are the parts obtained by decomposing compound words.
- Expressions or short terms are made of 2 to 5 words and represent single objects, such as idiomatic expressions and language idiosyncrasies, which cannot be represented by ordinary composition of their parts.

12 Language words
- In most natural languages, words present morphological variations, which have to be resolved.
- Rule-based systems are able to solve this problem.
- Given a sentence, a lemmatizer is a program that produces the list of the lemmas of all its words, i.e. their basic forms: generally singular, masculine, nominative or infinitive, whichever applies.
- A multilingual lexicon should include the definitions of the rules and should flag the regular words.
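
A rule-based lemmatizer of the kind described above can be sketched as an exception table for irregular words plus ordered suffix rules for regular ones. The rules below are a toy set for English plurals, chosen for illustration; they are not the rules of the lexicon presented here, and they will mis-handle many words (for example "diagnosis").

```python
# Irregular words are listed explicitly; all other words go through the rules.
IRREGULAR = {"vertebrae": "vertebra", "feet": "foot"}

# Ordered (suffix, replacement) pairs; the first matching rule wins.
SUFFIX_RULES = [
    ("ies", "y"),    # injuries -> injury
    ("sses", "ss"),  # abscesses -> abscess
    ("ss", "ss"),    # abscess stays abscess
    ("s", ""),       # fingers -> finger
]

def lemmatize_word(word):
    """Return the basic form of one word, using exceptions first, then rules."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters before the suffix to spare short words.
        if lower.endswith(suffix) and len(lower) > len(suffix) + 1:
            return lower[:-len(suffix)] + replacement
    return lower

def lemmatize(sentence):
    """Return the list of lemmas of a whitespace-separated sentence."""
    return [lemmatize_word(word) for word in sentence.split()]

print(lemmatize("Burns of the fingers"))  # ['burn', 'of', 'the', 'finger']
```

Flagging a word as regular then amounts to noting that it is absent from the exception table.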

13 Abstract lexical identifier (LID)
- The same word generally exists in different languages.
- The same word may have different lemmas in a given language.
- The information about these facts has to be explicitly collected.
- The container collecting all these forms is called an abstract lexical identifier.
- It is represented by a unique string of characters, based on the English lemma, with an extension when necessary.

14 Universal object identifier (CID)
- Physical objects and abstract objects are parts of the world.
- A unique object identifier has to be defined for each object of the domain under scrutiny.
- One and only one link has to be defined between an abstract lexical identifier and an object identifier.
- Multiple links may converge to the same object identifier.
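
The two link constraints (exactly one object per LID, several LIDs allowed per object) can be enforced by a small table. The LinkTable class and its method names are hypothetical; the identifiers are examples taken from earlier slides.

```python
class LinkTable:
    """Stores LID-to-CID links: one CID per LID, many LIDs per CID."""

    def __init__(self):
        self._cid_of = {}

    def link(self, lid, cid):
        """Link a LID to a CID, refusing a second, different CID for the same LID."""
        existing = self._cid_of.get(lid)
        if existing is not None and existing != cid:
            raise ValueError(f"{lid} is already linked to {existing}")
        self._cid_of[lid] = cid

    def cid_of(self, lid):
        """Return the single CID linked to this LID (KeyError if unlinked)."""
        return self._cid_of[lid]

    def lids_of(self, cid):
        """Return, in sorted order, all LIDs converging on this CID."""
        return sorted(lid for lid, c in self._cid_of.items() if c == cid)

table = LinkTable()
table.link("_eyelid", "cl_Eyelid")
table.link("_palpebral", "cl_Eyelid")
print(table.lids_of("cl_Eyelid"))  # ['_eyelid', '_palpebral']
```

Attempting table.link("_eyelid", "cl_Body") after the calls above raises ValueError, which is how the sketch keeps each LID attached to a single object.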

15 Abdomen and its context

16 Hypertension and its context

17 Insect and its context

18 Abandonment and its context

19 Abscess and its context

20 Fœtus and its context

21 Actual implementation

22 The Lexicon Ontology (LO)
- Answers the need for a formal definition of all objects involved in building a multilingual lexicon
- Based on sound recommendations regarding modern ontologies
- Ensures proper communication of the design between the implementers and the users
- Frame-based implementation using Protégé
- May be used for a knowledge-driven implementation of the lexicon.

23 LO Implementation

24 PermanentObject

25 Dependant Objects

26 FullWord

27 PartWord

28 Definition by genus and differentia
- Definitions are composed automatically by the inheritance schema, through the isa links.
- A Noun is a LexiconObject which:
  - represents a physical or abstract object or any of their attributes,
  - is a building block of a sentence,
  - is used stand-alone in a text,
  - is an undecomposable atom,
  - is an object embodied in the construction of a multilingual lexicon of the medical domain,
  - is necessary for the processing of written medical text.
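
Composing a definition through the isa links can be sketched by walking from a class up to its root and collecting the differentiae met on the way. The FRAMES table is hypothetical: Noun, FullWord and LexiconObject are class names from the slides, but the direct Noun-FullWord-LexiconObject chain and the level at which each property is declared are assumptions. The actual LO is a frame-based Protégé model, not a Python dictionary.

```python
# Hypothetical frames: each class has one isa parent and its own differentiae.
FRAMES = {
    "LexiconObject": {"isa": None, "differentiae": []},
    "FullWord": {
        "isa": "LexiconObject",
        "differentiae": ["is used stand-alone in a text", "is an undecomposable atom"],
    },
    "Noun": {
        "isa": "FullWord",
        "differentiae": ["represents a physical or abstract object or any of their attributes"],
    },
}

def define(name):
    """Compose a genus-and-differentia definition by walking the isa links."""
    differentiae = []
    current = name
    while FRAMES[current]["isa"] is not None:
        # Most specific properties first, inherited ones after.
        differentiae.extend(FRAMES[current]["differentiae"])
        current = FRAMES[current]["isa"]
    if current == name:
        return f"{name} is a root class."
    return f"A {name} is a {current} which " + ", ".join(differentiae) + "."

print(define("Noun"))
```

Because the text is generated from the hierarchy, changing a property on FullWord updates the definition of every class below it.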

29 Available resources
- Multilingual lexicon:
  - French: >
  - English: >
  - German: >
  - Latin: > 6500 (+ 9000)
  - Proper names: > 3000
- Tools (achievement may depend on the language):
  - Word decomposition
  - Tokenizer
  - Error correction
- Several utilities: Semantic Net, MeSH, TA, etc.
- Web server for lexicon access

30 Recommendations
- Define the lexicon on a strong formal basis
- Make the multilingual aspects explicit
- Take care of inflectional morphology
- Favour the proper treatment of compound words
- Be open to the evolution of languages and the arrival of other European languages
- Make links to well-known terminologies and ontologies available

31 Thank you for your attention