Medical WordNet A Proposal Christiane Fellbaum Princeton University and Berlin-Brandenburg Academy of Sciences.



The Challenge
Bridge the communication gap between lay persons and health care providers

Health Care Providers (HCP or “Experts”)
--Physicians
--Nurses
--Therapists
--on-line medical information systems

Non-Experts
--patients
--family members
--benefit administrators
--lawyers

Modes of communication
Live interaction with patients
Virtual interaction
--On-line medical information

Experts, lay persons speak different “dialects”

Characteristics of HCP language
Ignorance/uncertainty as to non-experts’ lexical and conceptual knowledge
Same word is used with different meanings by the two populations (word-concept mismatch)
HCP use technical terms
HCP substitute synonyms from different levels

Characteristics of non-expert language
Idiosyncratic, “unregulated”
--mix of technical and folk terms
--taxonomies are less elaborate, shallower (fewer intermediate levels of categorial distinctions)
--lay concepts are fuzzy (e.g., flu)
--lay concepts have no (clear) equivalents in medicine (“Kreislaufprobleme”: “circulatory problems”)

Expert vs. non-expert language in dialogue interaction
HCP introduce new concepts for which the lay person is unprepared
--go from symptoms to diagnosis, treatments, etc.
Lay questions are frequently “yes/no”
Expert replies are usually not “yes/no”
Often no opportunities for “repair”

Additional problem with on-line information systems Trivial linguistic features can have potentially significant consequences

Example: MEDLINEplus
Different results depending on query:
--tremor vs. intentional tremor
--tremble vs. trembling
Linguistic (morphological) differences in the query result in semantically different answers

(our) solution Make the HCP “bilingual” Enable “translation” between consumer health information systems and laymen

Problems on three levels
Lexical
Conceptual
Propositional (facts, beliefs, hypotheses,...)

Some ground rules for the next 45 mins
Nothing hinges on “concept”
Propose synset: {concept, universal, idea, type...}
“Truth” applies only to propositions, not entities
WordNet has “unicorn”, “Mickey Mouse”, etc.

A Linguist’s view
Concepts/universals are expressed by lexemes (words)
Words are embedded in contexts and partially derive their meanings from contexts
Truth of propositions depends partially on their lexical make-up

Goals
Document medical knowledge that can be understood by the average adult health care consumer in the U.S.
Make existing tools accessible for non-experts

Plan of Attack
Create lexical database of medical terminology modeled on WordNet, with WN’s potential for NLP
Lexical (word) information is complemented with definitional sentences, one for experts, one for laymen
Sentences provide meaningful contexts for terms
Two sentential subcorpora: Facts and Beliefs

Some background: WordNet
Large lexical database for English
Semantic network? yes
Thesaurus? yes, BUT unlike in Roget’s, WN’s relations are labeled
Ontology? who knows?

WordNet
Constructed entirely by hand
Semantic network of 115,000 synonym sets (“synsets”)
Example synset: {chest, thorax, torso (the part of the body below the neck and above the belly; “the victim had a knife stuck in his chest”)}

WordNet synsets
One or more “cognitively synonymous” lexemes
Definition (“gloss”)
Example sentence
Meronymy, hyponymy relate noun synsets
Result: semantic network

WordNet synsets
Where did the makers of WN get their synonyms, meronyms, etc. from?
Mid-1980s: no corpora were available
Association norms
Some psycholinguistic testing (sorting experiments)
Assumption: speakers’ use of words reflects conceptual organization

WordNet
WordNet’s value for computational linguistics, Natural Language Processing
Synonyms, related synsets allow searches for semantically related nodes
--E.g., query expansion
Information retrieval, Q-A systems, data mining,...
Inferencing
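Query expansion over synsets can be sketched in a few lines. This is only an illustration under stated assumptions: the `SYNSETS` list is hand-made toy data (not WordNet's actual contents), and `expand_query` is a hypothetical helper name, not part of any WordNet API.

```python
# Toy synsets for illustration only; real WordNet data is far larger.
SYNSETS = [
    {"chest", "thorax"},
    {"hay fever", "pollinosis"},  # assumed grouping, for illustration
]

def expand_query(terms):
    """Return the query terms plus every synonym that shares a synset with one of them."""
    expanded = set(terms)
    for term in terms:
        for synset in SYNSETS:
            if term in synset:
                expanded |= synset
    return sorted(expanded)

print(expand_query(["chest", "pain"]))
```

A term with no synset, such as "pain" here, simply passes through unchanged, so the expanded query never loses what the user typed.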

Two problems: Synonymy and Polysemy
WordNet maps lexemes (words) and concepts (meanings)
Words are labels for concepts that speakers find salient
--Identification of the same concept labelled with different words (synonymy); e.g. chest, thorax
--Disambiguation of polysemous words; e.g. weak patient vs. weak solution

Synonymy and Polysemy
Synonymy: membership in the same synset
Polysemy: number of synsets of which a given string is a member

WordNet
In addition, related words and concepts can be found via the relations among entire synsets
Hyponymy/hyperonymy (super-/subordination)
--HIV is a kind of virus
--One kind of virus is HIV
Meronymy/holonymy (part-whole)
--occipital bone is part of cranium
--cranium has an occipital bone

WordNet
Different kinds of hyponymy: Types vs. Instances
--Kingdom is a type of country
--Monaco is an instance of a kingdom
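These two definitions translate directly into set operations. The sketch below assumes a tiny hand-made synset table; the synset identifiers and the extra senses (the furniture sense of "chest", the words "frail" and "dilute") are illustrative assumptions, not taken from WordNet or from the slides.

```python
# Hypothetical synset table: identifier -> member lexemes.
SYNSETS = {
    "chest.body": {"chest", "thorax"},
    "chest.furniture": {"chest"},         # assumed second sense, for illustration
    "weak.person": {"weak", "frail"},     # as in "weak patient"
    "weak.solution": {"weak", "dilute"},  # as in "weak solution"
}

def are_synonyms(a, b):
    """Synonymy: both strings are members of at least one common synset."""
    return any(a in members and b in members for members in SYNSETS.values())

def polysemy(word):
    """Polysemy: the number of synsets of which the string is a member."""
    return sum(word in members for members in SYNSETS.values())

print(are_synonyms("chest", "thorax"), polysemy("weak"), polysemy("thorax"))
```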

Lexical semantics in WordNet The meaning of a word results from its place in the semantic network

WordNet for medical/bioinformatics? Synonymy, polysemy are problems here, too is WN’s way of mapping words and meanings useful?

WordNet for medical/bioinformatics?
WN was compiled by non-experts
Medical coverage is sparse and arbitrary

WordNet’s medical coverage
Contains both expert and folk terms (indistinguishable)
Contains archaic terms like unction
No type vs. role (symptom) distinction
--e.g., tumors are abnormal, but not: some tumors are malignant
No links among entities, properties, processes, states
Domain labels (medicine, drugs,..) are assigned incompletely and inconsistently (no good domain ontology)

Create lexical database of medical terminology modelled on WordNet (MedWN) Info in MedWN can be accessed automatically Retain WN’s features to make it usable for NLP

Steps to take Review, validate, augment WN’s present medical coverage Ensure sufficiently high scientific level so that MedWN can work in tandem with existing terminology banks, ontologies,...

Create subcorpora of sentences
MedicalFactNet
--sentences rated as correct by medical experts
--sentences express “true” beliefs about medical phenomena
--intelligible to non-experts

Subcorpora of sentences
MedicalBeliefNet
--sentences rated highly for assent by lay persons
--representative fraction of true and false beliefs about medical phenomena

Constraints on subcorpora Complete, grammatical English sentences No anaphora (it, then, this): context-free generic sentences Statements embed terms in typical, informative contexts
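A crude pre-filter for two of these constraints might look like the following. It is a heuristic sketch only: it checks for a closing period and for the three anaphoric words named on the slide, and it cannot judge grammaticality or informativeness. The function name `is_context_free` is a hypothetical choice.

```python
import re

# The anaphoric words listed on the slide; a real filter would need a longer list.
ANAPHORA = {"it", "then", "this"}

def is_context_free(sentence):
    """Heuristic: the sentence ends with a period and contains no listed anaphoric word."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return sentence.strip().endswith(".") and not (words & ANAPHORA)

print(is_context_free("Hay fever is an allergy."))
print(is_context_free("It is caused by pollen."))
```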

Sources for subcorpora --sentences generated via WordNet’s relations --WordNet’s definitions of medical terms --sentences from online medical services

Sentences from on-line information sources --fact sheets --NIAID Health Information Publications --UK NetDoctor’s Diseases Encyclopedia

Example
NetDoctor text: Hay fever, otherwise known as seasonal allergenic rhinitis, is an allergic reaction to airborne substances such as pollen....
Created sentences:
Hay fever is an allergy.
Hay fever is an allergic reaction.
Hay fever is a reaction to pollen...

Second source of sentences
Derive propositions from WordNet: express labeled arcs as propositions
--hyponymy: if x is a hyponym of y → “x is a type of y”
--meronymy: if x is a meronym of y → “x is a part of y”
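Turning labeled arcs into sentences amounts to filling one template per relation. A minimal sketch follows, assuming arcs are available as (x, relation, y) triples; the triple format and the names `TEMPLATES` and `arc_to_sentence` are illustrative assumptions, while the two templates and the example arcs come from the slides.

```python
# One sentence template per relation label, as given on the slide.
TEMPLATES = {
    "hyponym": "{x} is a type of {y}.",
    "meronym": "{x} is a part of {y}.",
}

# Example arcs taken from the relations shown earlier in the talk.
ARCS = [
    ("HIV", "hyponym", "virus"),
    ("occipital bone", "meronym", "cranium"),
]

def arc_to_sentence(x, relation, y):
    """Render one labeled arc as a generic, context-free sentence."""
    sentence = TEMPLATES[relation].format(x=x, y=y)
    return sentence[0].upper() + sentence[1:]  # capitalize the first letter only

for arc in ARCS:
    print(arc_to_sentence(*arc))
```

Sentences produced this way are only candidates; they still have to pass the human validation step before entering either subcorpus.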

Validation
Derived sentences are judged by humans
Likert scale 1-5
Participants assign a score for U (understanding) to all sentences
Sentences judged to be understandable are scored further for
--B (belief) by lay persons
--C (correctness) by experts

Validation
Statements receiving a B-score of 4 or higher => MedicalBeliefNet
Statements receiving a C-score of 4 or higher => MedicalFactNet
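The routing rule can be sketched as below. Two points are assumptions, since the slides do not specify them: that individual ratings are averaged across raters, and that "understandable" means a mean U-score of 4 or higher. Only the B and C thresholds of 4 come from the slides; `route` is a hypothetical function name.

```python
from statistics import mean

UNDERSTANDABLE_MIN = 4  # assumption: the slides give no cut-off for U
ACCEPT_MIN = 4          # from the slides: B- or C-score of 4 or higher

def route(u_scores, b_scores, c_scores):
    """Return the set of subcorpora a sentence is admitted to (possibly empty)."""
    if mean(u_scores) < UNDERSTANDABLE_MIN:
        return set()  # not understandable: never scored for B or C
    nets = set()
    if b_scores and mean(b_scores) >= ACCEPT_MIN:
        nets.add("MedicalBeliefNet")
    if c_scores and mean(c_scores) >= ACCEPT_MIN:
        nets.add("MedicalFactNet")
    return nets

# A sentence that lay persons believe but experts reject lands only in MedicalBeliefNet.
print(route([5, 4], [5, 4], [1, 2]))
```

Because the two thresholds are applied independently, a sentence can enter both subcorpora, which is consistent with MedicalBeliefNet holding both true and false beliefs.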

Side effects (beneficial) of corpus
Basis for new NLP applications in the medical domain
Basis for exploring individual and group differences wrt medical knowledge, vocabulary, reasoning, decision-making
Use in medical training

Future work
Scale up coverage
Add relations among events (states, activities) as expressed by verbs
Current work: explore “function/purpose” relation among verbs (analogous to roles among entities expressed by nouns)
--e.g., to run is to exercise (defeasible)
--to run is to move (not defeasible)

Future work
Add relations and modalities (causality, conditionals,..)
--these are more or less explicit in WordNet
Crosslingual MedWN?
Bootstrap from existing multilingual wordnets?