Published by Georgina Anthony. Modified over 9 years ago.
1
Medical WordNet: A Proposal. Christiane Fellbaum, Princeton University and Berlin-Brandenburg Academy of Sciences
2
The Challenge: Bridge the communication gap between lay persons and health care providers
3
Health Care Providers (HCP or “Experts”): --Physicians --Nurses --Therapists --Online medical information systems
4
Non-Experts: --patients --family members --benefit administrators --lawyers
5
Modes of communication: --Live interaction with patients --Virtual interaction: online medical information
6
Experts and lay persons speak different “dialects”
7
Characteristics of HCP language: --Ignorance of, or uncertainty about, non-experts’ lexical and conceptual knowledge --The same word is used with different meanings by the two populations (word-concept mismatch) --HCPs use technical terms --HCPs substitute synonyms from different levels
8
Characteristics of non-expert language: Idiosyncratic, “unregulated” --mix of technical and folk terms --taxonomies are less elaborate and shallower (fewer intermediate levels of categorial distinction) --lay concepts are fuzzy (e.g., flu) --some lay concepts have no (clear) equivalents in medicine (e.g., German “Kreislaufprobleme”, “circulatory problems”)
9
Expert vs. non-expert language in dialogue interaction: --HCPs introduce new concepts for which the lay person is unprepared (going from symptoms to diagnosis, treatments, etc.) --Lay questions are frequently “yes/no” questions --Expert replies are usually not “yes/no” answers --There are often no opportunities for “repair”
10
Additional problem with online information systems: Trivial linguistic features of a query can have significant consequences for the results
11
Example: MEDLINEplus returns different results depending on the query: --tremor vs. intentional tremor --tremble vs. trembling. Linguistic (morphological) differences in the query result in semantically different answers
12
(Our) solution: --Make the HCP “bilingual” --Enable “translation” between consumer health information systems and lay persons
13
Problems on three levels: --Lexical --Conceptual --Propositional (facts, beliefs, hypotheses, ...)
14
Some ground rules for the next 45 minutes: --Nothing hinges on the term “concept”; proposed synset: {concept, universal, idea, type, ...} --“Truth” applies only to propositions, not to entities (WordNet has “unicorn”, “Mickey Mouse”, etc.)
15
A Linguist’s view: --Concepts/universals are expressed by lexemes (words) --Words are embedded in contexts and partially derive their meanings from those contexts --The truth of propositions depends partially on their lexical make-up
16
Goals: --Document medical knowledge that can be understood by the average adult health care consumer in the U.S. --Make existing tools accessible to non-experts
17
Plan of Attack: --Create a lexical database of medical terminology modeled on WordNet, retaining WN’s potential for NLP --Complement lexical (word) information with definitional sentences, one for experts and one for lay persons --These sentences provide meaningful contexts for terms --Two sentential subcorpora: Facts and Beliefs
18
Some background: WordNet --A large lexical database of English --Semantic network? Yes --Thesaurus? Yes, BUT unlike Roget’s, WN’s relations are labeled --Ontology? Who knows?
19
WordNet: --Constructed entirely by hand --A semantic network of 115,000 synonym sets (“synsets”) --Example synset: {chest, thorax, torso,# body_part,@ (the part of the body below the neck and above the belly; “the victim had a knife stuck in his chest”)}
20
WordNet synsets: --One or more “cognitively synonymous” lexemes --A definition (“gloss”) --An example sentence --Meronymy and hyponymy relate noun synsets; the result is a semantic network
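To make the parts of a synset concrete, here is a minimal Python sketch, assuming a toy in-memory representation rather than WordNet's actual files. It reads the slide's `torso,#` and `body_part,@` as a part-holonym pointer and a hypernym pointer (our interpretation, following the pointer symbols of WordNet's lexicographer files); the `Synset` class, the ids, the relation names, and the two short glosses are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: synonymous lemmas, a gloss, example sentences, and labeled
    pointers to other synsets, keyed by relation name (hypothetical layout)."""
    id: str
    lemmas: list[str]
    gloss: str
    examples: list[str] = field(default_factory=list)
    relations: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical network modeled on the chest/thorax example; only the chest
# gloss and example come from the slide, the other glosses are placeholders.
NETWORK = {
    "chest.n": Synset(
        id="chest.n",
        lemmas=["chest", "thorax"],
        gloss="the part of the body below the neck and above the belly",
        examples=["the victim had a knife stuck in his chest"],
        relations={"hypernym": ["body_part.n"], "part_holonym": ["torso.n"]},
    ),
    "body_part.n": Synset("body_part.n", ["body part"], "a part of an organism"),
    "torso.n": Synset("torso.n", ["torso"], "the trunk of the body"),
}


def neighbors(synset_id: str, relation: str) -> list[Synset]:
    """Follow one labeled arc type from a synset; these labeled arcs are what
    turn a collection of synsets into a semantic network."""
    targets = NETWORK[synset_id].relations.get(relation, [])
    return [NETWORK[t] for t in targets]


print([s.lemmas[0] for s in neighbors("chest.n", "hypernym")])  # ['body part']
```

Keeping relations as labeled lists (rather than one undifferentiated "related" list) is what distinguishes this from a plain thesaurus entry.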
21
WordNet synsets: Where did the makers of WN get their synonyms, meronyms, etc.? --Mid-1980s: no corpora were available --Association norms --Some psycholinguistic testing (sorting experiments) --Assumption: speakers’ use of words reflects their conceptual organization
22
WordNet: value for computational linguistics and Natural Language Processing --Synonyms and related synsets allow searches for semantically related nodes, e.g., query expansion --Information retrieval, Q-A systems, data mining, ... --Inferencing
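Query expansion can be sketched as follows, assuming a hypothetical toy lexicon: a query term is expanded with the other lemmas of its synsets and, optionally, with the lemmas of their direct hyponyms, so that tremor and intentional tremor (the MEDLINEplus example on slide 11) would retrieve overlapping results. The lexicon contents and function names are illustrative only.

```python
# Hypothetical toy lexicon: synset id -> member lemmas and hyponym ids.
SYNSETS = {
    "tremor.n": {"lemmas": ["tremor", "trembling"], "hyponyms": ["intention_tremor.n"]},
    "intention_tremor.n": {"lemmas": ["intention tremor", "intentional tremor"], "hyponyms": []},
}


def synsets_for(term: str) -> list[str]:
    """Ids of all synsets that contain the term (case-insensitive)."""
    t = term.lower()
    return [sid for sid, s in SYNSETS.items() if t in s["lemmas"]]


def expand_query(term: str, include_hyponyms: bool = True) -> set[str]:
    """Expand a query with synonyms and, optionally, direct hyponyms' lemmas."""
    expanded = {term.lower()}
    for sid in synsets_for(term):
        expanded.update(SYNSETS[sid]["lemmas"])
        if include_hyponyms:
            for hypo in SYNSETS[sid]["hyponyms"]:
                expanded.update(SYNSETS[hypo]["lemmas"])
    return expanded


print(sorted(expand_query("tremor")))
# ['intention tremor', 'intentional tremor', 'trembling', 'tremor']
```

Whether to follow hyponym arcs is a precision/recall trade-off; the flag leaves that choice to the caller.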
23
Two problems: Synonymy and Polysemy --WordNet maps between lexemes (words) and concepts (meanings) --Words are labels for concepts that speakers find salient --Identifying the same concept labeled with different words (synonymy), e.g., chest, thorax --Disambiguating polysemous words, e.g., weak patient vs. weak solution
24
Synonymy and Polysemy: --Synonymy: membership in the same synset --Polysemy: the number of synsets of which a given string is a member
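Both definitions translate directly into code. A minimal sketch, assuming a hypothetical mini-lexicon in which each synset is just a set of strings; "weak" belongs to two synsets, echoing the weak patient vs. weak solution example, while the other member words are our own choices.

```python
# Hypothetical mini-lexicon: synset id -> set of member strings.
SYNSETS = {
    "chest.n": {"chest", "thorax"},
    "weak.a.physical": {"weak", "feeble"},  # as in "weak patient"
    "weak.a.dilute": {"weak", "dilute", "watery"},  # as in "weak solution"
}


def are_synonyms(a: str, b: str) -> bool:
    """Synonymy: the two strings are members of at least one common synset."""
    return any(a in members and b in members for members in SYNSETS.values())


def polysemy(word: str) -> int:
    """Polysemy: the number of synsets of which the string is a member."""
    return sum(word in members for members in SYNSETS.values())


print(are_synonyms("chest", "thorax"), polysemy("weak"), polysemy("thorax"))  # True 2 1
```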
25
WordNet: In addition, related words and concepts can be found via the relations among entire synsets --Hyponymy/hyperonymy (super-/subordination): HIV is a kind of virus; one kind of virus is HIV --Meronymy/holonymy (part-whole): the occipital bone is part of the cranium; the cranium has an occipital bone
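Because each relation has an inverse (hyponym/hypernym, meronym/holonym), a network only needs to store an arc once and can still answer queries from either end. A sketch under that assumption, using the slide's examples; the relation names and the index layout are hypothetical.

```python
from collections import defaultdict

# Inverse pairs: one stored arc answers queries in both directions.
INVERSE = {"hypernym": "hyponym", "part_holonym": "part_meronym"}

# (source, relation, target) triples from the slide's examples.
EDGES = [
    ("HIV", "hypernym", "virus"),  # HIV is a kind of virus
    ("occipital bone", "part_holonym", "cranium"),  # occipital bone is part of cranium
]


def build_index(edges):
    """Index each triple under its source and, via the inverse relation, its target."""
    index = defaultdict(set)
    for src, rel, tgt in edges:
        index[(src, rel)].add(tgt)
        index[(tgt, INVERSE[rel])].add(src)
    return index


index = build_index(EDGES)
print(index[("virus", "hyponym")])  # {'HIV'}
print(index[("cranium", "part_meronym")])  # {'occipital bone'}
```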
26
WordNet: Different kinds of hyponymy --Types vs. instances: principality is a type of country; Monaco is an instance of a principality
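The type/instance split can be sketched by keeping the two kinds of arcs in separate tables, so that an individual is never treated as a class. The data is hypothetical: Norway as an instance of kingdom and the top node "political unit" are our own choices.

```python
# Hypothetical toy taxonomy.
TYPE_OF = {"kingdom": "country", "country": "political unit"}  # class -> broader class
INSTANCE_OF = {"Norway": "kingdom"}  # individual -> class it instantiates


def is_instance(name: str) -> bool:
    """True if the name is an individual linked by an instance arc, not a class."""
    return name in INSTANCE_OF


def categories(name: str) -> list[str]:
    """Classes a name falls under: one instance arc (if any), then type arcs upward."""
    chain = []
    current = INSTANCE_OF.get(name)
    if current is None:
        current = name  # a class: start climbing from itself
    else:
        chain.append(current)  # an instance: its class comes first
    while current in TYPE_OF:
        current = TYPE_OF[current]
        chain.append(current)
    return chain


print(categories("Norway"))  # ['kingdom', 'country', 'political unit']
print(categories("kingdom"))  # ['country', 'political unit']
```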
27
Lexical semantics in WordNet: The meaning of a word results from its place in the semantic network
28
WordNet for medical informatics/bioinformatics? --Synonymy and polysemy are problems here, too --Is WN’s way of mapping words onto meanings useful?
29
WordNet for medical informatics/bioinformatics? --WN was compiled by non-experts --Its medical coverage is sparse and arbitrary
30
WordNet’s medical coverage: --Contains both expert and folk terms, which are indistinguishable --Contains archaic terms like unction --Has no type vs. role (e.g., symptom) distinction; e.g., it records that tumors are abnormal, but not that some tumors are malignant --Has no links among entities, properties, processes, and states --Domain labels (medicine, drugs, ...) are assigned incompletely and inconsistently (no good domain ontology)
31
Create a lexical database of medical terminology modeled on WordNet (MedWN): --Information in MedWN can be accessed automatically --Retain WN’s features to make it usable for NLP
32
Steps to take: --Review, validate, and augment WN’s present medical coverage --Ensure a sufficiently high scientific level so that MedWN can work in tandem with existing terminology banks, ontologies, ...
33
Create subcorpora of sentences: MedicalFactNet --sentences rated as correct by medical experts --sentences that express “true” beliefs about medical phenomena --sentences that are intelligible to non-experts
34
Subcorpora of sentences: MedicalBeliefNet --sentences rated highly for assent by lay persons --a representative fraction of true and false beliefs about medical phenomena
35
Constraints on subcorpora: --Complete, grammatical English sentences --No anaphora (it, then, this): context-free, generic sentences --Statements embed terms in typical, informative contexts
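Some of these constraints could be pre-screened automatically before human review. A rough sketch, assuming that a capitalized first letter plus final punctuation approximates "complete" and that a word list approximates anaphora detection; grammaticality and informativeness are not checked, and the pronouns beyond it, then, this are hypothetical additions.

```python
import re

# Anaphoric words to reject: "it", "then", "this" come from the slide;
# the others are hypothetical additions.
ANAPHORA = {"it", "its", "then", "this", "these", "they", "them"}


def constraint_violations(sentence: str) -> list[str]:
    """Surface checks only; grammaticality and informativeness still need a human."""
    problems = []
    s = sentence.strip()
    # "Complete" approximated as: capitalized first letter and final punctuation.
    if not s or not s[0].isupper() or s[-1] not in ".!?":
        problems.append("not a complete sentence")
    words = re.findall(r"[a-z']+", s.lower())
    found = sorted(set(words) & ANAPHORA)
    if found:
        problems.append("anaphora: " + ", ".join(found))
    return problems


print(constraint_violations("Hay fever is an allergy."))  # []
print(constraint_violations("It is a reaction to pollen."))  # ['anaphora: it']
```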
36
Sources for subcorpora: --sentences generated via WordNet’s relations --WordNet’s definitions of medical terms --sentences from online medical services
37
Sentences from online information sources: --fact sheets --NIAID Health Information Publications --UK NetDoctor’s Diseases Encyclopedia
38
Example NetDoctor text: “Hay fever, otherwise known as seasonal allergenic rhinitis, is an allergic reaction to airborne substances such as pollen....” Created sentences: --Hay fever is an allergy. --Hay fever is an allergic reaction. --Hay fever is a reaction to pollen. ...
39
Second source of sentences: Derive propositions from WordNet --Express labeled arcs as propositions, e.g., hyponymy: if x is a hyponym of y, “x is a type of y”; meronymy: “x is a part of y”
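Turning labeled arcs into sentences amounts to filling one template per relation. A minimal sketch with the two templates from the slide; the relation keys (`hypernym`, `part_holonym`) and the function name are our own labels.

```python
# One template per arc label; the wording of both comes from the slide.
TEMPLATES = {
    "hypernym": "{x} is a type of {y}.",  # x is a hyponym of y
    "part_holonym": "{x} is a part of {y}.",  # x is a meronym of y
}


def arc_to_sentence(x: str, relation: str, y: str) -> str:
    """Render the arc x --relation--> y as a candidate sentence for validation."""
    sentence = TEMPLATES[relation].format(x=x, y=y)
    return sentence[0].upper() + sentence[1:]


arcs = [("hay fever", "hypernym", "allergy"), ("occipital bone", "part_holonym", "cranium")]
for arc in arcs:
    print(arc_to_sentence(*arc))
# Hay fever is a type of allergy.
# Occipital bone is a part of cranium.
```

The raw output lacks articles ("a part of cranium"), which illustrates why such derived sentences benefit from human judgment before entering a subcorpus.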
40
Validation: --Derived sentences are judged by humans on a 1-5 Likert scale --Participants assign a U (understanding) score to all sentences --Sentences judged to be understandable are scored further for B (belief) by lay persons and for C (correctness) by experts
41
Validation: --Statements receiving a B score of 4 or higher => MedicalBeliefNet --Statements receiving a C score of 4 or higher => MedicalFactNet
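The routing rules can be sketched as a small function. The slides leave open how individual ratings are aggregated and what U score counts as understandable; this sketch assumes the mean rating is compared with the threshold and uses a hypothetical U cutoff of 4.

```python
from statistics import mean

THRESHOLD = 4  # from the slides: a score of 4 or higher on the 1-5 Likert scale
U_THRESHOLD = 4  # hypothetical: the slides do not define "understandable"


def route(u_scores: list[int], b_scores: list[int], c_scores: list[int]) -> set[str]:
    """Return the subcorpora a statement enters, assuming each list holds individual
    1-5 ratings (u_scores non-empty) and that the mean rating is thresholded."""
    if mean(u_scores) < U_THRESHOLD:
        return set()  # not understandable, so not scored for B or C
    targets = set()
    if b_scores and mean(b_scores) >= THRESHOLD:  # lay persons' belief ratings
        targets.add("MedicalBeliefNet")
    if c_scores and mean(c_scores) >= THRESHOLD:  # experts' correctness ratings
        targets.add("MedicalFactNet")
    return targets


# Believed by lay raters but rated incorrect by experts: BeliefNet only.
print(route([5, 4, 4], [4, 5], [2, 3]))  # {'MedicalBeliefNet'}
```

A statement can land in both subcorpora, in one, or in neither; the BeliefNet-only case is how false lay beliefs end up represented.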
42
(Beneficial) side effects of the corpus: --A basis for new NLP applications in the medical domain --A basis for exploring individual and group differences with respect to medical knowledge, vocabulary, reasoning, and decision-making --Use in medical training
43
Future work: --Scale up coverage --Add relations among events (states, activities) as expressed by verbs --Current work: explore a “function/purpose” relation among verbs (analogous to roles among entities expressed by nouns), e.g., to run is to exercise (defeasible); to run is to move (not defeasible)
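One possible encoding of the defeasible/non-defeasible distinction, offered only as a sketch: each verb relation carries a flag saying whether context can cancel it, and defeasible relations are rendered with a hedge. The class, the relation names, and the "typically" wording are assumptions, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbRelation:
    """A labeled arc between verbs; `defeasible` marks inferences context can cancel."""
    verb: str
    relation: str
    target: str
    defeasible: bool


# The two examples from the slide; relation names are our own labels.
RELATIONS = [
    VerbRelation("run", "function", "exercise", defeasible=True),
    VerbRelation("run", "hypernym", "move", defeasible=False),
]


def statements(verb: str) -> list[str]:
    """Render each arc for a verb, hedging defeasible ones with 'typically'."""
    out = []
    for r in RELATIONS:
        if r.verb == verb:
            hedge = "typically " if r.defeasible else ""
            out.append(f"to {r.verb} is {hedge}to {r.target}")
    return out


print(statements("run"))  # ['to run is typically to exercise', 'to run is to move']
```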
44
Future work: --Add relations and modalities (causality, conditionals, ...); these are more or less explicit in WordNet --Crosslingual MedWN? Bootstrap from existing multilingual wordnets?