1 Information Extraction from Clinical Reports
Wendy W. Chapman, PhD
University of Pittsburgh, Department of Biomedical Informatics

2 Background
1994: B.A. in Linguistics & Chinese – University of Utah
2000: Ph.D. in Medical Informatics – University of Utah – Peter Haug
2003: Postdoctoral Fellowship – University of Pittsburgh – Bruce Buchanan, Greg Cooper
2003–present: Faculty – University of Pittsburgh

3 Problems Being Addressed with IE: My Work
Identifying patients with pneumonia from chest radiograph reports
Understanding the components of a clearly written radiology report
–Train radiologists to dictate clearly
Classifying patients into syndrome categories from chief complaints
–Cough/SOB → respiratory patient
Characterizing a patient's clinical state from ED reports
–Outbreak detection
–Outbreak investigation
NLP-assisted ontology learning
Locating pathology specimens

4 Problems Being Addressed with IE: Future Areas of Application
Learning genotype–phenotype patterns for diseases
Quality control
–Ensure physicians are complying with core measures required by Medicare
–Look for medical errors
Automatically assigning billing codes

5 Where Is the Field Now?
The field is mainly focused on sentence-level problems
–Identifying clinical conditions, therapies, and medications
A few systems encode characterizing information for a condition

6 There is less work on discourse-level tasks; these are crucial for successful annotation of clinical texts
–Contextual features
  Negation
  Uncertainty
  Experiencer
  Temporality
  Finding validation
–Coreference resolution
–Inference

7 What Technologies Work? IE of Clinical Concepts (80% "simple", 20% difficult)
Shallow parsing is quite effective
–MetaMap can identify many of the UMLS concepts in texts
Concept–value pairs are important; regular expressions are quite effective (see the sketch below)
–"temperature 39C"
Structure of the report is important
–Neck: no lymphadenopathy → cervical lymphadenopathy
–CXR: evidence of pneumonia → radiological evidence of pneumonia
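To make the concept–value point concrete, here is a minimal regular-expression sketch for the temperature example above; the pattern and the extract_temperatures helper are illustrative assumptions, not part of any system named on this slide:

```python
import re

# Minimal sketch: extract temperature concept-value pairs such as "temperature 39C".
# The pattern and unit handling are simplified assumptions, not a production rule set.
TEMP_PATTERN = re.compile(
    r"\btemp(?:erature)?\s*:?\s*(\d{2,3}(?:\.\d)?)\s*°?\s*([CF])\b",
    re.IGNORECASE,
)

def extract_temperatures(text: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs for temperature mentions in the text."""
    return [(float(v), u.upper()) for v, u in TEMP_PATTERN.findall(text)]

print(extract_temperatures("Vitals: temperature 39C, pulse 88."))
# [(39.0, 'C')]
```

A production extractor would need many such patterns plus unit normalization, but the concept–value pairing itself stays this simple.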

8 Where Do We Need More Work?
Non-contiguous information
–Needs a deep parse
Inference (a toy sketch follows below)
–"pain when pressing on left side of sternum" → non-pleuritic chest pain
Semantic networks
–"Opacity consistent with pneumonia" → localized infiltrate
Bayesian networks
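As a concrete, if toy, illustration of the inference step, here is a minimal sketch in which a hand-built implication table stands in for the semantic network mentioned above; the entries and the infer_concept helper are illustrative assumptions, not a curated knowledge base:

```python
# Minimal sketch: map free-text findings to canonical concepts via a
# hand-built implication table standing in for a semantic network.
# The entries below are illustrative assumptions, not a curated resource.
IMPLICATIONS = {
    "opacity consistent with pneumonia": "localized infiltrate",
    "pain when pressing on left side of sternum": "non-pleuritic chest pain",
}

def infer_concept(finding: str) -> str | None:
    """Return the canonical concept a textual finding implies, if known."""
    return IMPLICATIONS.get(finding.lower().strip())

print(infer_concept("Opacity consistent with pneumonia"))
# localized infiltrate
```

The hard part, of course, is acquiring and generalizing these implications, which is exactly where deeper parsing, semantic networks, and Bayesian networks come in.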

9 What Technologies Work? Contextual Features (80% "simple", 20% difficult)
Rules based on trigger terms work quite well
–NegEx
–ConText

10 Three Contextual Features
Negation: Is the condition negated? (Negated / Affirmed)
Patient experience: Did the patient experience the condition? (Yes / No)
Temporality: When did the condition occur? (Historical / Recent / Hypothetical)
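One possible representation of these three features, as a minimal sketch; the class and enum names are hypothetical, and the defaults (affirmed, patient, recent) follow the usual ConText convention:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical names chosen for illustration; the slide defines only the
# features and their possible values, not a concrete data model.
class Negation(Enum):
    AFFIRMED = "affirmed"
    NEGATED = "negated"

class Experiencer(Enum):
    PATIENT = "patient"
    OTHER = "other"

class Temporality(Enum):
    RECENT = "recent"
    HISTORICAL = "historical"
    HYPOTHETICAL = "hypothetical"

@dataclass
class AnnotatedCondition:
    condition: str
    negation: Negation = Negation.AFFIRMED
    experiencer: Experiencer = Experiencer.PATIENT
    temporality: Temporality = Temporality.RECENT
```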

11 ConText Algorithm
Four elements
–Trigger terms
–Pseudo-trigger terms
–Scope of the trigger term
–Termination terms
Assigns the appropriate value to a contextual feature for each clinical condition within the scope of a trigger term
The scope usually extends to the end of the sentence or to a termination term

12 ConText: Determining Values for Contextual Features
Based on the negation algorithm NegEx.
"Patient denies cough but complains of headache."
–"denies" is a trigger term; its scope extends until the termination term "but"
"No change in the patient's chest pain."
–"No change" is a pseudo-trigger term, so no negation is assigned
Result for the first sentence:
–Clinical condition: Cough
–Negation: Negated
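A minimal sketch of the trigger/scope mechanism illustrated above, assuming tiny term lists; the real NegEx/ConText lexicons are far larger, and this handles only the negation feature:

```python
import re

# Minimal sketch of ConText-style negation scoping, assuming tiny
# illustrative term lists; the real NegEx/ConText lexicons are far larger.
TRIGGERS = ["denies", "no evidence of", "without"]
PSEUDO_TRIGGERS = ["no change"]          # look like triggers but are not
TERMINATIONS = ["but", "however"]        # end the scope of a trigger

def negated_conditions(sentence: str, conditions: list[str]) -> set[str]:
    """Return the conditions that fall inside the scope of a negation trigger."""
    s = sentence.lower()
    for pseudo in PSEUDO_TRIGGERS:       # blank out pseudo-triggers first
        s = s.replace(pseudo, " " * len(pseudo))
    negated = set()
    for trigger in TRIGGERS:
        for match in re.finditer(re.escape(trigger), s):
            scope_end = len(s)           # default: scope runs to end of sentence
            for term in TERMINATIONS:
                pos = s.find(term, match.end())
                if pos != -1:
                    scope_end = min(scope_end, pos)
            scope = s[match.end():scope_end]
            negated.update(c for c in conditions if c in scope)
    return negated

print(negated_conditions("Patient denies cough but complains of headache.",
                         ["cough", "headache"]))
# {'cough'}
```

On the slide's second example, blanking out the pseudo-trigger "No change" prevents any negation from firing, which is exactly the role pseudo-trigger terms play.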

13 Evaluation of ConText
Test set
–90 ED reports
Reference standard
–Physician annotations with NLP-assisted review
–55 conditions
–3 contextual features
Outcome measures
–Recall
–Precision
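For reference, the two outcome measures, as a minimal sketch with illustrative counts:

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of reference-standard annotations the system found."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of the system's annotations that match the reference standard."""
    return tp / (tp + fp)

# Illustrative counts only, not figures from this evaluation:
print(f"recall={recall(90, 10):.2f}, precision={precision(90, 10):.2f}")
# recall=0.90, precision=0.90
```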

14 ConText's Performance (1,620 annotations)

Feature       (n)    Recall  Precision
Negation      (773)  97%     97%
Historical    (98)   67%     74%
Hypothetical  (40)   83%     94%
Experiencer   (8)    100%    100%

15 What Is Needed for the 20%?
More knowledge modeling
–Historicity often depends on the condition, not on explicit time triggers
–Coreference resolution needs fine-grained semantic knowledge
Statistical techniques
–Integrating sentence-level and discourse-level information
Annotated data sets

16 Why Haven't We Implemented Many NLP Applications?
Are we addressing the best application areas?
Do we need more semi-automated applications?

17 Sharing Clinical Reports
University of Pittsburgh IRB
–Chief complaints are non-human-subjects data
–They can be shared openly as long as the patient cannot be triangulated (chief complaint + age + hospital → patient)
–To use a clinical report, one must apply the De-ID software; once De-ID is applied, the report is considered deidentified
For the caBIG project, we can share de-identified reports
I hope to establish a repository

18 National Sharing
Maybe as some institutions begin sharing, others will follow?
Can the NLM help?
–Apply de-identification
–Encrypted hospital information
–Password protection
–Repository of texts and annotations
–Folk annotations?

19 Our Annotation Sets
Chief complaints (40,000)
–Syndrome classifications
ED reports
–Syndrome classification
–55 respiratory-related clinical conditions
  Negation
  Experiencer
  Historical
  Hypothetical
6 report types
–All clinical conditions
–Contextual features

20 Annotation Evaluation
Measuring annotators'
–Reliability
–Agreement
Agreement is more difficult to measure when annotators mark spans of text
–F-measure can serve as the agreement statistic (see the sketch below)
Measuring the quality of an annotation schema
–Dependent variable = agreement between annotators
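When annotators mark spans rather than choose from fixed categories, a common move is to treat one annotator's spans as the reference and report F-measure, which is symmetric under exact matching. A minimal sketch, assuming exact-match character offsets (real evaluations often also credit partial overlap):

```python
def span_agreement_f(spans_a: set[tuple[int, int]],
                     spans_b: set[tuple[int, int]]) -> float:
    """Pairwise agreement on marked text as an F-measure.

    Treats one annotator's spans as the 'reference'; with exact span
    matching the score is symmetric in the two annotators.
    """
    if not spans_a and not spans_b:
        return 1.0
    matches = len(spans_a & spans_b)
    return 2 * matches / (len(spans_a) + len(spans_b))

# Two annotators marked (start, end) character offsets:
a = {(0, 5), (10, 18), (30, 42)}
b = {(0, 5), (30, 42), (50, 55)}
print(f"{span_agreement_f(a, b):.2f}")  # 0.67
```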

21 [Figure-only slide; no transcript text]

22 [Figure: pairwise values among annotators A, B, and C at two stages. Baseline Schema Stage: 0.17, 0.24, 0.23. Annotation Schema Stage: 0.12, 0.13, 0.10.]

23 Photos courtesy of Brian Chapman: http://web.mac.com/desertlight/iWeb/Reflections,%20Rotations,%20Symmetries/Welcome.html

