Information Extraction From Medical Records by Alexander Barsky.

Current Methodology: A broad assessment of the patient appears at the beginning of the chart, with references to more specific areas. Specific divisions follow the broad assessment. Records are listed in chronological order of activity.

Chart Example: [figure not preserved]

Problem: A patient's medical chart is highly detailed and complex, so any attempt to quickly locate specific information is frustrating.

Example: [figure not preserved]

Solution: Create a system that extracts the wanted information based on a predefined set of parameters. Example: "Hormonal imbalance during puberty". Retrieve all references to hormonal imbalances, but only those recorded between two specific points in time in the medical chart.
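
The query described above can be sketched in a few lines of Python. This is only an illustration of the idea (a phrase plus a date window), not the system built in the presentation; the `ChartEntry` structure, the `find_references` helper, and the sample entries are all made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChartEntry:
    """One dated entry in a patient's chart (hypothetical structure)."""
    recorded_on: date
    text: str


def find_references(entries, phrase, start, end):
    """Return entries that mention `phrase` and fall within [start, end]."""
    needle = phrase.lower()
    return [
        e for e in entries
        if start <= e.recorded_on <= end and needle in e.text.lower()
    ]


# Made-up sample data, for illustration only.
chart = [
    ChartEntry(date(2001, 3, 14), "Routine check-up, no concerns."),
    ChartEntry(date(2004, 6, 2), "Labs suggest a hormonal imbalance; follow up in 3 months."),
    ChartEntry(date(2005, 1, 20), "Hormonal imbalance resolved after treatment."),
    ChartEntry(date(2012, 9, 9), "Possible hormonal imbalance noted during annual physical."),
]

hits = find_references(chart, "hormonal imbalance", date(2003, 1, 1), date(2007, 12, 31))
for entry in hits:
    print(entry.recorded_on.isoformat(), entry.text)
```

Plain substring matching like this misses paraphrases and spelling variants, which is the gap the annotation tools below are meant to close.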

Tools at Our Disposal:
- JAPE: Java Annotation Patterns Engine. Use: pattern matching and semantic extraction.
- GATE: General Architecture for Text Engineering. Use: information extraction, document annotation, and XML output.
- C#: Visual C# WinForms. Use: medium for conversion between XML and .csv file formats.

Solution Methodology:
1. Create a corpus of documents in GATE.
2. Introduce rules for information extraction.
3. Annotate the documents in the corpus.
4. Output the annotated documents in XML.
5. Strip the file of unnecessary elements and convert it to .csv.

ANNIE: A Nearly-New Information Extraction system
- Tokeniser - splits text into simple tokens.
- Gazetteer - identifies entity names contained in lists.
- Sentence Splitter - splits text into sentences based on lists.
- Part-of-Speech Tagger - tags each token with its part of speech.
- Coreference Matcher - identifies relationships between previously defined entities.
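
To make the first three components concrete, here is a toy Python sketch of sentence splitting, tokenising, and gazetteer lookup. It does not use GATE or ANNIE and is far simpler than either: the `GAZETTEER` entries, the function names, and the sample text are assumptions made for illustration, and the stages run in a simplified order.

```python
import re

# Hypothetical gazetteer list; real gazetteers are much larger.
GAZETTEER = {"insulin": "Hormone", "estrogen": "Hormone", "thyroid": "Organ"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def gazetteer_lookup(tokens):
    """Return (token, type) pairs for tokens found in the gazetteer."""
    return [(t, GAZETTEER[t.lower()]) for t in tokens if t.lower() in GAZETTEER]


text = "Estrogen levels were low. Thyroid function was normal."
for sentence in split_sentences(text):
    tokens = tokenise(sentence)
    print(tokens, gazetteer_lookup(tokens))
```

A punctuation-based splitter like this one breaks on abbreviations such as "Dr.", which is one reason a list-driven splitter is useful.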

Success in information extraction depends on integrating most, if not all, ANNIE components.

JAPE: Key to Extraction [figure not preserved]

JAPE Example: [figure not preserved]

XML Output: [figure not preserved]

Problem: Too much unorganized information. Solution: XSLT to the rescue! XSLT (Extensible Stylesheet Language Transformations): add specific rules to separate needed from unnecessary information.

XSLT Example - Find all the nodes within the. Add string between the tags.

CSV File Type: Comma-Separated Values. Used to present information in tabular form. Useful for analyzing large amounts of data in an easy-to-understand format. The most common program used to open it is Excel.
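
The "strip and convert" step can be sketched as follows. The presentation does this with XSLT and a C# converter; the sketch below swaps in Python's standard library instead, purely as an illustration. The XML layout (`chart`, `entry`, `Condition`, `Drug`, the `date` attribute) is a hypothetical stand-in, not GATE's actual output format, and the sample content is made up.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotated output; element and attribute names are assumptions.
ANNOTATED_XML = """<chart>
  <entry date="2004-06-02">Labs suggest a <Condition>hormonal imbalance</Condition>.</entry>
  <entry date="2005-01-20"><Condition>Hormonal imbalance</Condition> resolved with <Drug>levothyroxine</Drug>.</entry>
</chart>"""


def annotations_to_csv(xml_text, wanted_tags):
    """Write one CSV row per wanted annotation: date, type, text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "type", "text"])
    for entry in root.iter("entry"):
        for node in entry.iter():
            # Keep only the annotation types asked for; drop everything else.
            if node.tag in wanted_tags:
                writer.writerow([entry.get("date"), node.tag, (node.text or "").strip()])
    return out.getvalue()


csv_text = annotations_to_csv(ANNOTATED_XML, {"Condition"})
print(csv_text)
```

Passing a different set of tags, for example `{"Condition", "Drug"}`, changes which annotations survive into the table without touching the rest of the code.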

Potential Problem: Regardless of how well the ANNIE tools are utilized and how well the JAPE rules are defined, the recall percentage will never be perfect.
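
For reference, recall and the related precision measure can be computed as below. This is a generic sketch of the standard definitions, not code from the presentation; the annotation IDs in `gold` and `found` are invented for the example.

```python
def precision_recall(extracted, relevant):
    """Precision = correct / extracted; recall = correct / relevant."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented IDs: what a human marked as relevant vs. what the rules extracted.
gold = {"a1", "a2", "a3", "a4"}
found = {"a1", "a2", "a5"}

p, r = precision_recall(found, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three extracted items are correct, and two of the four relevant items were found, so the rules miss half of what a reader would want.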

Solution: Machine Learning. Machine learning is our best chance to increase the precision of the output results. Training a computer to recognize commonly used reporting phraseology will organize extraction better, with more precise, concise outputs. Fortunately, GATE includes plugins for machine learning.