Semantic natural language understanding at scale using Spark, machine-learned annotators and deep-learned ontologies David Talby @davidtalby Claudiu Branzan.

Presentation transcript:

Semantic natural language understanding at scale using Spark, machine-learned annotators and deep-learned ontologies David Talby @davidtalby Claudiu Branzan @melcutz

The problem
Who is at risk for sepsis?
Who needs to be vaccinated?
Who fits this clinical trial?
Who on this protocol did not have this side effect?
Who is getting meds they’re allergic to?

At the beginning, there was search
Scalable & robust indexing pipeline
Tokenizers & analyzers
Synonyms, spellers & auto-suggest
File formats & header boosting
Rankers, link & reputation boosting

Then there was semantic search
Example queries: “cheap red prom dresses”, “laptops under $500”, “italian restaurants near me that deliver”, “captain america civil war tonight”, “nba scores”
Dictionary-based attribute extraction: Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - 8GB Memory - 256GB Solid State Drive - Silver
Machine-learned attribute extraction: If you go for the ambience, you'll be disappointed. If you go for good, inexpensive and authentic Mexican food, then you're in the right place.
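Dictionary-based attribute extraction, as on the slide above, can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the `BRANDS` and `COLORS` dictionaries, the attribute names, and the `extract_attributes` function are all hypothetical.

```python
import re

# Hypothetical attribute dictionaries; a real system would load these
# from a product catalog or a curated vocabulary.
BRANDS = {"dell", "hp", "lenovo"}
COLORS = {"silver", "black", "red"}


def extract_attributes(title: str) -> dict:
    """Extract product attributes using dictionary lookups and simple patterns."""
    attrs = {}
    for tok in re.findall(r"[A-Za-z0-9.]+", title.lower()):
        if tok in BRANDS:
            attrs.setdefault("brand", tok)   # keep the first brand seen
        elif tok in COLORS:
            attrs.setdefault("color", tok)   # keep the first color seen
    # Pattern-based attributes such as "8GB Memory" or "256GB Solid State Drive".
    mem = re.search(r"(\d+)\s*GB\s+Memory", title, re.IGNORECASE)
    if mem:
        attrs["memory_gb"] = int(mem.group(1))
    ssd = re.search(r"(\d+)\s*GB\s+Solid State Drive", title, re.IGNORECASE)
    if ssd:
        attrs["ssd_gb"] = int(ssd.group(1))
    return attrs


title = ("Dell - XPS 15.6 4K Ultra HD Touch-Screen Laptop - Intel Core i5 - "
         "8GB Memory - 256GB Solid State Drive - Silver")
print(extract_attributes(title))
```

Lookups like these work when the vocabulary is closed and well formatted. The restaurant review on the same slide has no such vocabulary, which is where a learned extractor becomes the more practical choice.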

And then, you need to understand language
Positive: Prescribing sick days due to diagnosis of influenza.
Speculative: Jane complains about flu-like symptoms.
Possible: Jane may be experiencing some sort of flu episode.
Negative: Jane’s RIDT came back negative for influenza.
Conditional: Jane is at high risk for flu if she’s not vaccinated.
Family history: Jane’s older brother had the flu last month.
Patient history: Jane had a severe case of flu last year.

Language gets complex & domain specific
Nothing: Joe expressed concerns about the risks of bird flu.
Double negative: Joe shows no signs of stroke, except for numbness.
Compound: Nausea, vomiting and ankle swelling negative.
Speculative: Patient denies alcohol abuse.
Compound: Allergies: Penicillin, Dust, Sneezing.
(It gets worse: in reality, a lot of text isn’t valid English.)
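To make the assertion labels concrete, here is a minimal cue-phrase classifier in the spirit of rule-based negation detectors. It is a sketch only: the cue lists, their priority order, and the `classify_assertion` function are assumptions chosen to reproduce the labels on the simple "Jane" examples, and they are not taken from the talk.

```python
# Cue phrases per assertion label, checked in priority order.
# These lists are illustrative assumptions, not a clinical vocabulary.
CUES = [
    ("Family history", ["brother", "sister", "mother", "father"]),
    ("Conditional", ["at high risk", " if "]),
    ("Negative", ["negative for", "no signs of", "denies"]),
    ("Possible", ["may be", "might"]),
    ("Speculative", ["complains about"]),
    ("Patient history", ["last year", "history of"]),
]


def classify_assertion(sentence: str) -> str:
    """Return the first label whose cue phrase occurs in the sentence."""
    text = sentence.lower()
    for label, phrases in CUES:
        if any(phrase in text for phrase in phrases):
            return label
    return "Positive"  # default when no cue fires


print(classify_assertion("Jane may be experiencing some sort of flu episode."))
print(classify_assertion("Joe shows no signs of stroke, except for numbness."))
```

The second call returns "Negative" and misses the exception about numbness. Cue lists like these break down on exactly the complex and domain-specific sentences listed above.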

Let’s build this!
[Diagram: the input (patient records), the processing framework, the output (ASSERTIONS), the query engines]

[Pipeline diagram, stages as listed on the slide:]
SENTENCE DETECTION
TOKENIZER
LEMMATIZER
STOPWORD REMOVAL
NAMED ENTITY EXTRACTION
PART OF SPEECH TAGGING
NEGATION DETECTION
DOCUMENT CLASSIFICATION
MEDICAL CONCEPT EXTRACTION
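The first few stages of such a pipeline can be sketched with the standard library alone. This is an illustrative toy, assuming naive regex-based sentence splitting and a tiny hand-written stopword list; the function names are hypothetical, and production pipelines would use a proper NLP library for each stage.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "and"}


def split_sentences(text):
    """Naive sentence detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase word tokens, keeping internal hyphens and apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", sentence.lower())


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def annotate(text):
    """Run the stages in order and return one annotation dict per sentence."""
    return [
        {"sentence": sent, "tokens": remove_stopwords(tokenize(sent))}
        for sent in split_sentences(text)
    ]


for row in annotate("Patient denies alcohol abuse. Jane had a severe case of flu last year."):
    print(row)
```

Each stage takes the previous stage's output, so later annotators (negation detection, concept extraction) can be appended without changing the earlier ones.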

First Demo: quick primer on spaCy

Machine learned annotators
Sometimes, it’s easier to just code an annotation’s business logic:
Grammatical patterns: If … then …
Direct inferences: Age < 18 ==> Child
Lookups: RIDT (lab test)
But sometimes it’s easier to learn it from examples:
Under-diagnosed conditions: Flu, Depression
Implied by context: relevant labs normal
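The "just code it" cases on this slide are small enough to show directly. The sketch below covers the direct inference and the lookup; the `Annotation` type, the function names, and the "Adult" fallback label are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    label: str


# Lookup table keyed by lowercase token. Only RIDT comes from the slide.
LAB_TESTS = {"ridt": "lab test"}


def annotate_age(age: int) -> Annotation:
    """Direct inference from the slide: Age < 18 ==> Child.

    The "Adult" fallback is an assumption added so the function is total.
    """
    return Annotation(str(age), "Child" if age < 18 else "Adult")


def annotate_lookups(tokens):
    """Tag every token found in the lookup table."""
    return [
        Annotation(tok, LAB_TESTS[tok.lower()])
        for tok in tokens
        if tok.lower() in LAB_TESTS
    ]


print(annotate_age(12))
print(annotate_lookups("Jane's RIDT came back negative".split()))
```

Rules like these are cheap to write and audit. The learned alternative pays off when the signal is spread across context, as with under-diagnosed conditions, rather than tied to a single token or threshold.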

Second Demo: Machine Learned Annotator using Tensorflow

What about expanding & updating ontologies? Word2Vec
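One way word embeddings can help here is by proposing candidate terms that sit close to an existing ontology entry in vector space. The sketch below shows only the nearest-neighbor step with cosine similarity. The three-dimensional vectors are invented for illustration (real Word2Vec vectors are learned from a corpus), and `nearest_terms` is a hypothetical helper.

```python
import math

# Toy vectors, made up for this example. Real embeddings are learned
# from text and usually have a hundred or more dimensions.
VECTORS = {
    "flu": [0.90, 0.10, 0.00],
    "influenza": [0.85, 0.15, 0.05],
    "penicillin": [0.00, 0.90, 0.20],
    "amoxicillin": [0.05, 0.85, 0.25],
    "stroke": [0.10, 0.00, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(term, vectors, k=1):
    """Return the k terms most similar to `term`, excluding the term itself."""
    query = vectors[term]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


print(nearest_terms("flu", VECTORS))
print(nearest_terms("penicillin", VECTORS))
```

In practice, the neighbors would be treated as suggestions for a curator to accept or reject, not as automatic additions to the ontology.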

Let’s build this too! Ontology

Third Demo: Ontology Enrichment

Summary & applications
Who is at risk for sepsis?
Who needs to be vaccinated?
Who fits this clinical trial?
Who on this protocol did not have this side effect?
Who is getting meds they’re allergic to?

Free notebooks: https://github.com/melcutz/NLP-demo-2017 @melcutz @davidtalby

Appendix: In case the live demo gets cold feet on stage