ACM BCB 2015 Xun Lu 1*, Aston Zhang 1*, Carl A. Gunter 1, Daniel Fabbri 2, David Liebovitz 3, Bradley Malin 2 1 University of Illinois at Urbana-Champaign,

Slides:



Advertisements
Similar presentations
Nursing Diagnosis: Definition
Advertisements

Panel Identification Improvement Facilitator Training Session 1 Day 2.
AIME03, Oct 21, 2003 Classification of Ovarian Tumors Using Bayesian Least Squares Support Vector Machines C. Lu 1, T. Van Gestel 1, J. A. K. Suykens.
Validating EMR Audit Automation Carl A. Gunter University of Illinois Accountable Systems Workshop.
Mustafa Cayci INFS 795 An Evaluation on Feature Selection for Text Clustering.
Machine learning continued Image source:
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
Clinical decision support system (CDSS). Knowledge-based systems Knowledge based systems are artificial intelligent tools working in a narrow domain to.
POH/DMC UROLOGY Grand Round Conference Presented by: Spectrum Billing Technologies, LLC.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
CS347 Review Slides (IR Part II) June 6, 2001 ©Prabhakar Raghavan.
Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao† Wei Fan‡ Yizhou Sun†Jiawei Han† †University of Illinois at Urbana-Champaign.
Three kinds of learning
ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.
Improve accuracy of clinical coding
INTRODUCTION TO ICD-9-CM PART TWO ICD-9-CM Official Guidelines (Sections II and III): Selection of Principal Diagnosis/Additional Diagnoses for Inpatient.
IMPROVING THE DOCUMENTATION OF DIAGNOSES Carol A. Lewis.
NANDA International Investigating the Diagnostic Language of Nursing Practice.
CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
DOCUMENTATION GUIDELINES FOR E/M SERVICES
EVALUATION David Kauchak CS 451 – Fall Admin Assignment 3 - change constructor to take zero parameters - instead, in the train method, call getFeatureIndices()
Driving the Transition to Individualized Cancer Treatment by Addressing Critical Questions 1 Do I need surgery? chemotherapy? radiation? INVASIVE BREAST.
Modeling and Detecting Anomalous Topic Access Siddharth Gupta 1, Casey Hanson 2, Carl A Gunter 3, Mario Frank 4, David Liebovitz 4, Bradley Malin 6 1,2,3,4.
1 Chapter 5 Unit 4 Presentation ICD-9-CM Hospital Inpatient, Outpatient, and Physician Office Coding Shatondra Surulere, MBA, RHIA, CCS.
Dongyeop Kang1, Youngja Park2, Suresh Chari2
Program Administrator Certification
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Guidance for Editors Working with care maps © 2008 Map of Medicine Ltd. Commercial and in confidence.
LOGO Ensemble Learning Lecturer: Dr. Bo Yuan
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
1 COMP3503 Semi-Supervised Learning COMP3503 Semi-Supervised Learning Daniel L. Silver.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Data Reduction via Instance Selection Chapter 1. Background KDD  Nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable.
Unit 3.02 Understanding Health Informatics.  Health Informatics professionals treat technology as a tool that helps patients and healthcare professionals.
Basic Nursing: Foundations of Skills & Concepts Chapter 9
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Enhancing Text Classifiers to Identify Disease Aspect Information Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Recognizing Stances in Online Debates Unsupervised opinion analysis method for debate-side classification. Mine the web to learn associations that are.
EXPECTED COMPETENCIES RELATED TO GENETICS AMONG BOARD-CERTIFYING ORGANIZATIONS Carrie A. Zabel, M.S. Certified Genetic Counselor Paul V. Targonski, M.D.,
Assessment tools MiniCEX, DOPS AH Mehrparvar,MD Occupational Medicine department Yazd University of Medical Sciences.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Copyright © 2016 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
CSC 478 Programming Data Mining Applications Course Summary Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
DOCUMENTATION FOR MEDICAL STUDENTS Balasubramanian Thiagarajan.
Next, this study employed SVM to classify the emotion label for each EEG segment. The basic idea is to project input data onto a higher dimensional feature.
Health Informatics Health Informatics professionals treat technology as a tool that helps patients and healthcare professionals Understand health.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied, duplicated, or posted to a publicly accessible website, in whole or in part.
IR 6 Scoring, term weighting and the vector space model.
3.02 Understand Health Informatics
Clinical Terminology and One Touch Coding for EPIC or Other EHR
3.02 Understand Health Informatics
Analyze ICD-10 Diagnosis Codes with Stata
3.02 Understand Health Informatics
3.02 Understand Health Informatics
3.02 Understand Health Informatics
The challenges for SIRs & Sepsis data capture and reporting in ICD-10-AM in HIPE 22/09/2018.
The Nursing Process and Pharmacology Jeanelle F. Jimenez RN, BSN, CCRN
S1316 analysis details Garnet Anderson Katie Arnold
3.02 Understand Health Informatics
ICD-9-CM and ICD-10-CM Outpatient and Physician Office Coding
iSRD Spam Review Detection with Imbalanced Data Distributions
Medical Insurance Coding
3.02 Understand Health Informatics
3.02 Understand Health Informatics
Low-Rank Sparse Feature Selection for Patient Similarity Learning
Outlines Introduction & Objectives Methodology & Workflow
Presentation transcript:

ACM BCB 2015 Xun Lu 1*, Aston Zhang 1*, Carl A. Gunter 1, Daniel Fabbri 2, David Liebovitz 3, Bradley Malin 2 1 University of Illinois at Urbana-Champaign, 2 Vanderbilt University, 3 Northwestern University Presented by Aston Zhang | Sep 11, 2015 * Equal contributors

 Medical specialties provide information about which providers have the skills needed to carry out key procedures or make critical judgments  However, organizing specialties into departments or wards has limitations  Some specialties may be lacking or inaccurate (they are not always entered for new hire documents)  Employees can change roles  Encoded departments do not always align with specialties  As a result, there could be a gap between diagnosis histories of certain providers and their specialties 2

 Providers select from Health Care Provider Taxonomy Code Set (HPTCS) when they apply for their National Provider Identifiers (NPI)  However, providers may not always choose their taxonomy codes based on the certifications they hold  National Plan & Provider Enumeration System does not verify the selected taxonomy code  Certain taxonomy codes do not correspond to any nationwide certifications that are approved by a professional board (e.g., Men and Masculinity)  Some national certifications are not reflected by the taxonomy code list 3

 As we have seen, there are limitations in purely relying on NPI taxonomy codes  Hence, we propose to leverage real-world diagnosis histories to infer and recognize actual specialties  De facto specialties are medical specialties that exist in practice regardless of the specialty codes (NPI taxonomy codes)  De facto diagnosis specialties are medical specialties that exist in practice and are highly predictable by the diagnoses inherent in the EHRs 4

 Urology is an example of diagnosis specialty as opposed to anesthesiology  It should be easier to characterize a urologist in terms of medical diagnoses for conditions, for example, of the kidney, ureter, and bladder  It should be harder to characterize an anesthesiologist, whose duties are more cross- cutting with respect to diagnoses, concerning essentially all conditions related to surgeries 5

 There is no ground truth to determine the validity of a discovered de facto diagnosis specialty  A discovered de facto diagnosis specialty can be recognized by classifiers as accurately as the existing listed diagnosis specialties 6

 The users (providers) can be likened to readers of documents, where there is an archive of documents in which the words in each document correspond to diagnoses  Users with specialties are groups of readers who have a common de facto diagnosis specialty and interest in the same group  To solve the de facto diagnosis specialty discovery problem we aim to develop a classifier that characterizes this common interest in terms of the documents that they have read 7

 We use access log data from a hospital and combine it with the diagnosis lists in patient discharge records  Fine-grained data set  A small portion of the data has an explicit mapping between users and diagnoses of the EHRs they accessed  General data set (more representative of the challenging scenarios encountered in practice)  The entire data after removal of all the fine-grained mapping information 8

9

 The ICD-9 codes for diagnoses are mapped down to 603 Clinical Classification Software (CCS) codes  NPI taxonomy codes with fewer than 20 user instances are filtered out  Based on the guidance of clinicians and hospital administrators, we further identify 12 NPI taxonomy codes as diagnosis specialties (core NPI taxonomy codes)  Obstetrics & Gynecology, Cardiovascular Disease, Neurology, Ophthalmology, Gastroenterology, Dermatology, Orthopaedic Surgery, Neonatal-Perinatal Medicine, Infectious Disease, Pulmonary Disease, Neurological Surgery, and Urology. 10

 We invoke machine learning to discover potential de facto diagnosis specialties in the data set that lack corresponding codes in the HPTCS  A semi-supervised learning model (PathSelClus) for fine-grained data set  An unsupervised learning model (LDA) for larger general data set  We use supervised learning models to evaluate the recognition accuracy of the discovered specialty by comparing our approach with the existing listed diagnosis specialties (12 core NPI taxonomy codes)  Ideally, their recognition accuracy should be similar  Such recognition accuracy is evaluated by four classifiers: decision trees, random forests, PCA-KNN, and SVM 11

 A heterogeneous information network consists of multiple types of objects and/or multiple types of links.  Link-based clustering in heterogeneous information networks groups objects based on their connections to other objects in the networks  Two meta-paths in our model:  User (access) -> Patient (accessed by) -> User  User (access) -> Patient (diagnosed with) -> Diagnosis (assigned to) -> Patient (accessed by) -> User 12

13 We analyze the semantics of the m unseeded clusters via the users they contain As an output, each user is assigned to the cluster Cluster all the users regardless of whether they have a taxonomy code Create empty clusters with N seeded ones and m unseeded ones

 In practice, fine-grained data sets may not be available for PathSelClus. Hence, we also employ LDA, an unsupervised learning method based on topic modeling  In LDA, topics act as summaries of different themes pervasive in the corpus and documents are characterized with respect to these topics  We associate users with diagnoses via the patients they access 14

 After applying LDA, each user is assigned to an allocation in the specialty topic simplex  A higher frequency in a specialty indicates that the user is more likely to access patients with diagnoses popular in that specialty  We cluster users by the closest specialties because this specialty has the highest proportion in the specialty topic simplex 15

 Features: we map each user to a term frequency-inverse document frequency (TF-IDF) weighted diagnosis vectors  TF: number of times that a user has accessed patients with a diagnosis  IDF: inverse of the number of users that have accessed patients with a diagnosis  Classifiers:  Decision trees  Random forests  KNN-PCA  SVM

It is represented by the top 10 most accessed diagnoses by all the users that are associated with the Breast Cancer specialty

They are represented by 10 most probable diagnoses respectively as an output of LDA

Evaluation of the Breast Cancer specialty discovered by PathSelClus Evaluation of the Breast Cancer specialty discovered by LDA Evaluation of the Obesity specialty discovered by LDA P: Precision, R: Recall, F1: F1 Score All values are in percentage Boldfaced results indicate significant improvement (5x2 cross-validation & paired t-test with p < 0.05)

 Medical specialties are useful, but are often inconsistent with actual diagnosis histories  Even National Provider Identifiers are not always accurate  Machine learning can be leveraged to discover and evaluate de facto diagnosis specialties, such as Breast Cancer and Obesity  Semi-supervised and unsupervised learning are used for discovery  Supervised learning are used for evaluation