Taming EHR Data Using Semantic Similarity to Reduce Dimensionality

Presentation transcript:

Taming EHR Data Using Semantic Similarity to Reduce Dimensionality

Jim Weatherall, PhD
Head, Advanced Analytics Centre, AstraZeneca
Visiting Lecturer, School of Computer Science, University of Manchester

14th World Congress on Medical & Health Informatics, August 2013, Copenhagen

On behalf of the authors:
  Leila Kalankesh, School of Computer Science, UoM
  James Weatherall, AstraZeneca
  Thamer Ba-Dhfari, School of Computer Science, UoM
  Iain Buchan, Institute of Population Health, UoM
  Andy Brass, School of Computer Science, UoM

Brief: Talk for 14 mins, then 2 mins questions, then 2 mins switchover

Introduction: Problems with mining healthcare data

Large collections not easily visualised or interpreted
Research not primary purpose for collection
10s of 1000s of dimensions
100s of 1000s of codes

  Read Code   Rubric
  C10F.       Type II Diabetes Mellitus
  1372.       Trivial smoker < 1 cig/day
  bd3j.       Prescription of “Atenolol 25mg tablets”
  G20.        Essential hypertension
  2469.       Measurement of Diastolic Blood Pressure
  246A.       Assessment of Diastolic Blood Pressure

J.Weatherall | August 2013 | Biometrics & Information Sciences | GMD

Data: The Salford Integrated Record (SIR)

Population ~220,000
Integrated primary and secondary care information
Individual Read Code entries captured in primary care information systems
  Codes for diagnosis
  Codes for procedures
All clinical transactions in primary care and some in secondary care

Data extract for this analysis based on:
  GP data in date range 2003-2009
    Containing 136M Read code entries
  Selected 24K patients with chronic conditions
    Containing 443K Read code entries

Conditions listed on the slide: Type 1 DM, Type 2 DM, MI, Angina, Stroke / CVA, TIA, CKD, Liver disease

Methods: Semantic Similarity

How alike are the meanings of two terms?
Measure ontological distance?
Measure depth? Or not?
Also:
  Relative depth
  Multiple inheritance
  Lateral connections
These are edge-based; what about node-based?
All this is ontology-based; what about corpus-based?

From Sanchez, J.Biomed.Inform, 2011
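To make the edge-based idea concrete, here is a minimal sketch that counts the edges between two terms via their lowest common ancestor. The taxonomy, the term names and the 1 / (1 + distance) scaling are all assumptions made for illustration; none of them come from the slides or from the Read Code hierarchy.

```python
# Hypothetical toy taxonomy stored as child -> parent (single inheritance only).
PARENT = {
    "type1_dm": "diabetes",
    "type2_dm": "diabetes",
    "diabetes": "endocrine",
    "endocrine": "disease",
    "hypertension": "circulatory",
    "circulatory": "disease",
}


def path_to_root(term):
    """Return the term followed by each of its ancestors, ending at the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_length(a, b):
    """Count the edges on the path from a to b through their lowest common ancestor."""
    up_b = {term: steps for steps, term in enumerate(path_to_root(b))}
    for steps_a, term in enumerate(path_to_root(a)):
        if term in up_b:
            return steps_a + up_b[term]
    raise ValueError("terms do not share an ancestor")


def edge_similarity(a, b):
    """Illustrative scaling only: shorter paths give scores closer to 1."""
    return 1.0 / (1.0 + path_length(a, b))


print(path_length("type1_dm", "type2_dm"))
print(path_length("type1_dm", "hypertension"))
```

A pure edge count like this ignores depth, multiple inheritance and lateral connections, which are the refinements the slide lists.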

Methods: Semantic Similarity – which method? An ontology of methods!

Diagram (node labels in slide order): Semantic Similarity Method; Ontological; Node-based; Edge-based; Hybrid; Corpus-based; Frequency; Context; Proximity; Combined

Corpus-based is realistic and grounded in data, but relies on having a large enough corpus, and can also be biased and computationally expensive.
Ontology-based is often computationally lighter and rooted in the knowledge domain of interest, but can lack realism because it does not use real data.

Semantic similarity calculation: The Resnik measure

1. Term probability, based on frequency, including descendants and annotations
2. Log transformation, gives “Information Content”
3. IC of “Most Informative Common Ancestor” gives similarity measure

N = total number of all codes in data set
codes(c) = count of all instances of code c, as well as annotations and descendants

P. Resnik, “Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language”, J Artif Intell Res, 1999
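The three steps can be sketched on a toy example. In the standard form of Resnik's measure, p(c) = codes(c) / N, IC(c) = -log p(c), and sim(c1, c2) is the IC of the most informative common ancestor. The taxonomy and observed codes below are hypothetical, and the sketch counts only a code and its descendants; the "annotations" mentioned on the slide are not modelled.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy stored as child -> parent.
PARENT = {
    "type1_dm": "diabetes",
    "type2_dm": "diabetes",
    "diabetes": "disease",
    "angina": "heart_disease",
    "mi": "heart_disease",
    "heart_disease": "disease",
}


def with_ancestors(code):
    """Return the code followed by all of its ancestors up to the root."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def information_content(observed):
    """Steps 1 and 2: IC(c) = -log(codes(c) / N), counting descendants too."""
    n = len(observed)
    counts = Counter()
    for code in observed:
        for c in with_ancestors(code):
            counts[c] += 1  # an occurrence also counts for every ancestor
    return {c: -math.log(k / n) for c, k in counts.items()}


def resnik(c1, c2, ic):
    """Step 3: IC of the most informative common ancestor."""
    common = set(with_ancestors(c1)) & set(with_ancestors(c2))
    # Assumption: ancestors never seen in the data are skipped.
    return max((ic[c] for c in common if c in ic), default=0.0)


# Hypothetical code occurrences (N = 8).
observed = ["type2_dm"] * 4 + ["type1_dm"] + ["angina"] * 2 + ["mi"]
ic = information_content(observed)
print(resnik("type1_dm", "type2_dm", ic))  # IC of "diabetes"
print(resnik("type1_dm", "angina", ic))    # IC of the root, which is zero
```

Because every occurrence counts towards the root, the root has probability 1 and information content 0, so codes that share only the root score zero.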

Analysis Plan: Stepwise approach to dimensionality reduction

1. Map patient records from diagnosis space into a similarity space
2. Map patient records into a low-dimensional vector space via PCA
3. Project patient records onto low-dimensional vector space and cluster patients by similarity

Analysis – Step 1: Mapping from diagnosis space to similarity space

“The Similarity Matrix”

        p1           p2           …    pn
  p1    sim(p1,p1)   sim(p1,p2)   …    sim(p1,pn)
  p2    sim(p2,p1)   sim(p2,p2)   …    sim(p2,pn)
  …     …            …            …    …
  pn    sim(pn,p1)   sim(pn,p2)   …    sim(pn,pn)

pi = patient i
sim(pi,pj) = similarity score between patients i and j
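The slides do not say how code-level similarities are combined into a patient-level sim(pi,pj). The sketch below assumes a best-match average, which is one common choice and not necessarily the authors' method. The code names and pairwise scores are hypothetical stand-ins for the output of a semantic similarity measure.

```python
# Hypothetical code-level similarity scores; unlisted pairs score 0.
PAIR_SIM = {
    frozenset(("type1_dm", "type2_dm")): 0.6,
    frozenset(("angina", "mi")): 0.7,
}


def code_sim(a, b):
    """Look up the similarity of two codes; identical codes score 1."""
    if a == b:
        return 1.0
    return PAIR_SIM.get(frozenset((a, b)), 0.0)


def patient_sim(codes_i, codes_j):
    """Best-match average (an assumption): match each code to its closest
    counterpart in the other record, then average over both directions."""
    if not codes_i or not codes_j:
        return 0.0
    best_i = [max(code_sim(a, b) for b in codes_j) for a in codes_i]
    best_j = [max(code_sim(a, b) for a in codes_i) for b in codes_j]
    return (sum(best_i) + sum(best_j)) / (len(best_i) + len(best_j))


def similarity_matrix(patients):
    """Build the n x n matrix of sim(pi, pj) for a list of patient records."""
    n = len(patients)
    return [[patient_sim(patients[i], patients[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical patient records, each a list of diagnosis codes.
patients = [["type2_dm", "angina"], ["type1_dm"], ["mi"]]
S = similarity_matrix(patients)
for row in S:
    print([round(x, 3) for x in row])
```

With this definition the matrix is symmetric and has ones on the diagonal, so each row can serve as a patient's coordinates in similarity space.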

Analysis – Steps 2 + 3: PCA on the similarity matrix, visualisation & clustering

Natural co-morbidity: Diabetes is a risk factor for angina due to its accelerating effect on atherosclerosis
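Steps 2 and 3 can be sketched without external libraries. Each row of a small, made-up similarity matrix is treated as one patient's feature vector; the first principal component is found by power iteration, and patients are split by the sign of their score on it. The matrix values, the use of a single component and the sign-based split are all simplifying assumptions; the slides do not state which clustering method was used.

```python
import math


def center_columns(rows):
    """Subtract each column's mean from its entries."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of the centred columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ca, cb)) / (n - 1) for cb in cols]
            for ca in cols]


def first_component(cov, iterations=200):
    """Dominant eigenvector of cov, found by power iteration."""
    # Non-uniform start, so it is unlikely to be orthogonal to the answer.
    v = [float(i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        v = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Hypothetical similarity matrix: patients 0 and 1 are alike, as are 2 and 3.
S = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]

centered = center_columns(S)
pc1 = first_component(covariance(centered))

# Project each patient onto the first component and split by sign.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
clusters = [0 if s >= 0 else 1 for s in scores]
print(clusters)
```

The overall sign of a principal component is arbitrary, so only the grouping is meaningful, not which group is labelled 0. A real analysis would keep several components and use a proper clustering algorithm.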

Discussion & Conclusion: Review & Outlook

Patients with similar diagnosis codes are grouped together
Therefore, the semantic similarity technique works, to some degree
Therefore, this is a viable route to dimensionality reduction in complex healthcare data sets

Exploring co-morbidity and co-treatment effects?
New biomedical hypotheses?
Transferability of method?
Population level characterisation?
New data mining paradigms?

Thank You!
