Published by Andrew Morrison. Modified over 9 years ago.
1
Phenotype Generation from EMR by Tensor Factorization
SEDI Durham Cohort
James Lu, M.D., Ph.D.
Department of Electrical and Computer Engineering; Department of Medicine
2
Health system under pressure: $3.2 trillion per year (~21% of GDP)
3
Novel technology: small molecules, medical devices, biologics, diagnostics, genomics, transcriptomics, …
Operations: aligning incentives, risk sharing, quality metrics, reducing readmissions, Six Sigma/Lean, …
● Where do I achieve cost arbitrage?
● How do we identify which patients to study?
● Where is my patient going next?
● Can we reorganize patient flow?
4
Computable phenotypes are a top-down process: the defining criteria are specified in advance by experts (PheKB, Northwestern).
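To make the top-down idea concrete, here is a minimal sketch of a rule-based phenotype. The record type, the medication list, and the combination of criteria are hypothetical and simplified for illustration; this is not a PheKB algorithm. The only grounded details are that ICD-9 250.xx is the diabetes mellitus code family and that an A1c of 6.5% is a common diagnostic cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical minimal EMR summary for one patient."""
    icd9_codes: list[str] = field(default_factory=list)
    a1c_values: list[float] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)


# Hypothetical medication list; a real algorithm would use a curated value set.
DIABETES_MEDS = {"metformin", "glipizide", "insulin glargine"}


def has_diabetes_phenotype(p: PatientRecord) -> bool:
    """Illustrative top-down rule: at least 2 ICD-9 250.xx diagnosis codes,
    or an A1c >= 6.5% together with a diabetes medication."""
    dx_count = sum(code.startswith("250") for code in p.icd9_codes)
    high_a1c = any(v >= 6.5 for v in p.a1c_values)
    on_med = any(m.lower() in DIABETES_MEDS for m in p.medications)
    return dx_count >= 2 or (high_a1c and on_med)


print(has_diabetes_phenotype(PatientRecord(a1c_values=[7.1], medications=["Metformin"])))
```

Every criterion here had to be chosen by hand before looking at the data, which is exactly the cost the next slide points to.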
5
Many variants of computable phenotypes require adjudication by physicians, which is expensive and time-consuming (Richesson et al., 2013).
6
EMR data are large and complicated
Durham County, 2007-2011
Patient level
● >240,000 patients
● Date of birth, death (where available), gender, race, ethnicity
Visit level
● 4.4 million patient visits, with an average of 18 measurements recorded per visit
● Indicators of presence/absence of particular diseases (computed)
● Encounter date (start, end)
● Location (DHRH, DUH, DRH)
● Path (for example, ED → inpatient)
● Inpatient/outpatient
Intervention level
● >60,000 types of observations: CPT, ICD-9 diagnoses, ICD-9 procedures, lab values, medications, vitals
Caveats
● Temporal gaps: people are only patients when they are sick
● We want to incorporate all of this information without being fooled by mistakes and bias
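One way observation-level records like these could be turned into counts is sketched below, assuming each record is a (patient, code, date) triple and time is binned into 30-day windows. The function name, the bin width, the 2007-01-01 origin, and the single combined "code" mode (rather than separate codes, labs, and medications modes) are assumptions for illustration. Only nonzero cells are stored, since a dense tensor of this size would be almost entirely zeros.

```python
from collections import Counter
from datetime import date


def build_count_tensor(events, time_bin_days=30, origin=date(2007, 1, 1)):
    """Aggregate (patient_id, code, event_date) records into a sparse count tensor.

    Returns a Counter keyed by (patient_index, code_index, time_bin) together with
    the patient and code index maps; absent keys are implicit zeros.
    """
    patient_index, code_index = {}, {}
    counts = Counter()
    for patient_id, code, event_date in events:
        p = patient_index.setdefault(patient_id, len(patient_index))
        c = code_index.setdefault(code, len(code_index))
        t = (event_date - origin).days // time_bin_days
        counts[(p, c, t)] += 1
    return counts, patient_index, code_index


# Hypothetical records: two A1c-related diagnoses for pt1 in the same 30-day bin.
events = [
    ("pt1", "ICD9:250.00", date(2008, 3, 1)),
    ("pt1", "ICD9:250.00", date(2008, 3, 15)),
    ("pt2", "LAB:HbA1c", date(2009, 1, 10)),
]
counts, pidx, cidx = build_count_tensor(events)
print(counts)
```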
7
Decompose each touch with the health care system into its parts
● Visits are represented in a 5-D tensor (~1 billion elements) with modes:
  ● Patient
  ● Diagnosis/billing codes
  ● Labs
  ● Medications
  ● Time
● Model the entries as counts
● Decompose into a set of K rank-1 components
(with Piyush Rai and Changwei Hui)
[Figure: factor vectors for codes, labs, medications, and time]
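The slide does not specify the count model or the fitting algorithm. As a stand-in, the sketch below fits a nonnegative CP decomposition (a sum of K rank-1 components) under a Poisson likelihood, using the standard multiplicative updates for the Poisson/KL objective on a small dense tensor with any number of modes. This is not the authors' method, and the names `poisson_cp`, `cp_reconstruct`, and `poisson_loss` are hypothetical.

```python
import numpy as np

LETTERS = "abcdefghij"  # einsum index letters; supports tensors with up to 10 modes


def cp_reconstruct(factors):
    """Full tensor sum_k a1[:, k] ∘ a2[:, k] ∘ ... from a list of factor matrices."""
    idx = LETTERS[: len(factors)]
    spec = ",".join(f"{i}z" for i in idx) + "->" + idx
    return np.einsum(spec, *factors)


def poisson_loss(X, model, eps=1e-10):
    """Poisson negative log-likelihood up to a constant: sum(M - X log M)."""
    return float(np.sum(model - X * np.log(model + eps)))


def poisson_cp(X, rank, n_iter=100, seed=0, eps=1e-10):
    """Nonnegative CP decomposition of count tensor X into `rank` rank-1 components,
    fitted with multiplicative updates that do not increase the Poisson loss
    (up to the small eps guard)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) + 0.1 for dim in X.shape]
    idx = LETTERS[: X.ndim]
    for _ in range(n_iter):
        for mode in range(X.ndim):
            ratio = X / (cp_reconstruct(factors) + eps)
            others = [f for m, f in enumerate(factors) if m != mode]
            other_idx = [i for m, i in enumerate(idx) if m != mode]
            # Numerator: contract X / model with every other mode's factor matrix.
            spec = idx + "," + ",".join(f"{i}z" for i in other_idx) + "->" + idx[mode] + "z"
            numer = np.einsum(spec, ratio, *others)
            # Denominator: product of the other factors' column sums.
            denom = np.prod([f.sum(axis=0) for f in others], axis=0)
            factors[mode] = factors[mode] * numer / (denom + eps)
    return factors


# Usage on a small synthetic count tensor (3 modes here; the slide's tensor has 5).
rng = np.random.default_rng(1)
true_factors = [rng.gamma(1.0, 1.0, size=(d, 3)) for d in (30, 12, 10)]
X = rng.poisson(cp_reconstruct(true_factors)).astype(float)
factors = poisson_cp(X, rank=3)
print(poisson_loss(X, cp_reconstruct(factors)))
```

Column k taken across all factor matrices is one rank-1 component, i.e. one candidate latent phenotype. A tensor with ~1 billion mostly-zero cells would need a sparse implementation rather than the dense `einsum` calls used here.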
8
Computational phenotypes are a bottom-up process: factors represent latent phenotypes.
Evaluated 11,242 patients (~23 million data points) with morbidity outcomes in diabetes.
[Figure: top-weighted features of two example factors, Factor 2 and Factor 10, including malignant neoplasm of prostate, secondary malignant neoplasms of bone, clinical trial participation, external catheter set, CEA, AG 15-3, alprazolam, urate, allopurinol, evening primrose oil, systemic lupus erythematosus, side effects from statins, shoulder pain, calcidiol, and Jo-1]
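Reading a factor as a phenotype usually means listing its highest-weighted features. A minimal sketch, assuming a nonnegative feature-by-factor loading matrix from one mode (for example, codes); the helper `top_features` is hypothetical, and the weights below are toy values, not the study's results.

```python
import numpy as np


def top_features(loadings, feature_names, factor, n=5):
    """Return the n feature names with the largest weight in one factor.

    loadings: (n_features, K) nonnegative factor matrix for one mode.
    """
    order = np.argsort(loadings[:, factor])[::-1][:n]
    return [feature_names[i] for i in order]


# Toy loadings (hypothetical) for five features and two factors.
names = ["allopurinol", "urate", "CEA", "prostate neoplasm", "shoulder pain"]
L = np.array([
    [0.9, 0.0],
    [0.7, 0.1],
    [0.0, 0.8],
    [0.1, 0.9],
    [0.2, 0.3],
])
print(top_features(L, names, factor=1, n=2))  # ['prostate neoplasm', 'CEA']
```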
9
Patients are composites of common and rare latent phenotypes.
[Figure: patient-by-factor score matrix for the 40 most common phenotypes; labeled factors include ER/EKG, standard labs (e.g., CBC/BMP), kidney disease, hypertension, and surgical patient]
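The patient-by-factor matrix can be read as each patient's mixture of phenotypes. The sketch below normalizes each patient's scores to proportions and ranks factors by their average share. Defining "most common" this way is an assumption, since the slide does not say how the 40 phenotypes were selected; the function names are hypothetical.

```python
import numpy as np


def patient_compositions(patient_factors, eps=1e-12):
    """Normalize each patient's nonnegative factor scores to proportions summing to 1."""
    totals = patient_factors.sum(axis=1, keepdims=True)
    return patient_factors / (totals + eps)


def most_common_factors(patient_factors, n=40):
    """Indices of the n factors with the largest average share across patients."""
    shares = patient_compositions(patient_factors).mean(axis=0)
    return np.argsort(shares)[::-1][:n]


# Toy patient-by-factor scores (hypothetical): 2 patients, 3 factors.
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 1.0, 2.0]])
comp = patient_compositions(A)
print(comp)
print(most_common_factors(A, n=1))
```

A common phenotype (such as standard labs) then shows up as a factor with a sizeable share in many rows, while a rare one carries a large share in only a few patients.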
10
Compare outcome prediction to a known algorithm (UKPDS)
● UKPDS: the UK Prospective Diabetes Study outcomes model, used to predict MI, death, and stroke
● 7 demographic and lab variables: age, ethnicity, smoking status, A1c, HDL, total cholesterol, and systolic BP
● Datasets: original 7-variable model; all data; Non Matrix Factorization; tensor factorization
● Question: can we predict the outcome (death, AMI, stroke) in the next year?
● Classification model: random forests fit to each dataset, evaluated with 10-fold cross-validation
(with Joseph Lucas)
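The evaluation protocol (one random forest per feature set, scored with 10-fold cross-validation) can be sketched with scikit-learn. The metric (ROC AUC), the number of trees, the helper name `cv_auc`, and the synthetic feature sets are assumptions for illustration; they are not the study's data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def cv_auc(features, outcome, n_splits=10, seed=0):
    """Per-fold ROC AUC of a random forest under stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(model, features, outcome, cv=cv, scoring="roc_auc")


# Synthetic stand-ins for two of the slide's datasets (hypothetical, not study data).
rng = np.random.default_rng(0)
n_patients = 500
outcome = rng.integers(0, 2, size=n_patients)  # e.g. death within the next year
feature_sets = {
    "7-variable (UKPDS-style)": rng.normal(size=(n_patients, 7)) + 0.3 * outcome[:, None],
    "tensor factors (K=20)": rng.normal(size=(n_patients, 20)) + 0.5 * outcome[:, None],
}

results = {name: cv_auc(X, outcome) for name, X in feature_sets.items()}
for name, scores in results.items():
    print(f"{name}: mean AUC {scores.mean():.3f} (sd {scores.std():.3f})")
```

Using the same classifier, folds, and seed for every feature set keeps the comparison about the features rather than the model.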
11
Tensor-derived factors perform better than the original UKPDS model for all outcomes and provide performance comparable to the “all-data” model. Stroke is similar to Dat