Joseph D. Romano, MPhil Columbia University Twitter: #AMIA2017 #S23

Deep recurrent neural networks identify transgender patients
Oral Presentations – Methods for Identification, Classification, and Association using EHR Data (S23)

Disclosure
I have no relevant relationships with commercial interests to disclose.
AMIA | amia.org

Learning Objectives
After participating in this session the learner should be better able to:
- Conceptualize a recurrent neural network text classifier, and see how it can be applied to transgender patient classification.
- Understand the need for data-driven methods to improve healthcare for transgender patients.
- Understand that deep learning models do not address the ethical issues presented by tasks such as transgender status classification.

The transgender health crisis
- Transgender individuals experience unique health disparities
  - Lacking adequate subpopulation research due to historical stigmatization
  - Health care professionals often untrained in LGBT health
  - Specific physical and psychological comorbidities more common
- It is challenging to identify retrospective transgender cohorts
  - ‘Transgender’ often not coded in health information systems
  - Fear of stigmatization may lead to lack of disclosure
  - Increased privacy concerns, particularly regarding EHR data
Institute of Medicine (US). National Academies Press (US);2011. (PMID: )
Kenrick approached us, based on some of his previous work

Recurrent Neural Networks
- Accepts an ordered sequence as input
  - In our case, a sequence of embedded words
- Returns a sequence as output
  - For sequence classification, discard all but the last item in the output sequence
What is a neural network?
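The "keep only the last output" idea can be sketched with a tiny one-unit recurrent network. This is an illustrative plain (Elman-style) RNN in pure Python, not the LSTM used in the talk; the function names, weights, and the three-step input sequence are all made-up assumptions.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias):
    """Run a one-unit recurrent network; return the output at every time step."""
    hidden = 0.0
    outputs = []
    for x in sequence:
        # The new state depends on the current input and the previous state.
        pre_activation = sum(w * xi for w, xi in zip(w_in, x)) + w_rec * hidden + bias
        hidden = math.tanh(pre_activation)
        outputs.append(hidden)
    return outputs

def classify_sequence(sequence, w_in, w_rec, bias, w_out, b_out):
    """Sequence classification: discard all outputs except the last one."""
    last_output = rnn_forward(sequence, w_in, w_rec, bias)[-1]
    # A sigmoid turns the final output into a score between 0 and 1.
    return 1.0 / (1.0 + math.exp(-(w_out * last_output + b_out)))

# Hypothetical sequence of three 2-dimensional "embedded words".
sequence = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
outputs = rnn_forward(sequence, w_in=[0.7, -0.3], w_rec=0.5, bias=0.0)
probability = classify_sequence(sequence, [0.7, -0.3], 0.5, 0.0, w_out=1.5, b_out=-0.2)
print(len(outputs), probability)
```

The network produces one output per input word, but only the final one (which has seen the whole note) feeds the classifier.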

Vectorizing words via embedding
Mikolov, T et al. NIPS. 2013;23:
2 primary benefits:
- reduces dimensionality
- models correlations between words
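Both benefits can be seen in a toy comparison of one-hot vectors and dense embeddings. The three-word vocabulary and the 2-dimensional embedding values below are hypothetical numbers chosen for illustration, not vectors learned by word2vec.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["estrogen", "estradiol", "fracture"]

# One-hot: one dimension per vocabulary word, so distinct words are orthogonal.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical dense embeddings (made-up values): fewer dimensions than the
# vocabulary size, with related words placed close together.
embedding = {
    "estrogen": [0.9, 0.1],
    "estradiol": [0.8, 0.2],
    "fracture": [-0.1, 0.95],
}

print(cosine(one_hot["estrogen"], one_hot["estradiol"]))
print(cosine(embedding["estrogen"], embedding["estradiol"]))
print(cosine(embedding["estrogen"], embedding["fracture"]))
```

With one-hot vectors every pair of distinct words has similarity zero; with embeddings, the related pair scores much higher than the unrelated pair.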

Note classification pipeline

Implementation: LSTM network written in Keras (TensorFlow back-end)
Embedding layer → LSTM layer → Fully connected layer
- Embedding dimensionality: 64
- LSTM output dimensionality: 100
- Activation functions:
  - LSTM layer: Hard sigmoid
  - Fully connected layer: Sigmoid
- 578,101 free parameters
- Trained on CentOS Linux server with 4x Nvidia Tesla P100 GPU accelerators
  - 14,336 total CUDA cores
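As a sanity check, the 578,101 figure can be decomposed layer by layer. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias) and a single sigmoid output unit; neither detail is stated on the slide. The vocabulary size is also not given, so it is derived here as whatever makes the arithmetic balance.

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with an input kernel, a recurrent kernel, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    """One weight per input-output pair, plus one bias per output unit."""
    return input_dim * units + units

EMBED_DIM = 64          # from the slide
LSTM_UNITS = 100        # from the slide
TOTAL_PARAMS = 578_101  # from the slide

lstm_params = lstm_param_count(EMBED_DIM, LSTM_UNITS)
dense_params = dense_param_count(LSTM_UNITS, 1)  # assumed: one output unit

# Whatever is left must belong to the embedding matrix (vocab_size x 64).
embedding_params = TOTAL_PARAMS - lstm_params - dense_params
implied_vocab_size, remainder = divmod(embedding_params, EMBED_DIM)

print(lstm_params, dense_params, embedding_params, implied_vocab_size, remainder)
```

Under these assumptions the remainder is zero and the implied vocabulary is 8,000 words, which suggests (but does not prove) that the embedding layer was capped at that size.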

Implementation
[Figure: inputs and targets]

Results: Cohort and note characteristics
EHR Cohort
- Cases: 39 manually-identified transgender patients
- Controls: 400 randomly selected patients with clinical notes
Free-text clinical notes
- Obtained all notes for included patients
- Tokenized; removed numbers, proper nouns, punctuation
- Left-pad/truncate notes to 1000 words
- 33/67% train-test split
  - Each patient’s notes in either train or test set, never both
- Train word2vec embeddings on entire set of notes
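Two of the preprocessing steps above, fixed-length notes and a patient-level split, can be sketched in pure Python. The function names, the pad token, and the toy notes are hypothetical. The slide does not say which end is truncated or which side of the 33/67% split is the test set, so both choices below are assumptions.

```python
import random

def left_pad_or_truncate(tokens, length=1000, pad_token="<pad>"):
    """Return exactly `length` tokens.

    Short notes are padded on the left. Long notes keep their last `length`
    tokens (assumption: truncation also happens on the left).
    """
    if len(tokens) >= length:
        return tokens[-length:]
    return [pad_token] * (length - len(tokens)) + tokens

def split_by_patient(notes, test_fraction, seed=0):
    """Split (patient_id, tokens) pairs so no patient lands in both sets."""
    patient_ids = sorted({pid for pid, _ in notes})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])
    train = [(pid, toks) for pid, toks in notes if pid not in test_ids]
    test = [(pid, toks) for pid, toks in notes if pid in test_ids]
    return train, test

# Toy data: six hypothetical patients with two short notes each.
notes = [(pid, ["note", str(n)]) for pid in range(6) for n in range(2)]
train, test = split_by_patient(notes, test_fraction=0.33)  # assumed test share
padded = left_pad_or_truncate(["follow", "up", "visit"], length=5)
print(len(train), len(test), padded)
```

Splitting on patient IDs rather than on individual notes is what keeps one patient's notes from leaking between the training and test sets.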

Results: Classifier performance
Metric      Score
Accuracy    0.901
Precision   0.830
Recall      0.737
F1-Score    0.780
AUC ROC     0.940
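The F1-score is the harmonic mean of precision and recall, so it can be re-derived from the other two rows as a quick consistency check. This sketch uses the rounded values from the slide, so a small difference in the third decimal place is expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported_f1 = 0.780
recomputed_f1 = f1_score(precision=0.830, recall=0.737)

# The inputs are already rounded to three decimals, so compare with a tolerance.
print(recomputed_f1, abs(recomputed_f1 - reported_f1) < 0.002)
```

The recomputed value agrees with the reported F1-score to within the rounding of the inputs.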

Results: Word embeddings

Results: Accuracy and training loss
[Figure: accuracy and training loss by training epoch, epochs 1 to 5]
Model trains (5 epochs) in minutes

Comparison to stroke classification
Acute ischemic stroke
Metric      Score
Accuracy    0.901
AUC ROC     0.940

Limitations and future improvements
- We need far more data!
  - 37 patients so far; we must be overfitting
  - How do we find more patients? grep approach is primitive
- Leverage emerging techniques to extract knowledge from the learned networks
  - Neural networks are hard to introspect; no “beta coefficient” equivalent
- Eventually, incorporate into clinical decision support
  - See a clinical note, evaluate, trigger alert if likely transgender

Application to multiple institutions
- Does our model translate to other hospital systems?
  - If not, how about the word embeddings?
  - Major opportunity to improve training data size issues
- Different institutions/EHR systems implement gender differently
  - NYP/Weill Cornell Medical Center: patient-reported gender with transgender options
  - Stuck in IRB purgatory
- Use cutting-edge techniques to advance privacy guarantees
  - Generative Adversarial Networks and/or Variational Autoencoders
  - Differential privacy analysis

Ethical considerations
- Essential to address the ethical concerns associated with automated identification of transgender patients
  - Misuse could lead to patient discrimination
  - Reidentification of training patients may be possible
  - Gender is complicated, and imposing labels on patients may be counterproductive
- See S57: Oral Presentations, first presentation: “The Use of Informatics to Reduce Disparities in Transgender Health” (Kenrick Cato, PhD, RN), 8:30 AM-8:48 AM; Tuesday (Fairchild)
Cato, K et al. J Empir Res Hum Res Ethics. 2016;11(3):

Acknowledgements
Tatonetti Lab
Nicholas Tatonetti, PhD*; Rami Vanguri, PhD*; Kayla Quinnies, PhD; Theresa Koleck, PhD; Yun Hao; Phyllis Thangaraj; Alexandre Yahi; Fernanda Polubriaginof, MD; Nick Giangreco; Jenna Kefeli; Jing Ai; Katie LaRow
Kenrick Cato, PhD*
*Coauthors


Thank you!
Email me at: jdr2160@cumc.columbia.edu
