1
Leveraging Free Text Data for Decision Making in Drug Development
Yan Sun, Jiyeong Jang, Xin Huang, Hongwei Wang, Weili He (July 2019)
2
Disclosure: The support for this presentation was provided by AbbVie. AbbVie participated in the review and approval of the content. Yan Sun, Xin Huang, Hongwei Wang, and Weili He are employees of AbbVie. Jiyeong Jang was an extern at AbbVie and is a student at the University of Illinois at Chicago.
3
Data Types for Drug Development
Example data types (the original table lists each with its statistical role):
- Genomic, proteomic
- Renal and blood parameters
- Activity tracking (spatial, temporal)
- Physician notes, social media
- IHC, H&E, MRI imaging
4
Application of Natural Language Processing in Drug Development
When text is the main focus or the role of text is clear:
- Extracting and normalizing adverse events
- Extracting patient-reported outcomes/feedback
- Patent/literature text mining for new targets
- Extracting real-world evidence/EHR/physician notes, etc.
When text is auxiliary information and the role of text is unclear:
- Potentially useful for predicting response/adverse events
- Other variables/biomarkers (structured data) are also available for predicting response and safety
5
Analysis Method for NLP (Traditional)
Training: raw text → labeling → pre-processing (tokenization, filtering, stemming, lemmatization, ...) → feature extraction (bag-of-words, n-grams, word embeddings such as a co-occurrence matrix or Word2Vec, ...) → model training on the training set, with hyperparameter tuning and model selection on a validation set → evaluation on the test set.
Predicting: new text → feature extraction → machine learning model → predicted target.
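As a concrete illustration of this traditional pipeline, here is a minimal sketch in Python using scikit-learn. The corpus, labels, and hyperparameter grid are hypothetical placeholders, not material from the presentation.

```python
# Minimal sketch of the traditional NLP pipeline: pre-processing and
# bag-of-words/n-gram feature extraction, then model training with
# hyperparameter tuning, evaluation on a held-out test set, and
# prediction on new text. Corpus and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

texts = ["I had a good day", "I eat alone", "Today was fantastic", "No one loves me"] * 25
labels = [1, 0, 1, 0] * 25  # hypothetical binary target

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.3, random_state=0)

pipeline = Pipeline([
    # tokenization, filtering (stop words), and bag-of-words/n-gram features
    ("features", CountVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))),
    ("model", LogisticRegression(max_iter=1000)),
])

# hyperparameter tuning / model selection via cross-validation on the training set
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# evaluation on the test set, then prediction on new text
print("test accuracy:", search.score(X_test, y_test))
print(search.predict(["I am enjoying life"]))
```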
6
Analysis Method for NLP (Deep Learning)
Training: raw text → labeling → pre-processing (tokenization, filtering, ...) → deep learning model with automatic feature generation → model training on the training set, with hyperparameter tuning on a validation set → evaluation on the test set.
Predicting: new text → deep learning model → predicted target.
7
Deep Learning for NLP “Old fashioned”: recurrent neural network
- Hard to parallelize
- Has difficulty learning long-range dependencies
State of the art: BERT (Bidirectional Encoder Representations from Transformers)
- Transformer encoder based on self-attention to learn context
- Considers both left and right context in all layers, instead of either left-to-right training or combined left-to-right and right-to-left training
8
A Little More about BERT
Model architecture: multi-layer bidirectional Transformer encoder (a stack of encoder blocks).
Input representation: raw input → WordPiece tokenization, with [CLS] added for classification and [SEP] added for sentence separation; the encoder input is the sum of token, segment, and position embeddings.
Unsupervised pre-training:
- Masked language model: predict the masked tokens of the input using the final hidden vectors corresponding to the masked tokens
- Next sentence prediction: classify the sentence pair using the final hidden vector corresponding to [CLS]
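A small sketch of the input representation step, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is named in the slides): it shows WordPiece tokenization, the [CLS]/[SEP] markers, and the segment indices returned as token_type_ids.

```python
# Sketch of BERT's input handling (Hugging Face transformers assumed).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# WordPiece tokenization; [CLS] is prepended for classification and
# [SEP] marks sentence separation between the two input sentences.
encoded = tokenizer("I eat alone.", "No one loves me.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# e.g. ['[CLS]', 'i', 'eat', 'alone', '.', '[SEP]', 'no', 'one', 'loves', 'me', '.', '[SEP]']

# token_type_ids index the segment embeddings (sentence A vs. sentence B);
# position embeddings are added automatically inside the model.
print(encoded["token_type_ids"])
```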
9
Applying BERT when text is the main focus/role of text is clear
Fine-tuning approach: swap out the input and output layer, initialize parameters from the pre-trained model, and fine-tune all parameters (token-level tasks use the token representations plus an output layer; sentence-level tasks use the [CLS] representation plus an output layer).
Feature-based approach for token-level tasks:
- Extract fixed features (token representations) from the pre-trained model; the BERT paper recommends a sum of the last 4 hidden layers
- Note that this works for token-level tasks only: the [CLS] representation cannot be used in a similar way for a general feature-based sentence-level task, because [CLS] is pre-trained only for next sentence prediction
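A minimal sketch of the feature-based approach described above, again assuming Hugging Face transformers and the bert-base-uncased checkpoint: fixed token representations are taken as the sum of the last 4 hidden layers of the frozen pre-trained model.

```python
# Sketch of the feature-based approach: extract fixed token representations
# from a frozen pre-trained BERT by summing its last 4 hidden layers.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("I had a good conversation with dad", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of (embedding layer + 12 encoder layers),
# each of shape (batch, seq_len, hidden_size)
hidden_states = outputs.hidden_states
token_features = torch.stack(hidden_states[-4:]).sum(dim=0)  # sum of the last 4 layers
print(token_features.shape)  # (1, seq_len, 768) for bert-base
```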
10
Applying BERT when text is auxiliary information and role of text is unclear
Challenges/considerations:
- How to label the text if the role/meaning of the text is unclear?
- Even if the role of the text is clear, how to justify the labor-intensive text adjudication process if the text is just a candidate predictor?
- How to save computing resources and quickly evaluate the value of text as a candidate predictor?
- How to incorporate other variables/biomarkers of interest?
- When other variables/biomarkers are involved, how to interpret the result?
Our preliminary thought: a feature (hidden layer) based method; prediction first, interpretation second.
- No need to label/adjudicate beforehand
- Computationally inexpensive
- Incorporates other biomarkers
- Easy to interpret (including the text)
11
A Pilot Example: Data
- Candidate biomarkers: BM1 through BM10 ~ N(0, 1) with a compound-symmetric (CS) covariance structure (covariance = 0.2)
- One text predictor describing loneliness (*Source: …)
- Loneliness status (-1/1) is used for data generation only (not visible for model training): 1. we don't know the label; 2. we don't know it represents loneliness
- Sequence length ranges from 1 to 3 sentences
- Response outcome: logit(response rate) = BM1*2 + Loneliness.status*2
- Training sample size: 300; testing sample size: 300
- Goal: build an interpretable model and predict response in the testing set
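A sketch of how the pilot data could be simulated under the stated design; the random seed, the equal probability of the two loneliness states, and the omission of the free-text generation step are assumptions, not details given in the slides.

```python
# Sketch of the pilot-data simulation (text generation not reproduced).
import numpy as np

rng = np.random.default_rng(2019)
n = 600  # 300 training + 300 testing

# 10 candidate biomarkers ~ N(0, 1) with compound-symmetric covariance (0.2 off-diagonal)
p = 10
cov = np.full((p, p), 0.2)
np.fill_diagonal(cov, 1.0)
bm = rng.multivariate_normal(mean=np.zeros(p), cov=cov, size=n)

# Loneliness status in {-1, 1}; used only to generate the outcome (and, in the
# original example, the free-text comments), never shown to the model
loneliness = rng.choice([-1, 1], size=n)

# logit(response rate) = BM1*2 + Loneliness.status*2
logit = 2 * bm[:, 0] + 2 * loneliness
prob = 1 / (1 + np.exp(-logit))
response = rng.binomial(1, prob)

train, test = slice(0, 300), slice(300, 600)
```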
12
Proof of Concept (Training Process): Testing our preliminary approach on the pilot example
Feature-based sentence-level task using BERT (note that the [CLS] representation cannot be used):
- We consider the following 3 quick options for sentence representation: (1) average pooling of all tokens; (2) average pooling of all tokens except [CLS]; (3) average pooling of all tokens except [CLS] and [SEP]
- As in the token-level task, the sum of the last 4 hidden layers is used
- BERT_large is used
- Note: labeling (loneliness status) is not needed/used
GLM LASSO is used for downstream classification modeling (see the sketch after this slide):
- Derived BERT features are added as individual variables (p = 1024); note the cost of adding a text column = 1024 additional variables
- Total number of predictors: 10 + 1024 = 1034
- lambda.1se is used for variable selection
- The model is re-fitted using the selected variables in the training set
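A sketch of this training process, assuming Hugging Face transformers (bert-large-uncased, hidden size 1024) and scikit-learn; the toy texts, biomarkers, and responses are placeholders, the pooling shown is option (3), and scikit-learn's cross-validated penalty selection stands in for glmnet's lambda.1se rule (which it does not implement); the final re-fit on the selected variables is omitted.

```python
# Sketch: 1024-dim sentence representation from BERT_large (sum of the last 4
# hidden layers, average-pooled over tokens excluding [CLS] and [SEP]),
# combined with biomarkers in an L1-penalized (LASSO) logistic regression.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegressionCV
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
bert = BertModel.from_pretrained("bert-large-uncased", output_hidden_states=True)
bert.eval()

def sentence_features(text):
    """Average pooling over tokens (excluding [CLS]/[SEP]) of the summed last 4 hidden layers."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).hidden_states
    summed = torch.stack(hidden[-4:]).sum(dim=0)[0]   # (seq_len, 1024)
    return summed[1:-1].mean(dim=0).numpy()           # drop [CLS] and [SEP], then average

# Placeholder data; in the pilot example these come from the simulated dataset
texts = ["I eat alone. No one loves me.", "Today was fantastic.",
         "I am lonely. I sit alone.", "I am enjoying life."] * 5
response = np.array([1, 0, 1, 0] * 5)
biomarkers = np.random.default_rng(0).normal(size=(len(texts), 10))

text_features = np.vstack([sentence_features(t) for t in texts])   # n x 1024
X = np.hstack([biomarkers, text_features])                         # n x 1034 predictors

# LASSO logistic regression; CV-chosen penalty approximates glmnet's lambda.1se
lasso = LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5, max_iter=5000)
lasso.fit(X, response)
```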
13
Proof of Concept: Result
Testing set performance: trained with biomarkers only vs. trained with biomarkers + free text. The 3 options of sentence representation have similar performance in this pilot dataset.
14
Interpretation
Since a GLM is used, interpretation of the biomarker variables is straightforward.
Interpretation of the free text: we consider the following practice, and the examination result can serve as a basis for future adjudication:
- Calculate the text score based on the GLM: βᵀ X_text
- Order the text scores
- Examine the high and low scores
Some low score comments: I want to dance. Today was fantastic. Today was an awesome day. Good day good vibes. I am feeling happy. Got good news today. I am positive about life. I am awesome. I had a good day. Today was fantastic. The world looks good. I am enjoying life. I had a good conversation with dad. My family is always there for me. I can't wait to meet new people.
Some high score comments: I hate being myself. The past haunts me. No one loves me. I want to cut myself. I wish I could talk with someone. I hate being myself. I want my life to end today. I wish I could talk with someone. Everyone I know will go away. I hate my life. I thought of suicide. I want my life to end today. I celebrated my birthday alone. I drink alone. I hate being myself. I eat alone. I thought of suicide. No one loves me. I am lonely. I eat alone. I sit alone. I hate everyone. I hate my life.
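A sketch of this interpretation step; text_features, text_coefs, and texts below are placeholders standing in for the BERT sentence features, the fitted GLM coefficients of the text variables, and the comments from the earlier LASSO sketch.

```python
# Sketch: the text score for each comment is the inner product of the fitted
# GLM coefficients for the text features with that comment's BERT features
# (beta^T x_text). Placeholders are used so the snippet runs on its own.
import numpy as np

rng = np.random.default_rng(1)
texts = ["Today was fantastic.", "I eat alone.", "I am enjoying life.", "No one loves me."]
text_features = rng.normal(size=(4, 1024))  # placeholder BERT sentence features
text_coefs = rng.normal(size=1024)          # placeholder fitted text coefficients

scores = text_features @ text_coefs         # beta^T x_text for each comment
order = np.argsort(scores)
print("lowest-score comments:", [texts[i] for i in order[:2]])
print("highest-score comments:", [texts[i] for i in order[-2:]])
```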
15
Conclusion Free text is an important data type in the Pharma industry
- Extracting useful information from free text is critical for decision making
- NLP is a hot topic, and the methods/tools are evolving rapidly
- BERT is currently a state-of-the-art method for NLP; it has great potential to be used as a standalone method for free text or to be combined with other biomarkers to achieve better prediction accuracy
- We proposed a preliminary approach for combining free text with biomarkers, which has the following potential advantages: no need to label/adjudicate beforehand; computationally inexpensive; incorporates other biomarkers; easy to interpret (including the text)
- A better sentence representation could potentially further improve the performance
16
Thank You