
1 Novel Multivariate Approach for the Assessment of Product Comparability
JMP Discovery Summit 2016
Janet Alvarado, Center for Mathematical Sciences

2 Outline
- Introduction
  - What is comparability?
  - Regulations regarding establishing comparability, requirements
- Current methods used to establish comparability
- Overview of suggested approach
  - Random Forest (RF) and the proximity matrix
- Case studies
- Demonstration
- Future work

3 Definition of Comparability
- Comparable: "used to say that two or more things are very similar and can be compared to each other" (Merriam-Webster dictionary online)
- ICH Q5E (Biotechnological/Biological Products): "The goal of the comparability exercise is to ascertain that pre- and post-change drug product is comparable in terms of quality, safety, and efficacy."
- "The demonstration of comparability does not necessarily mean that the quality attributes of the pre-change and post-change products are identical; but that they are highly similar and that the existing knowledge is sufficiently predictive to ensure that any differences in quality attributes have no adverse impact upon safety or efficacy of the drug product."

4 Current Pharmaceutical Industry practices to determine comparability
- Univariate methods: equivalence, differences, intervals
- Multivariate methods: PLS, PCA, cluster analysis

5 Suggested Approach Overview
- A multivariate approach, suggested as a multivariate exploratory tool for the assessment of product comparability
- A combination of well-known multivariate methods: Random Forest (RF) -> Principal Coordinate Analysis (PCoA)
- Accommodates continuous and categorical predictors, responses with two or more levels, and a wide range of ratios of the number of cases to the number of predictors
- Observations from different groups that lie close together in a plot of the principal coordinates are, for the most part, indistinguishable from one another
- Variable importance from the RF analysis can be used to identify which single variables contribute the most to the separation between groups; this can then be used to determine whether a more focused approach is needed. What is the impact of these variables on product safety? On product efficacy?

6 Algorithm basis
1. Model the data using the Random Forest (RF) algorithm (Breiman & Cutler, 2001, as implemented by Liaw & Wiener in R).
2. Obtain a dissimilarity matrix from the RF proximity matrix: dissimilarity matrix d = 1 - proximity matrix.
3. Perform classical multidimensional scaling, also known as Principal Coordinate Analysis (Gower, 1966), on the dissimilarity matrix; distances between pairs of points are Euclidean.
4. Calculate density ellipses for each group on a plot of the first two principal coordinates (density ellipse based on the bivariate normal distribution).
5. Make a statement about similarity based on the amount of overlap between pairs of ellipses.
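The slides do not include code for these steps, but a minimal R sketch using the randomForest package cited above might look as follows; the object names (`x` for the predictors, `group` for the grouping factor), the seed, and the `ntree` value are illustrative assumptions, not the presenter's settings.

```r
library(randomForest)   # Liaw & Wiener's R implementation of Breiman & Cutler's Random Forest

set.seed(2016)                                        # arbitrary seed, not from the slides
rf <- randomForest(x = x, y = group, ntree = 1000,    # ntree is an assumed value
                   proximity = TRUE)                  # also computes the n x n proximity matrix

d <- as.dist(1 - rf$proximity)                        # step 2: dissimilarity = 1 - proximity

pcoa <- cmdscale(d, k = 2, eig = TRUE)                # step 3: classical MDS / PCoA (Gower, 1966)

plot(pcoa$points, col = as.integer(group), pch = 19,  # step 4: first two principal coordinates
     xlab = "Principal Coordinate 1", ylab = "Principal Coordinate 2")
# Per-group bivariate-normal density ellipses (step 4) and the overlap statement (step 5)
# are sketched after slides 12 and 27, respectively.
```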

7 Comparison to other Multivariate Approaches
- Predictor variables. Current approach: both continuous and categorical. Discriminant Analysis and PLS: continuous only (JMP).
- Response variable. Current approach and Discriminant Analysis: categorical, single response. PLS: continuous only, single or multiple.
- Data structure. Current approach: accommodates any kind of relationship among variables. Discriminant Analysis: requires assumptions be made with respect to the within-group covariance. PLS: modeled relationships are linear.
- Sample size N vs. number of predictors k. Current approach: handles k >> N. Discriminant Analysis: requires N > k. PLS: handles k >> N.
- Group size. Current approach: increase in type I error with small group sizes. Discriminant Analysis: efficacy decreases with larger differences in group sizes. PLS: irrelevant.
- Determination of variable importance. Current approach: model is fit to maximize node purity. Discriminant Analysis: model is fit to maximize 'separation' of groups. PLS: model is fit to find the direction of maximum correlation among predictors that explains the maximum variance in the response space.
- Outliers. Current approach: robust. Discriminant Analysis: greatly affected. PLS: variants exist that are robust.

8 Advantages of Random Forest models
Advantages:
- Handles k >> N
- Does not expect linear features, or even features that interact linearly
- Handles continuous and categorical variables
- Handles missing values
- Robust to outliers
- Robust against overfitting
- Built-in cross-validation
Disadvantages:
- Limited ability to extract linear combinations of features
- Limited interpretability

9 Random Forest model
[Figure: two example classification trees, each grown from its own bootstrap sample and splitting on X1 and X2 into terminal nodes labeled A, B, and C.]
Each tree is built on a bootstrap sample of the data (training set), with no pruning.

10 Random Forest model
[Figure: the same two example trees, annotated with case counts at their terminal nodes.]
Classification error is based on the OOB (out-of-bag) samples.
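As a small aside (continuing the assumed objects from the earlier sketch), the OOB error estimate can be read directly off the fitted randomForest object:

```r
# Sketch, continuing from the forest fitted above: the OOB error estimate is stored
# in the randomForest object itself.
rf$confusion                   # OOB confusion matrix, one row per class, plus class error
rf$err.rate[rf$ntree, "OOB"]   # cumulative OOB error rate after the last tree
```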

11 RF model and Proximity Matrix
[Figure: the two example trees alongside an n x n proximity matrix (n = number of cases), with cells marked according to whether a pair of cases falls in the same terminal node, in different terminal nodes, or is not in the OOB sample for that tree.]
Proximity(i, j): the total number of times that cases i and j ended up in the same terminal node of a tree, normalized by the total number of trees in the forest.
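For illustration only, and not part of the presentation: the proximity definition above can be reproduced from the terminal-node assignments that the randomForest package exposes through the `nodes` attribute of `predict()`. This simplified version counts every tree for every pair of cases; the package's own OOB-restricted proximities (argument `oob.prox`) differ slightly, in line with the "not in OOB sample" marking in the figure.

```r
# Illustrative sketch of the proximity definition on this slide: count how often
# cases i and j share a terminal node, normalized by the number of trees.
# `rf` and `x` are assumed from the earlier sketch; no OOB restriction is applied here.
node_ids <- attr(predict(rf, x, nodes = TRUE), "nodes")   # n x ntree matrix of terminal-node IDs

prox <- matrix(0, nrow(node_ids), nrow(node_ids))
for (t in seq_len(ncol(node_ids))) {
  prox <- prox + outer(node_ids[, t], node_ids[, t], "==")  # 1 if i and j share a node in tree t
}
prox <- prox / ncol(node_ids)                               # normalize by the total number of trees
```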

12 RF model and Dissimilarity Matrix
Dissimilarity matrix d, with entries d(i, j) = 1 - proximity(i, j): the frequency with which cases i and j end up in separate terminal nodes of the forest.
[Figure: Principal Coordinates plot. The density ellipses are highly overlapped, so the groups can be declared to be "similar".]
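The slides do not show how the density ellipses are computed; the sketch below is one way to draw a bivariate-normal ellipse per group on the principal coordinate plot, with the 95% coverage level chosen as an assumption.

```r
# Sketch (not the presenter's script): one bivariate-normal density ellipse per group
# on the principal coordinate plot. The 95% coverage level is an assumption; the slides
# do not state which level was used. `pcoa` and `group` come from the earlier sketch.
plot(pcoa$points, col = as.integer(group), pch = 19,
     xlab = "Principal Coordinate 1", ylab = "Principal Coordinate 2")

theta  <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))               # unit circle, to be stretched per group
for (g in levels(group)) {
  pts <- pcoa$points[group == g, , drop = FALSE]
  ctr <- colMeans(pts)
  R   <- chol(cov(pts))                               # Cholesky factor of the group covariance
  rad <- sqrt(qchisq(0.95, df = 2))                   # contour radius for 95% coverage
  ell <- sweep(circle %*% (R * rad), 2, ctr, "+")     # map the circle onto the group ellipse
  lines(ell, col = which(levels(group) == g))
}
```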

13 Case Study 1: Iris dataset

14 Case Study 1: Iris dataset – Analysis Results
No overlap between the group ellipses; the groups can be said to be different. Total variance explained by the first two principal coordinates = 99.0%.
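A hedged end-to-end example on R's built-in iris data, the dataset this case study uses; because the presentation does not report the seed or number of trees, a rerun should give a value close to, but not necessarily exactly, the 99.0% quoted above.

```r
# End-to-end sketch on R's built-in iris data; the seed and ntree are assumptions,
# so the variance figure may differ slightly from the 99.0% reported on this slide.
library(randomForest)
set.seed(2016)
rf   <- randomForest(iris[, 1:4], iris$Species, ntree = 1000, proximity = TRUE)
pcoa <- cmdscale(as.dist(1 - rf$proximity), k = 2, eig = TRUE)

# Proportion of total variance captured by the first two principal coordinates
eig_pos <- pcoa$eig[pcoa$eig > 0]
sum(eig_pos[1:2]) / sum(eig_pos)

# Variable importance: which attributes contribute most to the separation between species
importance(rf)
```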

15 Case Study 1: Iris dataset – Analysis Results

16 Case Study 1: Iris dataset - Comparison to DA and PLS

17 Case Study 2: Site Product comparison

18 Case Study 2: Site Product comparison – Analysis Results
L2 and L3 can be said to be similar; L1 is different from both L2 and L3. Total variance explained by the first two principal coordinates = 70.1%.

19 Case Study 2: Site Product comparison – Analysis Results

20 Case Study 2: Site Product comparison - Comparison to DA and PLS

21 Case Study 3: Raw Material Lot comparison

22 Case Study 3: Raw Material Lot comparison – Analysis Results
No overlap between the group ellipses; the groups can be said to be different. Total variance explained by the first two principal coordinates = 84.7%.

23 Case Study 3: Raw Material Lot comparison – Analysis Results

24 Case Study 3: Raw Material Lot comparison - Comparison to DA and PLS

25 Live Demonstration
JMP Script link (placeholder)

26 Summary
- Suggested as a multivariate exploratory tool for the assessment of comparability
- A versatile tool
- Can be used to identify which variables contribute the most to the separation between groups, and thereby to prioritize resource allocation
- A good alternative to Orthogonal-PLS-DA

27 Future Work
- Sensitivity analyses
- Calculate the amount of overlap between ellipses
- Develop 'similarity' criteria based on the amount of overlap of the ellipses (from highly similar to different)
- Mitigate risk by defining weights for important variables depending on their potential to impact safety and/or efficacy
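The overlap calculation is listed here as future work; purely as an assumption about how it might be approached (not the authors' stated method), a Monte Carlo estimate of the intersection-over-union of two density ellipses could look like this:

```r
# One possible (assumed) way to quantify ellipse overlap: Monte Carlo estimate of the
# intersection-over-union of two 95% bivariate-normal ellipses, each summarized by its
# group mean and covariance in the principal coordinate plane.
in_ellipse <- function(p, ctr, sigma, level = 0.95) {
  mahalanobis(p, ctr, sigma) <= qchisq(level, df = 2)
}

ellipse_overlap <- function(ctr1, sig1, ctr2, sig2, n = 1e5, level = 0.95) {
  r  <- 4 * sqrt(max(diag(sig1), diag(sig2)))          # half-width of a generous bounding box
  lo <- pmin(ctr1, ctr2) - r
  hi <- pmax(ctr1, ctr2) + r
  p  <- cbind(runif(n, lo[1], hi[1]), runif(n, lo[2], hi[2]))
  a  <- in_ellipse(p, ctr1, sig1, level)
  b  <- in_ellipse(p, ctr2, sig2, level)
  sum(a & b) / sum(a | b)                              # 0 = disjoint, 1 = identical ellipses
}
```

With the principal coordinate scores from the earlier sketches, `ellipse_overlap(colMeans(p1), cov(p1), colMeans(p2), cov(p2))` for two groups' score matrices `p1` and `p2` gives a single overlap value on which a similarity criterion could be built.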

28 Questions?
Acknowledgments:
- Nelson L. Afanador, PhD
- Andy Liaw, PhD, and Matt Wiener, PhD: Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
- CMS colleagues

