ProSanos Corporation Confidential and Proprietary
Modeling and clustering disease progression for correlation with genetic and demographic factors
Robert Kingan
What is SSIFT?
“To address […] common diseases, which include schizophrenia, depression, and breast cancer, it is essential to incorporate observations of the clinical progression of the disease to refine the definition of phenotype.” – Michael N. Liebman, U. Penn.
Yes, but what is SSIFT?
– SSIFT = Stratification and Synchronization Inference Technology
What is SSIFT?
Stratification: dividing a patient population into groups that are meaningful for diagnosis, prognosis, treatment selection, or genotype-phenotype correlation.
Synchronization: recognizing a pattern of disease progression regardless of the disease stage at which a particular patient is observed.
SSIFT overview
Assumptions: what is SSIFT-able
Other constraints on data selection
Outline of technique
– Identifying variables
– Modeling disease progression
– Parameterizing different models
– Clustering patients by progression patterns
– Interpreting the results
Pattern of disease progression
[Figure: disease marker plotted against time, annotated with the initial value, the final value, and the period of change]
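The three features on this slide (initial value, final value, period of change) map naturally onto a logistic curve. The following is a minimal sketch, assuming a four-parameter logistic whose parameter names are hypothetical: a = initial value, b = final value, m = midpoint of the change, s = time scale of the change. The deck does not give SSIFT's exact formula.

```python
import math

def logistic_progression(t, a, b, m, s):
    """Hypothetical four-parameter logistic disease-progression curve.

    a: initial (pre-change) marker value
    b: final (post-change) marker value
    m: midpoint of the period of change
    s: time scale of the change (larger s means a slower transition)
    """
    z = (t - m) / s
    # Numerically stable logistic: never exponentiate a large positive number.
    if z >= 0:
        frac = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        frac = e / (1.0 + e)
    return a + (b - a) * frac

def period_of_change(s, fraction=0.9):
    """Duration over which the curve covers the central `fraction` of the a-to-b change.

    With fraction=0.9 this is the 5%-to-95% window, 2 * s * ln(19).
    """
    p = (1.0 - fraction) / 2.0
    return 2.0 * s * math.log((1.0 - p) / p)
```

At t = m the curve sits exactly halfway between a and b, so the midpoint and the scale together describe where and how fast the change happens.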
SSIFT workflow
– Survey the data
– Select useful variables
– Fit disease progression models
– Construct feature vectors
– Assign feature weights
– Cluster weighted feature vectors
– Evaluate the clustering results
– Complete? No: iterate. Yes: done.
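The "cluster weighted feature vectors" step can be sketched as follows. This is an assumption-laden stand-in: the deck does not name SSIFT's clustering algorithm, so plain k-means with Euclidean distance is swapped in here, and the function name `weighted_kmeans` is hypothetical. Weights are applied by scaling each feature column before clustering.

```python
import numpy as np

def weighted_kmeans(features, weights, k, n_iter=100, seed=0):
    """Cluster weighted feature vectors with plain k-means (assumed stand-in).

    features: (n_patients, n_features) array of progression-model parameters
    weights:  per-feature weights; each column is scaled by its weight
    """
    X = np.asarray(features, dtype=float) * np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct, randomly chosen patients.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each patient to the nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster becomes empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Evaluating the resulting clusters and deciding whether to iterate (the "Complete?" step) would sit outside this function.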
SSIFT workflow
[Figure: SSIFT workflow diagram]
SSIFT curve types
Converting parameters
Curve types: logistic, constant, linear, early stable, late stable
[Table: parameter conversion for each curve type]
where y* = population mean, t_1 = first time point, t_n = last time point
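The conversion formulas themselves are not available here, so the following is only one hypothetical way to map each curve type onto a common (initial value, final value, midpoint) triple so that patients fitted with different models can share one feature space. It assumes "early stable" means the marker had already reached its final value before t_1 (initial value unobserved) and "late stable" means it had not yet begun to change by t_n (final value unobserved), filling the unobserved value with y*. The readings, the `params` keys, and the function name are all assumptions.

```python
def to_common_params(curve_type, params, t1, tn, y_star):
    """Hypothetical mapping of a fitted curve to (initial, final, midpoint).

    params: dict of fitted parameters; the keys used below are assumed.
    t1, tn: first and last observation times; y_star: population mean.
    """
    if curve_type == "logistic":
        return params["a"], params["b"], params["m"]
    if curve_type == "constant":
        c = params["c"]
        # No change observed: initial equals final; midpoint placed at window center (assumed).
        return c, c, (t1 + tn) / 2.0
    if curve_type == "linear":
        slope, intercept = params["slope"], params["intercept"]
        # Treat the whole observed window as the period of change (assumed).
        return intercept + slope * t1, intercept + slope * tn, (t1 + tn) / 2.0
    if curve_type == "early_stable":
        # Assumed reading: change finished before t1, so the initial value falls back to y*.
        return y_star, params["c"], t1
    if curve_type == "late_stable":
        # Assumed reading: change not yet started by tn, so the final value falls back to y*.
        return params["c"], y_star, tn
    raise ValueError(f"unknown curve type: {curve_type}")
```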
Modified Mahalanobis distance
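The standard Mahalanobis distance between feature vectors x and y with covariance matrix Σ is sqrt((x − y)ᵀ Σ⁻¹ (x − y)). The deck's specific modification is not stated, so the optional per-feature `weights` argument below is a hypothetical modification chosen to match the "assign feature weights" step; it is not SSIFT's definition.

```python
import numpy as np

def mahalanobis(x, y, cov, weights=None):
    """Mahalanobis distance between feature vectors x and y.

    weights: hypothetical per-feature scaling applied to the difference vector
             before the covariance correction (an assumed modification).
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if weights is not None:
        diff = diff * np.asarray(weights, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))
```

In practice the covariance would typically be estimated from the population's feature vectors, for example with `np.cov(X, rowvar=False)`. With an identity covariance and no weights the result reduces to Euclidean distance.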
SSIFT workflow with correlation step
After the loop (survey the data, select useful variables, fit disease progression models, construct feature vectors, assign feature weights, cluster weighted feature vectors, evaluate the clustering results, and iterate until complete):
– Correlate results with demographic data and genetic data
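One way to correlate cluster assignments with a categorical covariate (a genotype, or a demographic category such as sex) is a chi-square test of independence on the cluster-by-category table. This is a sketch under that assumption; the deck does not specify the test. To avoid a SciPy dependency, the p-value here comes from a permutation test rather than the chi-square distribution, and all function names are hypothetical.

```python
import numpy as np

def contingency(labels, groups):
    """Cross-tabulate cluster labels (rows) against a categorical covariate (columns)."""
    _, li = np.unique(labels, return_inverse=True)
    _, gi = np.unique(groups, return_inverse=True)
    table = np.zeros((li.max() + 1, gi.max() + 1))
    np.add.at(table, (li, gi), 1)
    return table

def chi_square(table):
    """Pearson chi-square statistic for a contingency table."""
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def permutation_p_value(labels, groups, n_perm=2000, seed=0):
    """Observed statistic and permutation p-value from shuffling the covariate."""
    rng = np.random.default_rng(seed)
    observed = chi_square(contingency(labels, groups))
    groups = np.asarray(groups)
    hits = sum(
        chi_square(contingency(labels, rng.permutation(groups))) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

Continuous demographic variables such as age would need a different test, for example comparing distributions across clusters.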
Application of SSIFT to NIDDK
– About NIDDK
– SSIFT and transplant data
– Variable selection
– Modeling
– Results
Candidate variables
α-Fetoprotein, albumin, alkaline phosphatase (AP), bicarbonate, blood urea nitrogen (BUN), calcium, creatinine clearance, cholesterol, chloride, corrected PT control, creatinine, direct bilirubin, FK506 level, glomerular filtration rate, gamma GTP, glucose, hematocrit (HCT), hemoglobin, CSA HPLC level, potassium, CSA monoclonal level, sodium, platelet count, prothrombin time, partial thromboplastin CT, partial thromboplastin PT, CSA RIA level, SGOT (AST), SGPT (ALT), total bilirubin, CSA TDX level, total protein, white blood cells (WBC), weight (kg)
Selected variables
Variable           Log-transformed?
AST                Yes
AP                 Yes
Hemoglobin         No
Total bilirubin    Yes
Potassium          No
Hematocrit         No
WBC                Yes
BUN                Yes
Creatinine         Yes
Sodium             No
(Additional columns: Ŝ_u; weights for a, b, m; Ŝ_w)
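The "Log?" column and the a, b, m weight columns suggest how feature vectors might be assembled: log-transform the flagged markers before fitting, then concatenate each variable's fitted parameters scaled by per-parameter weights. This is a hypothetical sketch. Reading a, b, m as the fitted progression parameters is an assumption, the weight values are not given, and the dictionaries and function names are illustrative only.

```python
import math

# "Log-transformed?" column from the selected-variables table.
LOG_TRANSFORM = {
    "AST": True, "AP": True, "Hemoglobin": False, "Total bilirubin": True,
    "Potassium": False, "Hematocrit": False, "WBC": True, "BUN": True,
    "Creatinine": True, "Sodium": False,
}

def prepare_series(variable, values):
    """Log-transform a marker's raw (positive) values when the table marks it 'Yes'."""
    if LOG_TRANSFORM[variable]:
        return [math.log(v) for v in values]
    return list(values)

def feature_vector(fitted, weights):
    """Concatenate weighted per-variable parameters in a fixed (sorted) variable order.

    fitted:  {variable: (a, b, m)} fitted progression parameters (assumed layout)
    weights: {variable: (w_a, w_b, w_m)} per-parameter weights (hypothetical values)
    """
    vec = []
    for variable in sorted(fitted):
        vec.extend(p * w for p, w in zip(fitted[variable], weights[variable]))
    return vec
```

A fixed variable order matters: every patient's vector must place the same parameter in the same position before distances are computed.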
Evaluating Kaplan-Meier curves
[Figure: Kaplan-Meier curves, annotated with the score Ŝ]
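The Kaplan-Meier estimator behind these curves multiplies, at each event time, the fraction of at-risk patients who survive that time. A minimal sketch of the standard estimator follows. How the deck turns per-cluster curves into the score Ŝ is not stated, so only the curve estimation is shown, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Standard Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g., death or graft loss) was observed, 0 if censored
    Returns a list of (time, survival probability) steps at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect all patients sharing time t; those censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= deaths + censored
    return curve
```

Running it once per cluster gives one survival curve per cluster, which is the kind of comparison shown on the survival-by-cluster slide.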
Final selected variables
– Best pair: AST + AP, Ŝ = 0.34
– Best triple: AST + AP + hematocrit, Ŝ = 0.42
– No set of four variables exceeded Ŝ = 0.42
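Finding the best pair, triple, and four-variable set can be done by exhaustive search over variable subsets. The sketch below assumes a caller-supplied `score` function standing in for Ŝ, whose definition is not given here. The function name and the toy score in the test are hypothetical.

```python
from itertools import combinations

def best_subsets(variables, score, max_size=4):
    """Score every subset of up to max_size variables; keep the best subset of each size.

    score: function mapping a tuple of variable names to a number (stand-in for Ŝ).
    Returns {size: (best_subset, best_score)}; ties keep the first subset found.
    """
    best = {}
    for size in range(1, max_size + 1):
        for subset in combinations(variables, size):
            s = score(subset)
            if size not in best or s > best[size][1]:
                best[size] = (subset, s)
    return best
```

With the ten selected variables and subsets of up to four, this requires 10 + 45 + 120 + 210 = 385 score evaluations, each of which involves refitting and reclustering, so exhaustive search stays feasible only for small subset sizes.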
Survival by cluster, using clustered SSIFT parameters for AST, AP, and HCT (Ŝ = 0.42)
[Figure: survival curves by cluster]
Cluster mean curves
[Figure: mean curves for the best cluster and the worst cluster]
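A cluster mean curve can be computed by evaluating each member's fitted curve on a shared time grid and averaging pointwise. This sketch assumes each patient is described by hypothetical logistic parameters (a, b, m, s). The deck does not say whether its mean curves average the curves or plot the curve at the mean parameters; this version averages the curves.

```python
import numpy as np

def cluster_mean_curves(params, labels, t):
    """Pointwise mean of each cluster's fitted progression curves.

    params: (n_patients, 4) array of assumed logistic parameters (a, b, m, s)
    labels: cluster label per patient
    t:      1-D time grid shared by all patients
    """
    params = np.asarray(params, dtype=float)
    labels = np.asarray(labels)
    # Column slices keep shape (n, 1) so they broadcast against the time grid.
    a, b, m, s = (params[:, i:i + 1] for i in range(4))
    curves = a + (b - a) / (1.0 + np.exp(-(np.asarray(t, dtype=float) - m) / s))
    return {lab.item(): curves[labels == lab].mean(axis=0) for lab in np.unique(labels)}
```

Averaging curves and averaging parameters generally give different shapes: the mean of several sharp transitions at different times is a gradual ramp, while the curve at the mean parameters stays sharp.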
SSIFT in gene discovery: simulation
[Diagram: markers measured over time → SSIFT™ → disease progression pattern → disease genes, with the steps determine, analyze, and discover]
Simulated data
[Figure: marker value (relative scale) versus time (years)]
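A toy version of such a simulation might generate logistic marker trajectories whose timing depends on genotype. Everything in this sketch is an assumed design, not the deck's simulation: a hypothetical biallelic gene with risk-allele frequency 0.3, risk alleles that move the midpoint of change earlier by 2 years each, a fixed transition scale, and Gaussian measurement noise.

```python
import numpy as np

def simulate_cohort(n_patients, years=10.0, n_visits=11, seed=0):
    """Simulate marker values (relative scale) at visits spanning `years`.

    All parameters below are assumptions chosen for illustration.
    Returns (visit times, risk-allele counts, marker matrix of shape (n_patients, n_visits)).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, years, n_visits)
    # Risk-allele count (0, 1, or 2) from two independent alleles with frequency 0.3.
    genotype = rng.binomial(2, 0.3, size=n_patients)
    # Each risk allele moves the midpoint of change 2 years earlier (assumed effect).
    midpoint = rng.normal(7.0 - 2.0 * genotype, 0.5)
    # Logistic rise from 0 to 1 with an assumed time scale of 0.8 years.
    curves = 1.0 / (1.0 + np.exp(-(t[None, :] - midpoint[:, None]) / 0.8))
    markers = curves + rng.normal(0.0, 0.05, size=curves.shape)
    return t, genotype, markers
```

Because the genotype effect is built in, clustering the fitted progression parameters should recover groups enriched for risk alleles, which is what a gene-discovery analysis would then try to detect.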
Clustering Results
Nearest-neighbor analysis
[Table: genotype, by gene, of nearest neighbors based on SSIFT pattern]
C2353 is related to the SSIFT pattern of disease progression (p < ).
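The nearest-neighbor idea can be sketched as follows: if a gene is linked to the progression pattern, patients who are close in SSIFT feature space should share genotypes more often than chance. This sketch assumes Euclidean distance and a permutation test on genotype concordance. The deck does not describe its test, and all names here are hypothetical.

```python
import numpy as np

def nearest_neighbors(features, k):
    """Indices of each patient's k nearest other patients (Euclidean distance)."""
    X = np.asarray(features, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a patient is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def concordance(nn, genotypes):
    """Fraction of (patient, neighbor) pairs that share a genotype."""
    g = np.asarray(genotypes)
    return float((g[nn] == g[:, None]).mean())

def concordance_p_value(features, genotypes, k=5, n_perm=1000, seed=0):
    """Observed concordance and a permutation p-value from shuffling genotypes."""
    rng = np.random.default_rng(seed)
    nn = nearest_neighbors(features, k)  # neighbors depend only on features
    g = np.asarray(genotypes)
    observed = concordance(nn, g)
    hits = sum(concordance(nn, rng.permutation(g)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)
```

Computing the neighbor lists once and permuting only the genotypes keeps the test cheap, since the feature-space geometry does not change between permutations.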
SSIFT: Stratification and Synchronization Inference Technology
Discussion