SQL for Predicting from Likelihood Ratios

Farrokh Alemi, Ph.D.
This section provides the SQL for predicting from calculated likelihood ratios. We assume that you have thousands of predictors and that a likelihood ratio has been calculated for each predictor. This brief presentation was organized by Dr. Alemi.

Random Validation Test
Likelihood ratios are estimated from a training set and predictions are made in a validation set. These two sets of data are randomly chosen.

This process is called cross-validation
Typically, 5-fold cross-validation is done. The analysis is repeated 5 times, each time randomly setting aside 1/5 of the data for validation. The reported accuracy is the average across these five sets. Cross-validation protects against modeling noise in the training set. As the number of predictors increases, the chance of modeling noise in the training set increases. Since we have thousands of predictors, the chance of modeling noise is large, so it is important to cross-validate the predictions.
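The fold assignment described above can be sketched in T-SQL as follows. This is a minimal sketch, not the author's code: it assumes the #history table (with columns id and icd9) used later in this section, and the names #Folds, Fold, #Validate1 and #Train1 are hypothetical.

```sql
-- Sketch only: #Folds, Fold, #Validate1 and #Train1 are hypothetical names.
-- NEWID() puts patients in a random order; NTILE(5) cuts that order
-- into 5 groups of (nearly) equal size.
SELECT p.id
     , NTILE(5) OVER (ORDER BY NEWID()) AS Fold
INTO #Folds
FROM (SELECT DISTINCT id FROM #history) p;

-- Round 1: fold 1 is the validation set, folds 2 to 5 are the training set.
SELECT h.* INTO #Validate1
FROM #history h JOIN #Folds f ON f.id = h.id
WHERE f.Fold = 1;

SELECT h.* INTO #Train1
FROM #history h JOIN #Folds f ON f.id = h.id
WHERE f.Fold <> 1;
```

The same two statements would be repeated with Fold = 2 through Fold = 5, re-estimating the likelihood ratios from each training set and averaging the accuracy across the five validation sets. Folds are assigned per patient rather than per row so that all diagnoses of one patient fall in the same set.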

Row Number or ID as Seed: WHERE RAND(seed) < .8
In SQL, the random number function takes a seed value. If the seed value does not change, the same random number will be generated. One way to randomly select patients to be included in the training and validation sets is to use their ID as the seed. One must first convert the ID to a number. If each patient occupies a single row, then row numbers can be used as the seed for the random number generator.
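A seeded split of this kind might look like the sketch below. It assumes that id in #history is an integer (RAND accepts an integer seed); the names #Split, r, #Train and #Validate are hypothetical.

```sql
-- Sketch only: #Split, r, #Train and #Validate are hypothetical names.
-- RAND(id) returns the same value every time for the same id,
-- so the split is reproducible across runs.
SELECT p.id
     , RAND(p.id) AS r
INTO #Split
FROM (SELECT DISTINCT id FROM #history) p;

-- About 80% of patients go to training, the rest to validation.
SELECT h.* INTO #Train
FROM #history h JOIN #Split s ON s.id = h.id
WHERE s.r < .8;

SELECT h.* INTO #Validate
FROM #history h JOIN #Split s ON s.id = h.id
WHERE s.r >= .8;
```

In SQL Server, seeds that are close together can return values that are close together, so it is worth inspecting the distribution of r in #Split before relying on the split. If the values cluster, hashing the ID first (for example with CHECKSUM and HASHBYTES) is one possible alternative.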

Detection or Prediction
We rely on diagnoses to predict outcomes. In electronic health records, one has access to diagnoses recorded both before and after the outcome is observed. This is not a concern if the outcome is mortality, since no diagnoses (with the exception of autopsy reports) occur after the patient has died. A critical question in the use of predictive modeling is whether the predictors (the patients’ diagnoses) should be limited to those that precede the observation of the outcome. Many statisticians advise that in multivariate analysis, independent variables should occur before the outcome of interest. This is not always the case when it comes to Multi-Morbidity indices.

Keep in mind that a likelihood ratio measures the impact of a diagnosis on the outcome. These ratios do not distinguish whether the diagnosis has occurred before or after the outcome. In this sense, likelihood ratios are measures of associations. They show the association between the diagnosis and outcome. A strong likelihood ratio does not imply that the disease causes the outcome. It simply is a measure of association between the diagnosis and the outcome.

Detection versus Prediction
Detection: look back from later events. Future events can be predictors, e.g. detecting undiagnosed diabetes from its complications.
Prediction: look forward only. No future events are used as predictors, e.g. predicting prescription abuse from multiple surgeries.

There are two ways to use likelihood ratios. In one approach, one tries to detect an event that has already occurred but perhaps has not been reported. For example, one might want to detect if the patient has undiagnosed diabetes or an unreported substance abuse disorder. In another approach, one tries to predict an event that has not yet occurred. For example, one might want to predict if certain patients will, in the future, abuse pain medications. These two approaches differ in what variables they use for predictors.

The detection approach can rely on the consequences of the outcome. For example, it can rely on repeated skin aberrations or on repeated infections to detect injection of drugs. Here the diagnoses occur after the substance abuse, and a particular pattern in these diagnoses points to the existence of substance abuse earlier.

The prediction approach is different. In forecasting, one must use only the information that is available at the time of the forecast. Hence one should rely on events that preceded the outcome; consequences are no longer reasonable predictors. For example, repeated prescription of opioids for surgical pain increases the risk of opioid or prescription abuse in the future. For another example, borderline A1c levels increase the risk of diabetes.

In deciding which variables to include as predictors in a multi-morbidity model, the first task is to decide if one is predicting or detecting. Of course, if the outcome of interest is mortality, the choice is simple: we are predicting mortality; there is no point in trying to detect it. For other conditions, e.g. diabetes, one set of predictors is useful for detection and another set is useful for prediction.
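The difference between the two approaches comes down to which rows of the medical history are kept. The sketch below illustrates this under stated assumptions: a DiagnosisDate column in #history and an #Outcome table with columns id and OutcomeDate are hypothetical and do not appear in the original slides, as are the names #PredictHistory and #DetectHistory.

```sql
-- Sketch only: DiagnosisDate, #Outcome and OutcomeDate are assumed to exist.

-- Prediction: keep only diagnoses recorded before the outcome was observed.
-- Patients with no recorded outcome keep their full history.
SELECT h.id, h.icd9
INTO #PredictHistory
FROM #history h
LEFT JOIN #Outcome o ON o.id = h.id
WHERE o.OutcomeDate IS NULL
   OR h.DiagnosisDate < o.OutcomeDate;

-- Detection: diagnoses recorded after the outcome may also be used,
-- so no date filter is applied.
SELECT h.id, h.icd9
INTO #DetectHistory
FROM #history h;
```

The likelihood ratios themselves should be estimated from the same kind of history as the one used for scoring, so that prediction models are not trained on consequences of the outcome.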

Multiple Clues, One Prediction
In practice, we rely on thousands of clues to predict the outcome. An easy procedure is needed to understand the combined effect of all of the various clues in the patient’s medical history.

The odds form of Bayes formula guides how the likelihood ratios of various diagnoses are used to predict mortality.

We calculate the change in posterior odds of the outcome.

This is calculated as the product of the likelihood ratios for diseases within the patient’s medical history.
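For reference, the odds form of Bayes' formula can be written as below. The notation is an assumption made for illustration: $LR_i$ is the likelihood ratio of the $i$-th of $n$ diagnoses in the patient's history, and the product form treats the diagnoses as independent.

```latex
\[
\text{Posterior odds} \;=\; \text{Prior odds} \times \prod_{i=1}^{n} LR_i ,
\qquad
\text{Probability} \;=\; \frac{\text{Posterior odds}}{1 + \text{Posterior odds}}
\]
% Worked example with equal priors (prior odds = 1) and three diagnoses
% with likelihood ratios 2, 3 and 0.5:
\[
\text{Posterior odds} = 1 \times 2 \times 3 \times 0.5 = 3,
\qquad
\text{Probability} = \frac{3}{1 + 3} = 0.75
\]
```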

This code snippet shows how the calculation can be done.

-- Calculate Probability of Mortality Assuming Equal Priors
SELECT a.id
     , EXP(SUM(LOG(ABS(IIF(b.LR IS NULL, 1, b.LR))))) AS Odds
     , EXP(SUM(LOG(ABS(IIF(b.LR IS NULL, 1, b.LR)))))
       / (1 + EXP(SUM(LOG(ABS(IIF(b.LR IS NULL, 1, b.LR)))))) AS Prob
INTO #Predict
FROM #history a
LEFT JOIN dbo.LR b ON a.icd9 = b.icd9
-- Keep diagnoses with no matching LR (NULL); drop only LR = 0, since LOG(0) is undefined
WHERE b.LR IS NULL OR b.LR <> 0
GROUP BY a.id

First, one selects the medical history of the patient. For each diagnosis, one looks up the likelihood ratio of the equivalent code; if none exists, then a likelihood ratio of 1 (1 to 1) is assigned. Note that a plain WHERE LR<>0 would also discard the unmatched diagnoses, because a comparison with NULL is never true; the condition above keeps them.

Second, these likelihood ratios are multiplied to get the change in posterior odds of mortality. A separate set of slides describes why we multiply in this fashion, taking the exponential of the sum of the log of the values.

Not So Fast
Several problems arise because of this code. Some of these problems are mechanical and can be solved easily and other problems are more fundamental.

First, the product of the likelihood ratios may exceed the largest number the computer can represent. Some patients have hundreds of diagnoses in their records and the product of these likelihood ratios could be too large. A simple IIF statement can avoid this problem. When the sum of the log of the values is too large, we can just replace the sum with a large number.
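One way to apply such a cap is sketched below. The cap of 700 is an assumption chosen because EXP of values much above 709 overflows a SQL Server float; the lower bound of -700 is an addition not mentioned in the slides, and the names #PredictCapped, SumLog and Capped are hypothetical.

```sql
-- Sketch only: the cap of 700 and the names #PredictCapped, SumLog, Capped are assumptions.
SELECT s.id
     , EXP(s.Capped) AS Odds
     , EXP(s.Capped) / (1 + EXP(s.Capped)) AS Prob
INTO #PredictCapped
FROM (
    -- Replace sums that are too large (or too small) with a fixed bound.
    SELECT t.id
         , IIF(t.SumLog > 700, 700,
               IIF(t.SumLog < -700, -700, t.SumLog)) AS Capped
    FROM (
        -- Sum of the logs of the likelihood ratios per patient.
        SELECT a.id
             , SUM(LOG(ABS(IIF(b.LR IS NULL, 1, b.LR)))) AS SumLog
        FROM #history a
        LEFT JOIN dbo.LR b ON a.icd9 = b.icd9
        WHERE b.LR IS NULL OR b.LR <> 0
        GROUP BY a.id
    ) t
) s;
```

Capping the sum of logs before calling EXP, rather than capping the odds afterwards, avoids the overflow error altogether. A patient at the upper cap receives a probability that is indistinguishable from 1.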

Second, note that no predictor is allowed to have a likelihood ratio of zero. The log of 0 is negative infinity and is not defined in the computer. A likelihood ratio of zero means that the outcome never happens, with not even a minute chance of occurring, when the predictor is present. We controlled for this situation when we created the likelihood ratios. We modified all likelihood ratios of zero to be a number close to zero but not zero, allowing minute chances.
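The replacement of zero likelihood ratios could be done along these lines. The slides do not state which small value was used, so 0.001 here is purely illustrative, and #LRNonZero is a hypothetical table name.

```sql
-- Sketch only: 0.001 is an illustrative value, not the one used by the author.
SELECT icd9
     , IIF(LR = 0, 0.001, LR) AS LR   -- near zero, but LOG() is still defined
INTO #LRNonZero
FROM dbo.LR;
```

With such a table in place of dbo.LR, the filter on zero likelihood ratios in the prediction query is no longer needed.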

Now let us consider more philosophical problems
The third problem is that the estimated likelihood ratio contains a great deal of confounding. If a relatively benign condition tends to occur with a relatively deadly condition, then its likelihood ratio will be distorted. For example, hypertension occurs often in patients who later have a heart attack; as a consequence of this co-occurrence, the likelihood ratio for hypertension will be overstated. Procedures to remove confounding are available in several published papers. We recommend the use of Stratified Covariate Balancing in removing confounding, as this approach is not parametric and can be done using SQL. Applying methods of removing confounding remains an active area of research. The alternative is to live with the overstated likelihood ratios. An overstated ratio is still likely to be predictive, even though patients are dying from heart attack rather than from hypertension.

Likelihood Ratio Less than One Not Necessarily Protective
Fourth, since likelihood ratios are averages and not causal, bizarre situations may occur where a disease could have a likelihood ratio less than 1, suggesting that it reduces the risk of mortality. In reality, diseases do not really help patients. For the most part there is no such thing as a protective disease. The likelihood ratio is reducing the risk compared to prior odds of mortality, which reflects what happens on average. It does not really reduce risk of death if compared to a non-diseased patient. It is not really helpful to be sick.

The odds of the outcome is calculated as the product of the likelihood ratios associated with diseases in the patient’s record. This does not always mean the entire record. If predicting, then all diseases up to the observation of the outcome can be used. If detecting the problem, then diseases that occur after the outcome can also be used.
