Math 6330: Statistical Consulting Class 5

Tony Cox, University of Colorado at Denver
Course web site:

What is a predictive model?
“The probability that X will happen is p” is a predictive model.
To evaluate it, we must be able to decide whether X does happen. This is not always straightforward!
We must define a time frame and objective criteria for occurrence.

What makes a predictive model good?
Calibration
Accuracy
For a classifier: false positives, false negatives, true positives, true negatives
Balanced accuracy
Brier score
Brier Score = Reliability – Resolution + Uncertainty
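The classifier metrics above can be sketched in a few lines. This is a minimal illustration, assuming binary labels coded 0/1; the function names and the toy data are hypothetical and are not taken from the course software. It shows why balanced accuracy is worth reporting alongside plain accuracy when classes are imbalanced.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Average of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced data: 8 negatives, 2 positives,
# and a classifier that always predicts "negative".
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
counts = confusion_counts(y_true, y_pred)
print("TP, FP, TN, FN:", counts)
print("accuracy:", accuracy(*counts))
print("balanced accuracy:", balanced_accuracy(*counts))
```

In this toy case the always-negative classifier looks good on accuracy but is no better than chance on balanced accuracy, because it never detects a positive case.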

Brier Score
Smaller is better.
“Reliability” here would be better named “calibration error.” The following page, added after class, contains further interpretation.
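The decomposition Brier Score = Reliability – Resolution + Uncertainty can be checked numerically. The sketch below assumes binary outcomes coded 0/1 and groups cases by their distinct forecast probability (the decomposition is exact under that grouping). The forecasts and outcomes are hypothetical.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by forecast value."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    # Reliability ("calibration error"): forecast vs. observed frequency in each group.
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    # Resolution: how far group frequencies are from the overall base rate.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    # Uncertainty: variance of the outcome, independent of the forecasts.
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


# Hypothetical forecasts and outcomes.
probs = [0.1, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
bs = brier_score(probs, outcomes)
rel, res, unc = brier_decomposition(probs, outcomes)
print("Brier score:", bs)
print("Reliability - Resolution + Uncertainty:", rel - res + unc)
```

Lower reliability (calibration error) and higher resolution both reduce the Brier score; uncertainty depends only on the base rate of the outcome.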

Predictive analytics: CARET framework
Partition the data into training and test sets
  Stratified random sampling, balanced samples
  For time series forecasting, use early periods to train
Select predictive models to use
Pre-process data
  Remove informationless (0-variance) and redundant variables
  Standardize predictors for some algorithms
Fit/optimize each model using the training data
Evaluate and compare the predictive performances of the models using the (disjoint) test data
“Superlearning” then uses results to improve predictions yet further. Need multiple hold-out samples.

Software tools
Windows Excel users may download the Causal Analytics Toolkit (CAT) and Predictive Analytics Toolkit (PAT) software for free here:
Please follow the instructions to install the software.
The software is as safe as R, but not registered with M

Data partitioning
Stratified randomized sampling vs. time series
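The two partitioning schemes can be contrasted with a small sketch. It is written in plain Python rather than with CARET itself; the function names, the 25% test fraction, and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Randomly split row indices so each class keeps about the same test share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def time_ordered_split(n_rows, test_fraction=0.25):
    """For time series: train on the early periods, test on the latest ones."""
    cut = n_rows - round(n_rows * test_fraction)
    return list(range(cut)), list(range(cut, n_rows))


# Hypothetical data: 80 cases of class 0 and 20 of class 1.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
ts_train, ts_test = time_ordered_split(len(labels))
print("positives in stratified test set:", sum(labels[i] for i in test_idx))
print("time-ordered test rows start at:", ts_test[0])
```

The stratified split preserves the class mix in both sets, while the time-ordered split never lets a later observation inform a prediction about an earlier one.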

Filtering and pre-processing for large data sets
Filter out relatively poor predictors
Drop redundant and low-variance variables
Standardize
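Two of these pre-processing steps (dropping zero-variance and redundant variables, then standardizing) can be sketched as follows. The 0.95 correlation cutoff, the function names, and the toy columns are assumptions for illustration only; filtering out weak predictors is not shown.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def preprocess(columns, max_abs_corr=0.95):
    """Drop constant and near-duplicate columns, then standardize the rest.

    columns: dict mapping column name to a list of numbers.
    """
    kept = {}
    for name, values in columns.items():
        if pstdev(values) == 0:
            continue  # zero variance: carries no information
        if any(abs(pearson(values, other)) > max_abs_corr for other in kept.values()):
            continue  # redundant with a column that was already kept
        kept[name] = values
    standardized = {}
    for name, values in kept.items():
        mu, sigma = mean(values), pstdev(values)
        standardized[name] = [(v - mu) / sigma for v in values]
    return standardized


# Hypothetical predictors.
columns = {
    "age": [30, 40, 50, 60],
    "age_months": [360, 480, 600, 720],  # redundant with age
    "constant": [1, 1, 1, 1],            # zero variance
    "dose": [2, 1, 4, 3],
}
result = preprocess(columns)
print("columns kept:", list(result))
```

Which of two redundant columns survives depends on their order here; a real workflow would usually apply a more deliberate rule.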

Select predictive analytics algorithms
CART trees (rpart, ctree)
Random Forest (rf)
Multivariate adaptive regression splines (MARS/earth)
Gradient boosting
Support Vector Machines (SVM)
Artificial neural networks (ANNs)
Many others! (Over 100 algorithms in CARET)

Outputs
Confusion matrix
Performance metrics
ROC AUC
Comparative performance on cases
Calibration curves
(To be added: Brier scores)
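Of these outputs, the ROC AUC has a simple interpretation that a short sketch makes concrete: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the scores below are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC computed by comparing every positive case with every negative case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print("AUC:", auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of cases, so it is meant for illustration rather than for large data sets.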

Confusion matrix visualizations
Green = correct classifications; yellow = incorrect classifications

Performance metrics

ROC AUCs

Performance details

Calibration curves

Introduction to causal analytics

Causal analytics
How do actions affect outcome probabilities?
Causal model: Pr(outputs | input actions), written Pr(c | do(x))
Not Bayesian network (BN) inference, which gives Pr(output | input observations)
How will future consequence probabilities change if we make different choices?
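The difference between Pr(c | x) and Pr(c | do(x)) can be worked out exactly in a small hypothetical model. Assume a binary confounder Z that influences both X and C, while X itself has no effect on C. All probabilities below are invented for illustration.

```python
# Hypothetical model: Z -> X and Z -> C, with no arrow from X to C.
P_Z = {0: 0.5, 1: 0.5}             # Pr(Z = z)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}    # Pr(X = 1 | Z = z)
P_C1_GIVEN_Z = {0: 0.2, 1: 0.8}    # Pr(C = 1 | Z = z), the same for every x


def p_c_given_seeing_x(x):
    """Pr(C = 1 | X = x): observing X changes our beliefs about Z."""
    joint = {}
    for z in P_Z:
        p_x_given_z = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
        joint[z] = P_Z[z] * p_x_given_z
    p_x = sum(joint.values())
    return sum(P_C1_GIVEN_Z[z] * joint[z] / p_x for z in P_Z)


def p_c_given_doing_x(x):
    """Pr(C = 1 | do(X = x)): setting X leaves the distribution of Z unchanged."""
    return sum(P_C1_GIVEN_Z[z] * P_Z[z] for z in P_Z)


for x in (0, 1):
    print("x =", x,
          "seeing:", round(p_c_given_seeing_x(x), 3),
          "doing:", round(p_c_given_doing_x(x), 3))
```

In this model, observing X = 1 makes C much more probable, yet forcing X to any value changes nothing. A decision-maker needs the second quantity.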

Types of causality: Regularity
Causality as regularity: X is a cause of Y if occurrence of X is regularly succeeded by occurrence of Y.
Counterexamples: nicotine-stained fingers and lung cancer; elderly aspirin consumption and heart attacks

Types of causality: Association
Associational/attributive causality: X is likely to be a cause of Y if higher levels of X are strongly, consistently, specifically, and significantly associated with higher levels of Y.
“Hill criteria” in epidemiology
Relative risk > 2 is often cited
Counterexamples: Simpson’s Paradox, aspirin
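Simpson's Paradox is easy to reproduce with a small table. In the hypothetical counts below (chosen only to produce the reversal), higher X goes with a higher rate of Y inside each stratum, yet with a lower rate of Y once the strata are pooled.

```python
# Hypothetical counts: (cases of Y, total) for each stratum and level of X.
counts = {
    "stratum A": {"high X": (8, 10), "low X": (70, 100)},
    "stratum B": {"high X": (40, 100), "low X": (3, 10)},
}


def stratum_rate(stratum, level):
    """Rate of Y within one stratum at one level of X."""
    cases, total = counts[stratum][level]
    return cases / total


def pooled_rate(level):
    """Rate of Y at one level of X after pooling all strata."""
    cases = sum(counts[s][level][0] for s in counts)
    total = sum(counts[s][level][1] for s in counts)
    return cases / total


for s in counts:
    print(s, "high X:", stratum_rate(s, "high X"), "low X:", stratum_rate(s, "low X"))
print("pooled high X:", round(pooled_rate("high X"), 3),
      "low X:", round(pooled_rate("low X"), 3))
```

The direction of the association depends on whether the data are stratified, so association alone cannot settle whether changing X would change Y.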

Types of causality: Predictive
Predictive causality: Causes help to predict their effects.
X is identified as a (predictive) cause of Y in longitudinal observational data if and only if the past and present values of X provide information that can be used to help predict the future of Y better than the future of Y can be predicted from the past and present values of Y alone.
Granger causality in time series analysis
Counterexample: nicotine-stained fingers as a predictive cause of lung cancer
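The definition above can be illustrated with a one-lag, Granger-style comparison: fit Y on its own past, then on its own past plus the past of X, and see how much the residual sum of squares (RSS) drops. This is only a sketch on simulated data with invented coefficients. A real Granger test would choose the lag length and use an F-test rather than a raw in-sample RSS ratio.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, target):
    """Residual sum of squares from least squares of target on rows plus an intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, target))


def rss_reduction(cause, effect):
    """Fraction by which adding lagged `cause` lowers the RSS for predicting `effect`."""
    steps = range(1, len(effect))
    target = effect[1:]
    restricted = rss([[effect[t - 1]] for t in steps], target)
    full = rss([[effect[t - 1], cause[t - 1]] for t in steps], target)
    return 1 - full / restricted


# Simulated data (hypothetical coefficients): X drives Y with a one-period lag.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

print("RSS reduction, X helping to predict Y:", round(rss_reduction(x, y), 3))
print("RSS reduction, Y helping to predict X:", round(rss_reduction(y, x), 3))
```

Past X sharply improves predictions of Y, while past Y adds almost nothing for predicting X. As the nicotine-stained fingers counterexample warns, predictive usefulness of this kind still does not show that manipulating X would change Y.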

Types of causality: Counterfactual (potential outcomes)
Counterfactual causality: Causes make their effects different from what they otherwise would have been.
X is a cause of Y if Y would not have occurred had X not occurred first.
Widely used in modern epidemiology; also used in econometrics
Challenges: requires untestable assumptions about counterfactual worlds (what would have been, not what was); sensitive to modeling assumptions

Types of causality: Probabilistic
Probabilistic causality: Causes make their effects more likely.
X is a cause of Y if the occurrence of X increases the probability of occurrence of Y.
Most current approaches accept that causation is probabilistic.
Counterexample based on Bayes’ Rule: a test result does not cause disease, but can make it more probable.
“Seeing” vs. “Doing” (Pearl)
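The Bayes' Rule counterexample can be made concrete with hypothetical numbers. Assume a prevalence of 1%, a test sensitivity of 90%, and a specificity of 95%; none of these values come from the lecture.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Pr(disease | positive test) by Bayes' Rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics.
prior = 0.01
posterior = posterior_given_positive(prior, sensitivity=0.90, specificity=0.95)
print("Pr(disease):", prior)
print("Pr(disease | positive test):", round(posterior, 4))
```

Seeing a positive result raises the probability of disease well above the prior, but forcing the test to read positive would not change anyone's health. The test result satisfies the probabilistic definition without being a manipulative cause.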

Types of causality: Ordering
Computational causality: Information and determination flow from causes to their effects.
X is a cause of Y if the value of Y must be computed from the value of X in all valid simulation models.
Simon-Iwasaki causal ordering
Related to exogeneity in econometrics

Types of causality: Manipulative
Manipulative causality: Changing causes changes their effects (or effect probabilities).
X is a (manipulative) cause of Y if changing X changes Y.
Structural equations models: Y = f(X) means that changing X will cause Y to change to restore equality.
Of key interest to decision-makers
Not implied by regularity, associational, counterfactual, or predictive causality
Often conflated with these other kinds of causality, e.g., in public health

Types of causality: Mechanistic
Mechanistic/explanatory causality: Causes help to explain their effects, and changes in causes help to explain changes in their effects.
X is a cause of Y if a path of law-like causal mechanisms propagates changes in X to changes in Y.
Simulation modeling: X affects inflows or outflows to Y.
