Math 6330: Statistical Consulting Class 8
Tony Cox University of Colorado at Denver Course web site:
Agenda
- Projects and schedule
- Prescriptive (decision) analytics (cont.): decision trees; simulation-optimization; newsvendor problem and applications; decision rules, optimal statistical decisions; quality control, SPRT
- Evaluation analytics
- Learning analytics
- Decision psychology: heuristics and biases
Recommended readings
- Charniak (1991), rest of paper; build the network in Figure 2
- Pearl (2009)
- Daly and Shen (2007), “Methods to Accelerate the Learning of Bayesian Network Structures”
- Mooij et al. (2016), “Distinguishing cause from effect using observational data”
- Lagani et al. (2016), “Probabilistic computational causal discovery for systems biology”
Projects
Papers and projects: 3 types
- Applied: analyze an application (description, prediction, causal analysis, decision, evaluation, learning) using high-value statistical consulting methods
- Research/develop software: R packages, algorithms, CAT modules, etc.
- Research/review a book or papers (3-5 articles): explain a topic within statistical consulting. Examples: Netica’s Bayesian inference algorithms, multicriteria decision-making, machine learning algorithms, etc.
Projects (cont.)
- Typical report paper: about pages, font 12, space (this is typical, not required)
- Content matters; length does not
- Typical in-class presentation: minutes; can run longer if needed
- Purposes: learn something interesting and useful; either explain/show what you learned, or show how to use it in practice (or both)
Project proposals due March 17
If you have not yet done so, please send me a succinct description of what you want to do (and perhaps what you hope to learn by doing it):
- Problem to be addressed
- Methods to be researched/applied
- Hoped-for results
Due by end of day on Friday, March 17th (though sooner is welcome).
Key dates:
- April 14: rough draft (or very good outline)
- April 18: in-class presentations/discussions begin
- May 4, 8:00 PM: final paper due
Course schedule
- March 14: No class (work on project idea)
- March 17: Project/paper proposals due
- March 21: No class (spring break)
- April 14: Draft of project/term paper due
- April 18, 25, May 2, (May 9): In-class presentations
- May 4: Final project/paper due by 8:00 PM
Prescriptive analytics (cont.)
Algorithms for optimizing actions
Decision analysis framework: choose act a from choice set A to maximize the expected utility of consequence c, given a causal model c(a, s) with Pr(s), or Pr(c | a, s) with Pr(s).
- s = state = random variable = things that affect c other than the choice of act a
- Influence diagram algorithms: learning ID structure from data; validating causal mechanisms; using the ID for inference and recommendations
- Simulation-optimization
- Robust optimization
- Adaptive optimization/learning algorithms
Prescriptive analytics methods
- Optimization: decision trees; stochastic dynamic programming and optimal control; Gittins indices; reinforcement learning (RL) algorithms
- Influence diagram solution algorithms
- Simulation-optimization
- Adaptive learning and optimization: EVOP (evolutionary operation); multi-arm bandit problems, UCL strategies
Decision tree ingredients
Three types of nodes:
- Choice nodes (squares)
- Chance nodes (circles)
- Terminal nodes / value nodes
Arcs show how decisions and chance events can unfold over time; uncertainties are resolved as time passes and choices are made.
Solving decision trees
Also known as “backward induction,” “stochastic dynamic programming,” or “average out and roll back.” Implicitly, the tree determines Pr(c | a).
Procedure: start at the tips of the tree and work backward:
- Compute the expected value at each chance node (“averaging out”)
- Choose the branch with maximum expected value at each choice node (“rolling back”)
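To make the “average out and roll back” procedure concrete, here is a minimal Python sketch. The tuple-based tree encoding and the `rollback` function are hypothetical illustrations, not part of any standard package; the payoffs reuse the 70%/$172,000 versus 30%/−$500,000 development example from the slides that follow.

```python
# A node is one of (hypothetical encoding):
#   ("choice", {action_name: subtree, ...})   -- square node
#   ("chance", [(prob, subtree), ...])        -- circle node
#   ("value", payoff)                         -- terminal node

def rollback(node):
    """Average out at chance nodes, maximize at choice nodes, from the tips back."""
    kind = node[0]
    if kind == "value":
        return node[1]
    if kind == "chance":  # "averaging out"
        return sum(p * rollback(child) for p, child in node[1])
    return max(rollback(child) for child in node[1].values())  # "rolling back"

tree = ("choice", {
    "don't develop": ("value", 0),
    "develop": ("chance", [(0.70, ("value", 172_000)),
                           (0.30, ("value", -500_000))]),
})
# Develop averages out to 0.70*172,000 + 0.30*(-500,000) = -29,600,
# so rolling back chooses "don't develop" with value 0.
```

Because the recursion bottoms out at terminal values, the same function handles trees of any depth, which is exactly why backward induction scales to multi-stage decisions.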
Obtaining Pr(s) from decision trees (http://www.eogogics)
Decision 1: Develop or Do Not Develop
Expected value of Develop = (70% × $172,000) + (30% × (−$500,000)) = $120,400 + (−$150,000) = −$29,600
What happened to act a and state s? (http://www.eogogics)
Optimize decisions! What are the 3 possible acts in this tree?
(a) Don’t develop; (b) Develop, then rebuild if successful; (c) Develop, then a new line if successful.
Key points
- Solving decision trees (with decisions) requires embedded optimization: make future decisions optimally, given the information available when they are made
- Event trees = decision trees with no decisions; they can be solved, to find outcome probabilities, by forward Monte-Carlo simulation or by multiplication and addition
- In general, sequential decision-making cannot be modeled well using event trees; the model must include (optimal choice | information)
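The claim that an event tree can be solved either by multiplication-and-addition or by forward Monte-Carlo simulation can be sketched as follows. The tree encoding and the 60/40 demand split are hypothetical numbers for illustration, not taken from the slides.

```python
import random

# Event tree: chance nodes only (hypothetical encoding):
#   ("chance", [(prob, subtree), ...]) or ("outcome", label)

def exact_outcome_probs(node, p=1.0, acc=None):
    """Multiplication and addition: multiply branch probabilities down each path."""
    acc = {} if acc is None else acc
    if node[0] == "outcome":
        acc[node[1]] = acc.get(node[1], 0.0) + p
    else:
        for q, child in node[1]:
            exact_outcome_probs(child, p * q, acc)
    return acc

def simulate(node, rng):
    """Forward Monte-Carlo: sample one random path through the tree."""
    while node[0] == "chance":
        r, cum = rng.random(), 0.0
        chosen = node[1][-1][1]  # fallback guards against float rounding
        for q, child in node[1]:
            cum += q
            if r < cum:
                chosen = child
                break
        node = chosen
    return node[1]

tree = ("chance", [
    (0.7, ("chance", [(0.6, ("outcome", "success, high demand")),
                      (0.4, ("outcome", "success, low demand"))])),
    (0.3, ("outcome", "failure")),
])
```

The exact method gives Pr(failure) = 0.3 and Pr(success, high demand) = 0.7 × 0.6 = 0.42; a long enough simulation run produces relative frequencies converging to the same numbers.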
What happened to state s? (http://www.eogogics)
What are the 4 possible states? C1 can succeed or not; C2 demand can be high or low.
Acts and states cause consequences (http://www.eogogics)
Key theoretical insight
A complex decision model can be viewed as a (possibly large) simple Pr(c | a) model:
- s = selection of a branch at each chance node
- a = selection of a branch at each choice node
- c = outcome at the terminal node reached by (a, s)
- Pr(c | a) = Σ_s Pr(c | a, s)·Pr(s)
Other complex decision models can also be interpreted as c(a, s), Pr(c | a, s), or Pr(c | a) models:
- s = system state & information signal
- a = decision rule (mapping information to acts)
- c may include changes in s and in the possible a
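The identity Pr(c | a) = Σ_s Pr(c | a, s)·Pr(s) is just a weighted average over states. A tiny numeric sketch, with all probabilities hypothetical:

```python
# Hypothetical tables for a single act "a1", two states, two consequences.
Pr_s = {"s1": 0.7, "s2": 0.3}                                    # Pr(s)
Pr_c_as = {("win", "a1", "s1"): 0.9, ("lose", "a1", "s1"): 0.1,
           ("win", "a1", "s2"): 0.2, ("lose", "a1", "s2"): 0.8}  # Pr(c | a, s)

def pr_c_given_a(c, a):
    """Marginalize out the state: Pr(c | a) = sum_s Pr(c | a, s) * Pr(s)."""
    return sum(Pr_c_as[(c, a, s)] * p_s for s, p_s in Pr_s.items())

# pr_c_given_a("win", "a1") = 0.9*0.7 + 0.2*0.3 = 0.69
```

Any decision model that can fill in these two tables, however it is structured internally, reduces to a Pr(c | a) model in exactly this way.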
Real decision trees can quickly become “bushy messes” (Raiffa, 1968) with many duplicated sub-trees
Influence diagrams help to avoid large trees (http://en.wikipedia)
Often much more compact than decision trees.
Limitations of decision trees
- Combinatorial explosion. Example: searching for a prize in one of N boxes or locations involves a tree with N! = N(N − 1)⋯2·1 possible inspection orders
- Infinite trees
- Continuous variables
- When to stop growing a tree?
- How to evaluate utilities and probabilities?
Optimization formulations of decision problems
Example: a prize is in location j with prior probability p(j), j = 1, 2, …, N. It costs c(j) to inspect location j. What search strategy minimizes the expected cost of finding the prize?
- What is a strategy? An order in which to inspect the locations.
- How many strategies are there? N!
With two locations, 1 and 2
Strategy 1: inspect 1, then 2 if needed. Expected cost: c1 + (1 − p1)c2 = c1 + c2 − p1c2
Strategy 2: inspect 2, then 1 if needed. Expected cost: c2 + (1 − p2)c1 = c1 + c2 − p2c1
Strategy 1 has lower expected cost if p1c2 > p2c1, i.e., if p1/c1 > p2/c2.
So, look first at the location with the highest success probability per unit cost.
With N locations
- Optimal decision rule: always inspect next the (as-yet uninspected) location with the greatest success probability-to-cost ratio
- This is an example of an “index policy” (“Gittins index”)
- If M players take turns, competing to find the prize, each should still use this rule
- A decision table or tree can be unwieldy even for such simple optimization problems
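The probability-to-cost index rule is easy to check numerically. Below is a hedged sketch with hypothetical p(j) and c(j); the expected cost sums, over inspection positions, the inspection cost times the probability the prize has not yet been found (assuming the p(j) sum to 1, so the prize is certainly in one of the locations).

```python
def expected_cost(order, p, c):
    """E[cost] of inspecting locations in the given order until the prize is found."""
    cost, not_found = 0.0, 1.0
    for j in order:
        cost += not_found * c[j]   # we pay c[j] only if still searching
        not_found -= p[j]          # prize is at j with probability p[j]
    return cost

p = {1: 0.5, 2: 0.3, 3: 0.2}       # hypothetical priors (sum to 1)
c = {1: 4.0, 2: 1.0, 3: 1.0}       # hypothetical inspection costs
index_order = sorted(p, key=lambda j: p[j] / c[j], reverse=True)  # the index rule
# index_order == [2, 3, 1]: location 1 has the highest p but the worst p/c ratio.
```

Location 1 is the single most likely hiding place, yet the index rule correctly defers it because its cost is four times higher; enumerating all 3! orders confirms the ratio order is cheapest.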
Other optimization formulations
max_{a ∈ A} EU(a)
- Typically, a is a vector and A is the feasible set
- More generally, a is a strategy/policy/decision rule and A is the choice set of feasible strategies (in the previous example, A = the set of permutations of the N locations)
- EU(a) = Σ_c Pr(c | a)·u(c)
- Pr(c | a) = Σ_s Pr(c | a, s)·p(s)
- g(a) ≤ 0 defines the feasible set A
Introduction to evaluation analytics
Evaluation analytics: How well are policies working?
Algorithms for evaluating the effects of actions, events, and conditions:
- Intervention analysis / interrupted time series. Key idea: compare predicted outcomes with no action to observed outcomes with it (counterfactual causal analysis); Google’s CausalImpact algorithm
- Quasi-experimental designs and analysis: refute non-causal explanations for the data; compare to control groups to estimate effects
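The “compare predicted outcomes with no action to observed outcomes with it” idea can be sketched with ordinary least squares on a pre-intervention window. The data here are made up for illustration (not the NICE endocarditis series), and the linear-trend counterfactual is a deliberately simple stand-in for tools like CausalImpact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b  # intercept, slope

pre_t,  pre_y  = [0, 1, 2, 3, 4], [10, 12, 14, 16, 18]   # hypothetical counts
post_t, post_y = [5, 6, 7],       [25, 27, 29]
a, b = fit_line(pre_t, pre_y)                  # trend fitted to pre-period only
counterfactual = [a + b * t for t in post_t]   # predicted outcomes with no action
effect = [y - yhat for y, yhat in zip(post_y, counterfactual)]  # observed - predicted
```

Here the pre-period trend (slope 2 per period) projects to 20, 22, 24, so the estimated intervention effect is +5 in each post-period. Real analyses add uncertainty intervals and controls; this shows only the counterfactual-comparison skeleton.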
How did the U.K. National Institute for Health and Clinical Excellence (NICE) recommendation, in March 2008, of complete cessation of antibiotic prophylaxis for the prevention of infective endocarditis affect the incidence of infective endocarditis?
Different models yield different conclusions
So, how should we deal with model uncertainty? Solution: model ensembles, Bayesian model averaging (BMA).
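Bayesian model averaging weights each candidate model’s prediction by an approximate posterior model probability. Here is a minimal sketch using BIC-based weights with hypothetical numbers; the exp(−BIC/2) approximation assumes equal prior model probabilities.

```python
import math

def bma(predictions, bics):
    """Combine model predictions with weights proportional to exp(-BIC/2)."""
    ws = [math.exp(-bic / 2) for bic in bics]
    total = sum(ws)
    ws = [w / total for w in ws]                 # normalized posterior weights
    return sum(w * p for w, p in zip(ws, predictions)), ws

# Two models predicting an outcome probability; the better-scoring
# (lower-BIC) model gets more weight in the averaged prediction.
pred, weights = bma([0.2, 0.6], [10.0, 12.0])
```

The averaged prediction lands between the individual models’ answers, pulled toward the better-supported model, which is exactly how BMA hedges against committing to a single model.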
Nonlinear models complicate inference of intervention effects
Solution: Non-parametric models, gradient boosting
Quasi-experiments: Refuting non-causal explanations with control groups
Example: Do delinquency interventions work?
Algorithms for evaluating effects of combinations of factors
- Classification trees; boosted trees, Random Forest, MARS
- Bayesian network algorithms: discovery (conditional independence tests), validation, inference and explanation
- Response surface algorithms
- Adaptive learning, design of experiments
Learning analytics
Learn to predict better:
- Create an ensemble of models/algorithms using multiple machine learning algorithms: logistic regression, Random Forest, SVM, ANN, deep learning, gradient boosting, KNN, lasso, etc.
- “Stack” models (hybridize multiple predictions): cross-validation assesses model performance; a meta-learner combines performance-weighted predictors to produce an improved predictor
- Theoretical guarantees and practical successes (Kaggle competitions)
Learn to decide better:
- Low-regret learning of decision rules
- Theoretical guarantees (MDPs), practical performance
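The “meta-learner combines performance-weighted predictors” idea can be sketched in its simplest form: weight each base model’s predicted probabilities by its cross-validated accuracy. The base predictions and accuracies below are made-up stand-ins for real fitted models; a full superlearner fits the combining weights themselves by cross-validation rather than using raw accuracies.

```python
def weighted_ensemble(base_probs, cv_accuracies):
    """Combine per-case predicted probabilities, weighting models by CV accuracy."""
    total = sum(cv_accuracies)
    ws = [acc / total for acc in cv_accuracies]   # normalized performance weights
    n_cases = len(base_probs[0])
    return [sum(w * probs[i] for w, probs in zip(ws, base_probs))
            for i in range(n_cases)]

# Two models' predicted probabilities for two cases, and their CV accuracies:
ensemble = weighted_ensemble([[0.9, 0.1], [0.6, 0.7]], [0.8, 0.2])
# ensemble == [0.8*0.9 + 0.2*0.6, 0.8*0.1 + 0.2*0.7] = [0.84, 0.22]
```

The stronger model (CV accuracy 0.8) dominates the blend, but the weaker model still contributes, which is the basic mechanism behind stacked ensembles outperforming any single base learner.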
Collaborative risk analytics: Multiple interacting learning agents
Collaborative risk analytics
- Global performance metrics
- Local information, control, tasks, priorities, rewards
- Hierarchical distributed control
- Collaborative sensing, filtering, deliberation, and decision-control networks of agents
- Mixed human and machine agents
- Autonomous agents vs. intelligent assistants
Collaborative risk analytics: games as labs for distributed AI
- Local information, control, tasks, priorities
- Hierarchical distributed control
- Collaborative sensing, deliberation, and control networks
- From decentralized agents to effective risk analytics teams and HCI support
- Trust, reputation, performance
- Sharing information, attention, control, evaluation, learning
Risk analytics toolkit: Summary
- Descriptive analytics: change-point analysis, likelihood-ratio CPA; machine learning (LR, RF, GBM, SVM, ANN, KNN, etc.); response surfaces
- Predictive analytics: Bayesian networks (BN), dynamic BN (DBN); Bayesian model averaging (BMA); ML
- Causal analytics and principles: causal BNs (DAGs); systems dynamics (SD) simulation; time series causation
- Prescriptive analytics: IDs, simulation-optimization, robust optimization
- Evaluation analytics: quasi-experiments (QE), credit assignment, attribution
- Learning analytics: machine learning, superlearning; low-regret learning of decision rules; collaborative learning
Applied risk analytics toolkit: Toward more practical analytics
Reorientation: from solving well-posed problems to discovering how to act more effectively.
- Descriptive analytics: What’s happening?
- Predictive analytics: What’s coming next?
- Causal analytics: What can we do about it?
- Prescriptive analytics: What should we do?
- Evaluation analytics: How well is it working?
- Learning analytics: How to do better?
- Collaboration: How to do better together?