Math 6330: Statistical Consulting Class 2


1 Math 6330: Statistical Consulting Class 2
Tony Cox
University of Colorado at Denver
Course web site:

2 Student introductions
- Name
- Affiliation (academic, professional)
- Technical interests
- Any special expertise in data analysis areas
- Any projects or data sets of interest
- Thoughts on Assignment 1? (How does PM2.5 affect elderly mortality in this data set?)
- Hopes and goals for course

3 Assignment #1
- Download data set Sample1.xlsx from
- Analyze the data to answer the following client question: “Is there evidence that high concentrations of fine particulate matter (PM2.5) increase daily elderly mortality counts (AllCause75)? If so, how large is the effect?”
- questions to

4 Assignment 2 (Due January 31)
- Download data set Class2DataBenzene.xlsx from
- Setting: 3 factories in China, many workers, some measured more than once (“splits”). Some missing data. The detection limit for benzene in air is 0.2 ppm.
- Analyze the data to answer the following client question: “Is there evidence that low concentrations of benzene in air (e.g., AB < 1) produce disproportionately more toxic metabolites (PH, CA, HQ for phenol, catechol, hydroquinone) or total urinary metabolites (UB) than higher concentrations? What is the shape of the low-concentration relation between AB and each metabolite?”
- Background: Who cares, and why?
- questions to

5 Reminder: Goals for student projects
- Extract good problems from available knowledge and data
- “Good” = high value of analysis = large improvement in decisions, results, etc.
- Apply high-value techniques to produce valuable answers and insights
- Unexpected directions are ok!
- Present the results so that the potential value is actually delivered
- If possible, document impact and next steps

6 Some high-value consulting tools – Beyond clustering and regression
- Classification and regression trees (CART)
- Random Forest
- Bayesian networks
- Influence diagrams
- Predictive analytics
- Causal analytics
- State transition models
- Dynamic simulation modeling
- Markov Decision Processes (MDPs)
- Partially observable MDPs
- Simulation-optimization

7 Components of a successful project
- Problem statement and motivation
- Data
- Analysis plan/narrative
- Tools and software
- Results: Reports and displays
- Presentation: What did we learn?
- Evaluation: What was the impact?
- Proposed next steps

8 High-value statistical consulting skills

9 Components of a consulting engagement
1. Agreed-to problem statement or question
2. Understanding of why it matters: underlying goals, decisions, or questions
3. Data that are relevant (maybe) for answering the question
4. Methods: Tools, analyses, software
5. Results and interpretation. Caveats/limitations
6. Report to client (summarizes 1-5)
7. Proposed next steps (usually). Builds on 1-6

10 Key steps in consulting
- Vision: Define and agree on success (goals and measures)
- What you measure is what you get
- Clarify objectives
- Generate alternatives
- Compare/evaluate alternatives
- Make recommendations, show why
- Evaluate performance

11 Toward higher-value analytics
- Reorientation: From solving well-posed problems to discovering how to act more effectively
- Descriptive analytics: What’s happening?
- Predictive analytics: What’s (probably) coming next?
- Causal analytics: What can we do about it?
- Prescriptive analytics: What should we do?
- Evaluation analytics: How well is it working?
- Learning analytics: How to do better?
- Collaboration: How to do better together?

12 High-value statistical skills
- Describe current situation
- Predict what is likely to happen next if we do not take action
- Predict what is likely to happen next if we take different actions
- Optimize decisions about what to do
- Evaluate how well current policies are working
- Learn to improve current policies

13 Introduction to descriptive analytics

14 Descriptive analytics: What’s going on?
- What is the current situation?
- Attribution: How much harm/loss/opportunity cost is being caused by X?
- Causes are often unobserved or uncertain
- What has changed recently? (Why?)
- Example: Are more reports of extreme events caused by real change or by media coverage?
- Change-point analysis (CPA) algorithms
- What should we worry about?
- How is this year’s season shaping up?
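As a rough illustration of the change-point idea on this slide, the sketch below finds the single split of a series that minimizes the within-segment sum of squared deviations from the segment means. This is one simple least-squares formulation, not necessarily the CPA algorithm used in the course; the function name `best_change_point` and the yearly counts are made up for illustration.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of values from their mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_change_point(series):
    """Return (k, cost) for the split minimizing SSE(series[:k]) + SSE(series[k:]).

    k is the index of the first observation in the second segment.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical yearly counts of reported extreme events (illustrative only).
reports = [4, 5, 3, 4, 5, 4, 9, 10, 8, 9, 11]
k, cost = best_change_point(reports)
print(f"estimated change point at index {k}")
```

A best split exists for any series, so this search alone does not show that a change is real, and it says nothing about why the level shifted. Judging that would need an additional step such as a permutation test and subject-matter knowledge.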

15 Air pollution example: Classification tree descriptive analytics
- tmin, tmax, month, year, MAXRH are potential predictors of AllCause75 (elderly mortality)
- PM2.5 does not appear in this tree
- AllCause75 is conditionally independent of PM2.5 in this analysis, given the other variables in the tree
- Making year and month into categorical variables changes the tree but not this conclusion.

16 How a CART tree works
- Basic idea: Always ask the most informative question next, given answers so far.
- Questions are represented by splits in the tree
- Leaf nodes show conditional means (or conditional distributions) of the dependent variable
- Internal nodes show the significance level for a split: how significant are the differences between conditional distributions?
- Each split reduces prediction error for the dependent variable
- Stop this “recursive partitioning” when further questions (splits in the tree) do not significantly improve prediction.
- This is the Classification & Regression Tree (CART) algorithm
- Some refinements:
  - Grow a large tree and prune back to minimize cross-validation error
  - Fit multiple trees to random (bootstrap) samples of the data and let them vote on the prediction (“bagging”)
  - Over-train on mis-predicted cases (“boosting”)
  - Average predictions from many trees (“RandomForest” ensemble prediction)
  - Join prediction “patches” together smoothly (MARS)
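The "most informative question next" idea can be sketched for a numeric dependent variable as a search over (variable, threshold) pairs for the split that most reduces the sum of squared errors. This is a minimal sketch under stated assumptions: it uses SSE reduction with a fixed `min_gain` cutoff as a crude stand-in for the significance-based stopping rule, and it does no pruning. All function names and the tiny data set are hypothetical.

```python
from statistics import fmean


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def partition(rows, ys, j, t):
    """Split (rows, ys) pairs into left (feature j <= t) and right (feature j > t)."""
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    return left, right


def best_split(rows, ys):
    """Return (feature, threshold, gain) for the split that most reduces SSE.

    Returns None if no split reduces SSE.
    """
    parent = sse(ys)
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left, right = partition(rows, ys, j, t)
            gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
            if gain > 1e-12 and (best is None or gain > best[2]):
                best = (j, t, gain)
    return best


def grow(rows, ys, min_gain, depth=0, max_depth=3):
    """Recursively partition; stop when the best split gains less than min_gain."""
    split = best_split(rows, ys)
    if split is None or split[2] < min_gain or depth >= max_depth:
        return {"leaf": True, "mean": fmean(ys), "n": len(ys)}
    j, t, _ = split
    children = []
    for part in partition(rows, ys, j, t):
        sub_rows = [r for r, _ in part]
        sub_ys = [y for _, y in part]
        children.append(grow(sub_rows, sub_ys, min_gain, depth + 1, max_depth))
    return {"leaf": False, "feature": j, "threshold": t,
            "left": children[0], "right": children[1]}


# Hypothetical data: column 0 = daily minimum temperature, column 1 = PM2.5.
# By construction, the response depends only on column 0.
X = [[-5, 12], [-3, 30], [-1, 8], [2, 25], [10, 10], [12, 28], [15, 9], [18, 31]]
y = [30, 31, 29, 30, 20, 21, 19, 20]
tree = grow(X, y, min_gain=5.0)
print(tree["feature"], tree["threshold"])
```

In this made-up data the root split is on the temperature column, and the PM2.5 column is never selected because, once temperature is known, it improves prediction by less than the cutoff. That is the sense in which a variable can be absent from a fitted tree.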

17 Bayesian Networks (BNs) show information relations among variables
- BNs provide a high-level roadmap for descriptive analytics
- Each node has a conditional probability table (CPT) (or regression model, CART tree, etc.) describing how the conditional probabilities of its values depend on other variables.
- If no arrow connects two variables, then they are conditionally independent of each other, given the other variables in the BN.
- Omitted variables can create statistical dependencies
- Conditioning on variables can also sometimes create dependencies
- Information principle for causality: Causes are not conditionally independent of their effects.
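The remark that conditioning can create dependencies can be checked by exact enumeration on a three-node network X -> Z <- Y. The sketch below assumes two independent, fair binary causes and a deterministic CPT, Z = X OR Y. These variables and probabilities are illustrative assumptions, not taken from the course data.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Marginals for the two parentless nodes (assumed independent and fair).
p_x = {0: half, 1: half}
p_y = {0: half, 1: half}


def p_z_given(z, x, y):
    """CPT for Z given its parents: Z = X OR Y, deterministically."""
    return Fraction(1) if z == (x | y) else Fraction(0)


# Joint distribution from the BN factorization P(x) P(y) P(z | x, y).
joint = {
    (x, y, z): p_x[x] * p_y[y] * p_z_given(z, x, y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Probability that the named variables (x, y, z) take the given values."""
    names = ("x", "y", "z")
    return sum(
        p for values, p in joint.items()
        if all(values[names.index(k)] == v for k, v in fixed.items())
    )


# Unconditionally, X and Y are independent: P(x=1, y=1) = P(x=1) P(y=1).
print(prob(x=1, y=1) == prob(x=1) * prob(y=1))

# Given Z = 1, learning Y = 1 changes the probability that X = 1.
p_x1_given_z1 = prob(x=1, z=1) / prob(z=1)
p_x1_given_z1_y1 = prob(x=1, y=1, z=1) / prob(y=1, z=1)
print(p_x1_given_z1, p_x1_given_z1_y1)
```

Here P(X=1 | Z=1) is 2/3 but P(X=1 | Z=1, Y=1) is 1/2, so X and Y, which have no arrow between them and are marginally independent, become dependent once their common effect Z is conditioned on.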

