Causal Modeling of Observational Cost Data: A Ground-Breaking use of Directed Acyclic Graphs
Bob Stoddard, SEMA
Mike Konrad, SEMA
COCOMO 2015, November 17, 2015
© 2015 Carnegie Mellon University
Distribution Statement A: Approved for Public Release; Distribution is Unlimited
Copyright 2015 Carnegie Mellon University

This material is based upon work funded and supported by the Department of Defense under Contract No. FA C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense.

References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use.
Requests for permission should be directed to the Software Engineering Institute.

Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
Agenda
Problem of Developing CERs (Cost Estimating Relationships)
Why Causation instead of Correlation
Causal Modeling using DAGs (Directed Acyclic Graphs)
Examples
Call for Action and Collaboration
Problem of Developing CERs
Many CERs are built using traditional correlation and statistical regression modeling.
However, serious concerns exist in using these methods for the development of CERs, namely:
What if other factors not represented in the model are responsible for the cost effects?
What if there are convoluted factors impacting cost?
What if cost analysts decide to interpret the regression coefficients as the degree of influence on cost?
How do cost analysts confidently know that the CER parameters influence cost as compared to other factors that are correlated with these parameters?
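The first concern above (a factor missing from the model drives the cost effect) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not data or a model from the presentation: the variable names `complexity`, `team_size`, and `cost`, and all coefficients, are hypothetical. The true effect of `team_size` on `cost` is set to 1.0, and `complexity` is a common cause of both.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

# Hypothetical data-generating process (an assumption for illustration):
#   complexity -> team_size, complexity -> cost, team_size -> cost (effect 1.0)
rng = random.Random(0)
n = 20000
complexity = [rng.gauss(0, 1) for _ in range(n)]
team_size = [2.0 * c + rng.gauss(0, 1) for c in complexity]
cost = [1.0 * t + 3.0 * c + rng.gauss(0, 1)
        for t, c in zip(team_size, complexity)]

# Regression that leaves complexity out: the coefficient absorbs its influence.
naive = slope(team_size, cost)

# Regression that controls for complexity. By the Frisch-Waugh-Lovell theorem,
# the multiple-regression coefficient equals the slope between the residuals
# of each variable after complexity has been partialled out.
adjusted = slope(residuals(complexity, team_size),
                 residuals(complexity, cost))

print(f"naive coefficient:    {naive:.2f}")
print(f"adjusted coefficient: {adjusted:.2f}")
```

Under these assumed coefficients the naive coefficient settles near 2.2 while the adjusted one settles near the true 1.0, so reading the naive coefficient as "degree of influence on cost" would overstate the effect by more than a factor of two.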
Why Traditional Correlation Falls Short
Los Angeles Times May 12, correlation-is-not-causation column.html
Why Causal Modeling is a Game Changer
1) How many CERs are built on definitive causal influences of cost?
2) Without controlled experimentation, how do you conclude true causes of cost?
3) Would your CERs be more useful and credible if they were based on true causal influences on cost?
4) What if you could conclude causal effects on cost using non-experimental data (aka observational data)?
5) Would this enhance your development of CERs and cost estimates?
Causal Modeling – Dr. Judea Pearl
Quotes by Judea Pearl

“… I see no greater impediment to scientific progress than the prevailing practice of focusing all of our mathematical resources on probabilistic and statistical inferences while leaving causal considerations to the mercy of intuition and good judgment.”
Pearl, J. (2009). Causality. Cambridge University Press. (Preface to 1st Edition)

“The development of Bayesian Networks, so people tell me, marked a turning point in the way uncertainty is handled in computer systems. For me, this development was a stepping stone towards a more profound transition, from reasoning about beliefs to reasoning about causal and counterfactual relationships.”
Judea Pearl: From Bayesian Networks to Causal and Counterfactual Reasoning. Keynote Lecture at the 2014 BayesiaLab User Conference. Recorded on September 24, 2014, in Los Angeles.
Causal Modeling – Dr. Stephen Morgan
CMU Causal Modeling Researchers
Causal Inference with Directed Graphs Training
2-Day Seminar offered by Dr. Felix Elwert, Univ of Wisconsin
Available through two channels:
Statistical Horizons
BayesiaLab
Landscape of Causal Modeling
1. Raw Observational Data
2. Statistical Discovery of Causal Relationships to create the DAG (CMU Faculty)
3. Quantifying Causal Relations using DAG graph surgery and Instrumental Variables (Pearl & Elwert)
4. Identity of true causal parameters of cost
Use of Directed, Acyclic Graphs
1. Derive testable implications of a causal model to evaluate if the model is correct
2. Understand causal identification requirements to confirm whether causality may be extracted from the data
   a) Separating causal from spurious associations in the data
3. Inform use of traditional statistical techniques such as regression
   a) Deciding which control variables to include versus not to include in the analysis to achieve identification of causality
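Point 3 above (which control variables not to include) can be made concrete with a mediator. The sketch below is a hypothetical illustration, not an analysis from the presentation: the names `maturity`, `defects`, and `cost` and all coefficients are assumptions. In the assumed model, process maturity lowers cost directly (-1.0) and also indirectly by reducing defects, so its total effect on cost is -1.0 + (-0.5)(2.0) = -2.0.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

# Hypothetical DAG (an assumption for illustration):
#   maturity -> defects -> cost, plus a direct arrow maturity -> cost
rng = random.Random(2)
n = 20000
maturity = [rng.gauss(0, 1) for _ in range(n)]
defects = [-0.5 * m + rng.gauss(0, 1) for m in maturity]
cost = [-1.0 * m + 2.0 * d + rng.gauss(0, 1)
        for m, d in zip(maturity, defects)]

# No common cause exists here, so the plain regression recovers the total effect.
total_effect = slope(maturity, cost)

# Controlling for the mediator (defects) blocks the indirect causal path and
# leaves only the direct effect; computed via residual-on-residual regression.
direct_only = slope(residuals(defects, maturity),
                    residuals(defects, cost))

print(f"without controlling for defects: {total_effect:.2f}")
print(f"controlling for defects:         {direct_only:.2f}")
```

In this assumed setup, adding `defects` as a control halves the estimated benefit of process maturity. The DAG is what tells the analyst that `defects` lies on a causal path and should be left out when the total effect is the quantity of interest.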
Basic Concepts of DAGs
1. DAGs consist of:
   a) nodes (variables),
   b) directed arrows (possible causal relationships ordered by time), and
   c) missing arrows (confident assumptions about absence of causal effects)
2. DAGs are nonparametric
   a) No distributional assumptions
   b) Linear and/or nonlinear
3. DAGs have both causal paths and non-causal (spurious) paths
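The concepts above can be sketched with nothing more than a list of arrows. The following is a minimal illustration on a hypothetical four-variable cost DAG (the node names are assumptions, not taken from the presentation). It checks that the graph is acyclic and separates the causal paths between two variables (every arrow points forward) from the non-causal ones.

```python
# Hypothetical DAG: each pair is (cause, effect). Any arrow not listed is an
# explicit assumption that no direct causal effect exists.
edges = [
    ("complexity", "team_size"),
    ("complexity", "cost"),
    ("team_size", "defects"),
    ("defects", "cost"),
]

def is_acyclic(edges):
    """Kahn's algorithm: a graph is acyclic if every node can be peeled off
    in an order where it has no remaining incoming arrows."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == len(nodes)

def all_paths(edges, start, end):
    """All simple paths from start to end, ignoring arrow direction while
    walking but recording the direction of every step."""
    steps = {}
    for parent, child in edges:
        steps.setdefault(parent, []).append((child, "->"))
        steps.setdefault(child, []).append((parent, "<-"))
    found = []

    def walk(node, visited, trail):
        if node == end:
            found.append(trail)
            return
        for nxt, arrow in steps.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, trail + [(arrow, nxt)])

    walk(start, {start}, [])
    return found

def is_causal(path):
    """A path is causal when every arrow points away from the start node."""
    return all(arrow == "->" for arrow, _ in path)

paths = all_paths(edges, "team_size", "cost")
causal = [p for p in paths if is_causal(p)]
spurious = [p for p in paths if not is_causal(p)]

for p in paths:
    kind = "causal" if is_causal(p) else "non-causal"
    print("team_size " + " ".join(f"{a} {n}" for a, n in p), f"({kind})")
```

In this graph `team_size -> defects -> cost` is the causal path, while `team_size <- complexity -> cost` is a non-causal path that still transmits association.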
Three Structures Studied in a DAG
1. Indirect Connection (chain): A → M → B
2. Common Cause (fork): A ← C → B
3. Common Effect (Collider): A → S ← B
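How each of the three structures behaves can be seen in simulated data. This is a sketch with assumed linear relationships and unit-variance noise, using generic variable names rather than anything from the presentation. For each structure it reports the association between the two end variables before and after conditioning on the middle variable, using correlation and first-order partial correlation.

```python
import math
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly conditioning on z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(3)
n = 20000

def noise():
    return [rng.gauss(0, 1) for _ in range(n)]

# 1. Indirect connection (chain): a -> m -> b
a = noise()
m = [ai + e for ai, e in zip(a, noise())]
b = [mi + e for mi, e in zip(m, noise())]
chain = (corr(a, b), partial_corr(a, b, m))

# 2. Common cause (fork): a <- c -> b
c = noise()
a = [ci + e for ci, e in zip(c, noise())]
b = [ci + e for ci, e in zip(c, noise())]
fork = (corr(a, b), partial_corr(a, b, c))

# 3. Common effect (collider): a -> s <- b
a, b = noise(), noise()
s = [ai + bi + e for ai, bi, e in zip(a, b, noise())]
collider = (corr(a, b), partial_corr(a, b, s))

for name, (marginal, conditional) in [("chain", chain), ("fork", fork),
                                      ("collider", collider)]:
    print(f"{name:9s} marginal={marginal:+.2f}  conditional={conditional:+.2f}")
```

Under these assumptions the chain and the fork show an association that disappears once the middle variable is conditioned on, while the collider shows the opposite: the two causes are unrelated until their common effect is conditioned on, at which point a spurious association appears.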
Deriving Testable Implications of a DAG
1. Uses a technique called d-Separation
   a) Algorithm to help determine which paths are causal versus non-causal
   b) Uses concept of blocking a path to stop transmission of non-causal association
2. Additional techniques employed include
   a) Graphical identification
   b) Adjustment Criterion
   c) Backdoor Criterion
   d) Frontdoor Criterion
   e) Pearl’s do-Calculus
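A d-separation check and a backdoor-criterion check can be sketched in a few lines. The implementation below uses the standard moralized-ancestral-graph formulation of d-separation; it is an illustrative sketch, not the algorithm as implemented in Tetrad or BayesiaLab, and the DAG, node names, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical DAG written as child -> list of parents.
parents = {
    "team_size": ["complexity"],
    "defects": ["team_size"],
    "cost": ["complexity", "defects"],
    "audit": ["team_size", "cost"],  # a common effect (collider)
}

def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` (x, y not in `given`)."""
    given = set(given)
    keep = ancestors(parents, {x, y} | given)
    # Moralize the ancestral graph: link each parent to its child, link
    # parents that share a child, then treat every link as undirected.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff removing `given` disconnects them.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in given:
            continue
        seen.add(v)
        stack.extend(adj[v] - given)
    return True

def descendants(parents, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def satisfies_backdoor(parents, x, y, z):
    """Backdoor criterion: z contains no descendant of x, and z blocks every
    path between x and y once the arrows leaving x are deleted."""
    z = set(z)
    if z & descendants(parents, x):
        return False
    pruned = {c: [p for p in ps if p != x] for c, ps in parents.items()}
    return d_separated(pruned, x, y, z)

print(d_separated(parents, "complexity", "defects", {"team_size"}))
print(d_separated(parents, "complexity", "defects", {"team_size", "cost"}))
print(satisfies_backdoor(parents, "team_size", "cost", {"complexity"}))
```

In this hypothetical graph, conditioning on `team_size` separates `complexity` from `defects`, but additionally conditioning on the collider `cost` reconnects them. For the effect of `team_size` on `cost`, the set `{complexity}` satisfies the backdoor criterion, whereas the empty set and `{audit}` (a descendant of `team_size`) do not.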
Blocking or Adjusting Paths
1. Controlling a variable
2. Stratifying a variable
3. Setting evidence on a variable
4. Observing a variable
5. Matching a variable (e.g., making distributions of sub-populations as similar as possible for comparison)
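Stratifying, the second option above, can be sketched as follows. The example is hypothetical: `legacy` (whether a project integrates legacy code), `agile` (whether it used Agile practices), and `overrun` (cost overrun in percent) are assumed names, and the simulated effect of `agile` on `overrun` is set to -5. Legacy projects are assumed to adopt Agile less often and to overrun more, which opens a non-causal path between `agile` and `overrun`.

```python
import random

def mean(values):
    return sum(values) / len(values)

# Hypothetical data-generating process (an assumption for illustration):
#   legacy -> agile, legacy -> overrun, agile -> overrun (effect -5)
rng = random.Random(1)
rows = []
for _ in range(20000):
    legacy = rng.random() < 0.5
    agile = rng.random() < (0.2 if legacy else 0.8)
    overrun = 10 + 20 * legacy - 5 * agile + rng.gauss(0, 5)
    rows.append((legacy, agile, overrun))

def difference(rows):
    """Mean overrun of Agile projects minus mean overrun of the rest."""
    treated = [y for _, a, y in rows if a]
    control = [y for _, a, y in rows if not a]
    return mean(treated) - mean(control)

# Unadjusted comparison: mixes the Agile effect with the legacy effect.
naive = difference(rows)

# Stratified comparison: compare within each legacy stratum, then average
# the within-stratum differences weighted by stratum size.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += len(stratum) / len(rows) * difference(stratum)

print(f"naive difference:      {naive:.1f}")
print(f"stratified difference: {adjusted:.1f}")
```

With these assumed numbers the naive difference is roughly -17, more than three times the simulated effect, while the stratified estimate is close to -5. Stratifying on `legacy` blocks the non-causal path `agile <- legacy -> overrun`.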
Example: Causality Modeling with BayesiaLab
Excerpts taken from:
Cost Estimation Example
1. Use the CMU tool, Tetrad, to discover causal parameters in a data set containing a wide variety of factors deemed relevant to cost, or
2. Hypothesize a set of factors related to cost, along with their hypothesized interrelationships, followed by causal modeling using Pearl graph surgery or instrumental variable analysis using Stata
Factors may relate to existing cost parameters as well as to new or emergent cost influences, such as Agile and DevOps
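The instrumental variable idea mentioned above can be sketched in plain Python. This is neither Stata nor Tetrad output, and the setup is entirely hypothetical: `mandate` is an assumed instrument (it influences Agile adoption but affects cost only through adoption), `skill` is an unobserved common cause, and the simulated effect of `adoption` on `cost` is -2.0. The estimator shown is the simple ratio cov(Z, Y) / cov(Z, X), which coincides with two-stage least squares when there is one instrument and one treatment.

```python
import random

def cov(x, y):
    """Sample covariance (population form; the ratio below is unaffected)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Hypothetical data-generating process (an assumption for illustration):
#   mandate -> adoption -> cost, with unobserved skill -> adoption, skill -> cost
rng = random.Random(4)
n = 20000
skill = [rng.gauss(0, 1) for _ in range(n)]      # never used by the estimators
mandate = [rng.gauss(0, 1) for _ in range(n)]
adoption = [z + u + rng.gauss(0, 1) for z, u in zip(mandate, skill)]
cost = [-2.0 * x + 3.0 * u + rng.gauss(0, 1) for x, u in zip(adoption, skill)]

# Ordinary regression slope: biased because skill is unobserved and omitted.
ols = cov(adoption, cost) / cov(adoption, adoption)

# Instrumental variable estimate: uses only the variation in adoption that
# is driven by the mandate, which is unrelated to skill by assumption.
iv = cov(mandate, cost) / cov(mandate, adoption)

print(f"regression slope: {ols:.2f}")
print(f"IV estimate:      {iv:.2f}")
```

In this simulation the regression slope lands near -1.0 while the IV estimate recovers a value near the simulated -2.0. The result depends on the instrument assumptions holding; a DAG is one way to state and examine those assumptions before estimating.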
Call for Action and Collaboration
Causal modeling with observational data is practical
Causal modeling informs which variables to include in experimental research
You should consider building causal methodology into your CER development
Practical methods and tooling now exist to discover (Tetrad) and model (Tetrad, Stata) causal relationships in data
We (SEI) seek to partner with you in developing CERs by applying causal methods to your data
Contact Information
Points of Contact: SEMA Cost Estimation Research Group, Robert Stoddard, Mike Konrad
U.S. Mail: Software Engineering Institute, Customer Relations, 4500 Fifth Avenue, Pittsburgh, PA, USA