
Causal Modeling of Observational Cost Data: A Ground-Breaking use of Directed Acyclic Graphs
Bob Stoddard, SEMA
Mike Konrad, SEMA
COCOMO 2015, November 17, 2015
© 2015 Carnegie Mellon University
Distribution Statement A: Approved for Public Release; Distribution is Unlimited

2 Copyright 2015 Carnegie Mellon University
This material is based upon work funded and supported by the Department of Defense under Contract No. FA C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense.
References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at
Carnegie Mellon® is registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
DM

3 Agenda
- Problem of Developing CERs (Cost Estimating Relationships)
- Why Causation instead of Correlation
- Causal Modeling using DAGs (Directed Acyclic Graphs)
- Examples
- Call for Action and Collaboration

4 Problem of Developing CERs
Many CERs are built using traditional correlation and statistical regression modeling. However, serious concerns exist in using these methods for the development of CERs, namely:
- What if other factors not represented in the model are responsible for the cost effects?
- What if there are convoluted factors impacting cost?
- What if cost analysts decide to interpret the regression coefficients as the degree of influence on cost?
- How do cost analysts confidently know that the CER parameters influence cost as compared to other factors that are correlated with these parameters?
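
The first concern above (a factor missing from the model being responsible for the cost effect) can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variable names `hidden_factor` and `cer_parameter`, the coefficients, and the noise model are hypothetical and do not come from the presentation. The CER parameter is constructed to have no effect on cost at all, yet a simple regression gives it a large coefficient until the hidden factor is accounted for.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def residuals(ys, xs):
    """What is left of ys after removing its linear dependence on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - mean_y) - b * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=20000, seed=0):
    """Return (naive, adjusted) coefficients for a parameter with zero true effect."""
    rng = random.Random(seed)
    # Hypothetical data-generating process (an assumption for illustration):
    hidden_factor = [rng.gauss(0, 1) for _ in range(n)]
    cer_parameter = [u + rng.gauss(0, 1) for u in hidden_factor]
    cost = [2.0 * u + rng.gauss(0, 1) for u in hidden_factor]  # no cer_parameter term

    naive = slope(cer_parameter, cost)
    # Adjust for the hidden factor by regressing it out of both variables first.
    adjusted = slope(residuals(cer_parameter, hidden_factor),
                     residuals(cost, hidden_factor))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    # Under these assumptions the naive slope is close to 1 and the adjusted one close to 0.
    print(f"naive coefficient:    {naive:.3f}")
    print(f"adjusted coefficient: {adjusted:.3f}")
```

Reading the naive coefficient as "degree of influence on cost" would be wrong here by construction, which is the situation the questions on this slide warn about.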

5 Agenda: Problem of Developing CERs; Why Causation instead of Correlation; Causal Modeling using DAGs; Examples; Call for Action and Collaboration

6 Why Traditional Correlation Falls Short
Los Angeles Times May 12, correlation-is-not-causation column.html

7 Why Causal Modeling is a Game Changer
1) How many CERs are built on definitive causal influences of cost?
2) Without controlled experimentation, how do you conclude true causes of cost?
3) Would your CERs be more useful and credible if they were based on true causal influences on cost?
4) What if you could conclude causal effects on cost using non-experimental data (aka observational data)?
5) Would this enhance your development of CERs and cost estimates?

8 Causal Modeling – Dr. Judea Pearl

9 Quotes by Judea Pearl
“… I see no greater impediment to scientific progress than the prevailing practice of focusing all of our mathematical resources on probabilistic and statistical inferences while leaving causal considerations to the mercy of intuition and good judgment.”
Pearl, J. (2009). Causality. Cambridge University Press. (Preface to 1st Edition)
“The development of Bayesian Networks, so people tell me, marked a turning point in the way uncertainty is handled in computer systems. For me, this development was a stepping stone towards a more profound transition, from reasoning about beliefs to reasoning about causal and counterfactual relationships.”
Judea Pearl: From Bayesian Networks to Causal and Counterfactual Reasoning. Keynote Lecture at the 2014 BayesiaLab User Conference. Recorded on September 24, 2014, in Los Angeles.

10 Causal Modeling – Dr. Stephen Morgan

11 CMU Causal Modeling Researchers-01

12 CMU Causal Modeling Researchers-02

13 Causal Inference with Directed Graphs Training
2-Day Seminar offered by Dr. Felix Elwert, Univ of Wisconsin
Available through two channels:
- Statistical Horizons
- BayesiaLab
course-fairfax course-fairfax

14 Agenda: Problem of Developing CERs; Why Causation instead of Correlation; Causal Modeling using DAGs; Examples; Call for Action and Collaboration

15 Landscape of Causal Modeling
- Raw Observational Data
- Statistical Discovery of Causal Relationships to create the DAG (CMU Faculty)
- Quantifying Causal Relations using DAG graph surgery and Instrumental Variables (Pearl & Elwert)
- Identity of true causal parameters of cost
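
"Graph surgery" here refers to Pearl's way of modeling an intervention: when a variable is set from outside, every arrow pointing into it is cut while the rest of the DAG is left alone. A minimal sketch follows, assuming a DAG stored as a dictionary from each node to the set of its direct causes. The node names (`complexity`, `staffing`, `cost`) are hypothetical and chosen only for illustration.

```python
def graph_surgery(parents, intervened):
    """Return a copy of the DAG with every arrow into `intervened` removed.

    `parents` maps each node to the set of its direct causes.
    """
    return {node: set() if node == intervened else set(causes)
            for node, causes in parents.items()}


# Hypothetical DAG: complexity drives staffing and cost; staffing drives cost.
observed = {
    "complexity": set(),
    "staffing": {"complexity"},
    "cost": {"complexity", "staffing"},
}

# Intervening on staffing cuts complexity -> staffing but keeps staffing -> cost.
mutilated = graph_surgery(observed, "staffing")

if __name__ == "__main__":
    for node, causes in mutilated.items():
        print(node, "<-", sorted(causes))
```

In the modified graph the only remaining connection between staffing and cost is the causal arrow itself, which is why effects computed on it are read as causal.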

16 Use of Directed, Acyclic Graphs
1. Derive testable implications of a causal model to evaluate if the model is correct
2. Understand causal identification requirements to confirm whether causality may be extracted from the data
   - Separating causal from spurious associations in the data
3. Inform use of traditional statistical techniques such as regression
   - Deciding which control variables to include versus not to include in the analysis to achieve identification of causality

17 Basic Concepts of DAGs
1. DAGs consist of:
   a) nodes (variables),
   b) directed arrows (possible causal relationships ordered by time), and
   c) missing arrows (confident assumptions about absence of causal effects)
2. DAGs are nonparametric
   a) No distributional assumptions
   b) Linear and/or nonlinear
3. DAGs have both causal paths and non-causal (spurious) paths
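
The three ingredients listed above (nodes, directed arrows, missing arrows) map directly onto a small data structure. The sketch below is one possible representation, not something taken from the presentation: the DAG is a dictionary from each node to its direct causes, acyclicity is checked with the standard-library `graphlib` module (Python 3.9+), and the missing arrows are listed explicitly because each one is an assumption the analyst is making. The cost-related node names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def is_dag(parents):
    """True if the directed graph (node -> set of direct causes) has no cycle."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def missing_arrows(parents):
    """Pairs of nodes with no arrow in either direction between them."""
    nodes = sorted(set(parents) | {p for ps in parents.values() for p in ps})
    linked = {frozenset((child, p)) for child, ps in parents.items() for p in ps}
    return [pair for pair in combinations(nodes, 2) if frozenset(pair) not in linked]


# Hypothetical cost DAG, for illustration only.
cost_dag = {
    "requirements_volatility": set(),
    "team_experience": set(),
    "size": {"requirements_volatility"},
    "effort": {"size", "team_experience"},
}

if __name__ == "__main__":
    print("acyclic:", is_dag(cost_dag))
    for a, b in missing_arrows(cost_dag):
        print(f"assumed: no direct causal effect between {a} and {b}")
```

Nothing in this structure fixes a functional form or a distribution, which matches the nonparametric point on the slide: the graph only says which variables may affect which.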

18 Three Structures Studied in a DAG
1. Indirect Connection
2. Common Cause
3. Common Effect (Collider)
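
The three structures differ only in the direction of the two arrows that touch the middle variable. As a rough sketch (the function names and the example variables are assumptions made for illustration), the code below classifies a triple of nodes and records the standard rule for when association flows through it: indirect connections and common causes transmit association unless the middle variable is conditioned on, while a collider blocks it unless the middle variable is conditioned on.

```python
def classify_triple(edges, a, middle, b):
    """Name the structure formed by a, middle, b given (cause, effect) edges."""
    a_into = (a, middle) in edges
    a_out = (middle, a) in edges
    b_into = (b, middle) in edges
    b_out = (middle, b) in edges
    if a_into and b_into:
        return "common effect (collider)"   # a -> middle <- b
    if a_out and b_out:
        return "common cause"               # a <- middle -> b
    if (a_into and b_out) or (b_into and a_out):
        return "indirect connection"        # a -> middle -> b (or reversed)
    raise ValueError("a, middle and b do not form a connected triple")


def path_is_open(structure, middle_conditioned):
    """Whether association can flow through the triple.

    Simplification: conditioning on a descendant of a collider also opens
    the path, which this helper does not model.
    """
    if structure == "common effect (collider)":
        return middle_conditioned
    return not middle_conditioned


# Hypothetical examples, for illustration only.
chain = {("size", "effort"), ("effort", "cost")}
fork = {("complexity", "size"), ("complexity", "cost")}
collider = {("size", "cost"), ("staffing", "cost")}

if __name__ == "__main__":
    print(classify_triple(chain, "size", "effort", "cost"))
    print(classify_triple(fork, "size", "complexity", "cost"))
    print(classify_triple(collider, "size", "cost", "staffing"))
```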

19 Deriving Testable Implications of a DAG
1. Uses a technique called d-Separation
   a) Algorithm to help determine which paths are causal versus non-causal
   b) Uses concept of blocking a path to stop transmission of non-causal association
2. Additional techniques employed include
   a) Graphical identification
   b) Adjustment Criterion
   c) Backdoor Criterion
   d) Frontdoor Criterion
   e) Pearl’s do-Calculus
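
A d-separation check can be written in a few lines. The sketch below uses one standard formulation (restrict the graph to the ancestors of the variables involved, connect parents that share a child, drop arrow directions, delete the conditioning set, then test reachability). It is not the procedure used by any particular tool, and the example DAG and its variable names are hypothetical.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every path between xs and ys is blocked given zs.

    `parents` maps each node to the set of its direct causes.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Build the undirected "moral" graph on the ancestral subgraph.
    neighbours = {node: set() for node in keep}
    for child in keep:
        causes = list(parents.get(child, ()))
        for cause in causes:
            neighbours[child].add(cause)
            neighbours[cause].add(child)
        for p, q in combinations(causes, 2):  # link parents sharing a child
            neighbours[p].add(q)
            neighbours[q].add(p)

    # Search from xs without passing through the conditioning set.
    frontier = list(xs - zs)
    reached = set(frontier)
    while frontier:
        node = frontier.pop()
        for nxt in neighbours[node]:
            if nxt not in zs and nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return not (reached & ys)


# Hypothetical DAG: requirements_volatility -> rework -> cost <- team_size
example = {
    "rework": {"requirements_volatility"},
    "cost": {"rework", "team_size"},
}

if __name__ == "__main__":
    print(d_separated(example, {"requirements_volatility"}, {"cost"}, set()))
    print(d_separated(example, {"requirements_volatility"}, {"cost"}, {"rework"}))
```

Each `True` result is a testable implication: the DAG predicts that the two variables are independent in the data once the conditioning set is held fixed, so a strong dependence in the data counts as evidence against the DAG.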

20 Blocking or Adjusting Paths
1. Controlling a variable
2. Stratifying a variable
3. Setting evidence on a variable
4. Observing a variable
5. Matching a variable (e.g., making distributions of sub-populations as similar as possible for comparison)
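
Stratifying is the easiest of these to show numerically. In the simulated sketch below, every name and number is an assumption made for illustration: a binary factor `large_program` raises cost and also makes a binary `practice` more likely, while the practice itself adds exactly 1.0 to cost. Comparing programs with and without the practice mixes the two effects. Comparing within each stratum of `large_program` and averaging the strata by their share of the data blocks the non-causal path.

```python
import random
from statistics import fmean


def simulate(n=40000, seed=1):
    """Rows of (large_program, practice, cost) from a hypothetical process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        large_program = rng.random() < 0.5
        practice = rng.random() < (0.8 if large_program else 0.2)
        cost = 1.0 * practice + 3.0 * large_program + rng.gauss(0, 1)
        rows.append((int(large_program), int(practice), cost))
    return rows


def naive_effect(rows):
    """Difference in mean cost between rows with and without the practice."""
    with_practice = [cost for _, practice, cost in rows if practice == 1]
    without_practice = [cost for _, practice, cost in rows if practice == 0]
    return fmean(with_practice) - fmean(without_practice)


def stratified_effect(rows):
    """Within-stratum differences, weighted by each stratum's share of rows."""
    effect = 0.0
    for level in {large for large, _, _ in rows}:
        stratum = [row for row in rows if row[0] == level]
        effect += len(stratum) / len(rows) * naive_effect(stratum)
    return effect


if __name__ == "__main__":
    rows = simulate()
    # The simulated true effect is 1.0; the naive comparison overstates it.
    print(f"naive difference:      {naive_effect(rows):.2f}")
    print(f"stratified difference: {stratified_effect(rows):.2f}")
```

Under these assumptions the naive difference should come out near 2.8 and the stratified one near 1.0, the value built into the simulation.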

21 Agenda: Problem of Developing CERs; Why Causation instead of Correlation; Causal Modeling using DAGs; Examples; Call for Action and Collaboration

22 Example: Causality Modeling with BayesiaLab
Excerpts taken from:

34 Cost Estimation Example
- Use the CMU tool, Tetrad, to discover causal parameters in a data set containing a wide variety of factors deemed relevant to cost, or
- Hypothesize a set of factors related to cost, along with their hypothesized interrelationships, followed by causal modeling using Pearl graph surgery or instrumental variable analysis using Stata
- Factors may relate to existing cost parameters as well as factors related to new or emergent cost influences, such as Agile and DevOps
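
To give a feel for what instrumental variable analysis does, the sketch below implements the simplest form of the estimator (the covariance ratio for a single instrument) in plain Python on simulated data. It does not use or imitate Tetrad or Stata, and the data-generating process, coefficients, and variable names are invented for illustration. The instrument affects cost only through the driver, and an unobserved factor confounds the driver and cost.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=40000, seed=2):
    """Return (instrument, driver, cost) from a hypothetical process."""
    rng = random.Random(seed)
    instrument, driver, cost = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # instrument: affects cost only via the driver
        u = rng.gauss(0, 1)            # unobserved confounder
        x = z + u + rng.gauss(0, 1)    # cost driver
        y = 0.5 * x + 2.0 * u + rng.gauss(0, 1)  # simulated true effect is 0.5
        instrument.append(z)
        driver.append(x)
        cost.append(y)
    return instrument, driver, cost


def naive_slope(driver, cost):
    """Regression slope of cost on the driver, ignoring confounding."""
    return covariance(driver, cost) / covariance(driver, driver)


def iv_estimate(instrument, driver, cost):
    """Ratio estimator: cov(instrument, cost) / cov(instrument, driver)."""
    return covariance(instrument, cost) / covariance(instrument, driver)


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"naive regression slope: {naive_slope(x, y):.2f}")
    print(f"instrumental estimate:  {iv_estimate(z, x, y):.2f}")
```

With these assumed coefficients the naive slope lands near 1.17 while the instrumental estimate lands near the simulated effect of 0.5. The estimate is only as good as the assumption that the instrument has no other route to cost, which is the kind of assumption a DAG makes explicit.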

35 Agenda: Problem of Developing CERs; Why Causation instead of Correlation; Causal Modeling using DAGs; Examples; Call for Action and Collaboration

36 Call for Action and Collaboration
- Causal modeling with observational data is practical
- Causal modeling informs which variables to include in experimental research
- You should consider building causal methodology into your CER development
- Practical methods and tooling now exist to discover (Tetrad) and model (Tetrad, Stata) causal relationships in data
- We (SEI) seek to partner with you in developing CERs by applying causal methods to your data

37 Contact Information
Points of Contact: SEMA Cost Estimation Research Group, Robert Stoddard, Mike Konrad
U.S. Mail: Software Engineering Institute, Customer Relations, 4500 Fifth Avenue, Pittsburgh, PA , USA
Web
Customer Relations
Telephone:
SEI Phone:
SEI Fax: