Paul Bakker – Social Impact Squared

Presentation transcript:

Differences in Using Common Metrics to Measure Business vs. Social Performance
Paul Bakker – Social Impact Squared
SIAA – Canada Launch, Sept. 13, 2013

Theories of Change
Logic model, Profit Bottom Line Only (Traditional Business):
Inputs: Investment for cost of production
Activities: Manufacturing, marketing, sales, etc.
Outputs (goal): # of goods and services sold (income, profit)
Outcomes: Assumed positive
Logic model, Social Bottom Line:
Inputs: Income (donations/grants)
Activities: Distribution of needed goods, social services
Outputs: # of people served
Outcomes (goal): Positive change in people's lives
I recognize that this is a simplification, but I think the main points apply. Now let's explore the similarities and differences between a profit logic model and a social change logic model. The key differences relate to outputs and outcomes.
The goal of making profit stops at the provision of outputs. Whether the products or services have positive effects on society is only a concern if negative results will discourage future sales. Traditional business theory assumes that a business endeavour has positive social value if people are willing to pay for it. This assumption freed traditional business from having to pay much attention to, or measure, its social impact. However, we can think of many examples of highly purchased products that have negative or questionable social consequences (e.g., tobacco, certain TV programming, guns).
When you are trying to achieve social change, outputs are only important if they lead to outcomes; that is, positive changes in people's lives. So, the social change logic model requires us to pay attention to and measure outcomes rather than just outputs.
Another important difference is that the goal of the traditional business model is always the same (profit), which facilitates benchmarking against other businesses. For social change efforts, the goal can take any number of forms, making benchmarking more difficult.

Measuring Outputs vs. Outcomes
Outputs:
Directly observable
Count how many people bought your products or services.
Record how much they cost you to produce.
Record how much people paid for them.
Outcomes:
Not directly observable
Hard to observe changes in ideas, attitudes, and beliefs
Often lose contact with clients
Observed changes might not be caused by you
Why can't we just apply business's knowledge of measuring outputs to measuring outcomes? Because, unlike outputs, outcomes are not directly observable. It is harder to observe what is going on inside people's heads, although market researchers have these skills.
Social programs often lose contact with clients after their programs are over. Consider a program trying to prevent at-risk youth from re-engaging with the justice system. These youth often want their actions to remain unobserved, their families may move frequently, and privacy rules make it hard to obtain data from institutions. All of this makes it hard to observe what happened to youth after they left the program.
Even if you are able to observe what changed in clients' lives, it is hard to attribute those changes to your actions. Consider the youth crime prevention example: did the youth stop engaging in criminal activity because of your program, a change in schools, a change in parenting, other community programs, or something else?
The key to figuring out how much of the observed change is attributable to you is to estimate what would have occurred if you hadn't provided your goods or services. In technical terms, this is called the counterfactual. I don't think traditional business performance measurement has paid much attention to establishing the counterfactual.

How to Describe Alternative Reality (i.e. the counterfactual)
It's all about making comparisons:
To similar groups that didn't receive your goods/services (equivalent or nonequivalent comparison groups)
To the past (pre-post, longitudinal, time-series)
To groups that are different in known ways (cut-off score designs, regression point displacement)
To what statistics predict would have happened
To what experts think would have happened (including clients and other stakeholders)
So the additional challenge for the social impact analyst is to somehow come up with an accurate description of what would have occurred in an alternative dimension where clients did not receive your goods or services. There are many different ways to estimate the counterfactual. We don't have time to review all of them in detail. For now, it is important to highlight that the likelihood of obtaining accurate estimates varies by method. From my observations, the field of evaluation (at least in Canada) has come to a general consensus that the best methods depend on the context of the program and the focus of the evaluation.
Examples, if needed: Let's consider our example of a program that is trying to prevent at-risk youth from engaging with the justice system. Let's also say that many of the youth are already involved in criminal activity. What comparison should we use?
Randomization: some are morally opposed to it, there is a risk of contamination, youth might not want to show up if their friends can't come, and generalizability is poor.
Comparison group: it is difficult to find other crime-involved youth whom you aren't serving, who will be willing to let you collect data from them, and who are not meaningfully different from the group in your program.
Pre-post: not meaningful in this case. You are trying to prevent something from happening, not change an existing characteristic.
Statistical predictions: maybe, but it is very hard to create tools that can accurately predict people's behaviour.
Regression discontinuity (RD): maybe, but it would need hundreds of youth in the sample, and selection into the program would have to be based on a cut-off score.
What if the program costs $100,000 to run? Do you want to spend more than that to measure its impact?
Another example: Research has shown that the literacy and essential skills of those in training programs often improve in the short term, but not in comparison to similar people not in the training program (as their skills tended to improve too). What did improve was literacy habits, like reading books more often, which after many years led to better outcomes than for those who did not take the training. But do we want to require every literacy training organization to undertake an expensive multi-year longitudinal study of its social impact? After a few strong studies, I would suggest that weaker designs would give enough confidence that things are working.
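To make the contrast between a pre-post comparison and a comparison-group estimate concrete, here is a minimal Python sketch. The scores are invented for illustration and are not taken from any study; the function names are likewise hypothetical. It shows how a pre-post design, which implicitly assumes nothing would have changed without the program, can report a much larger effect than an estimate that uses a comparison group's change as the counterfactual.

```python
from statistics import mean

# Hypothetical literacy scores (illustrative numbers only, not real data).
participants_before = [48, 52, 50, 55, 45]
participants_after = [58, 60, 57, 66, 54]
comparison_before = [49, 51, 50, 54, 46]
comparison_after = [57, 59, 58, 63, 53]


def average_change(before, after):
    """Mean per-person change between the first and second measurement."""
    return mean(a - b for b, a in zip(before, after))


def pre_post_estimate(before, after):
    """Pre-post design: the counterfactual is 'scores would have stayed flat'."""
    return average_change(before, after)


def comparison_adjusted_estimate(p_before, p_after, c_before, c_after):
    """Comparison-group design: the comparison group's change stands in
    for what would have happened to participants without the program."""
    return average_change(p_before, p_after) - average_change(c_before, c_after)


if __name__ == "__main__":
    print("Pre-post estimate:",
          pre_post_estimate(participants_before, participants_after))
    print("Comparison-adjusted estimate:",
          comparison_adjusted_estimate(participants_before, participants_after,
                                       comparison_before, comparison_after))
```

With these made-up numbers, participants improve by 9 points on average, but the comparison group improves by 8, so the comparison-adjusted estimate is only 1 point. The adjusted figure is itself only as good as the assumption that the two groups would otherwise have changed alike.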

The State of Social Impact Measurement
There are different ways to attribute observed change to social programs, some of which are complex and expensive.
The most appropriate methods depend on the context of the program.
The most frequently used methods provide estimates that have a good chance of being off the mark.
So what methods are typically being used to estimate the counterfactual? The 2012 State of Evaluation report found that 65% of the nonprofit organizations surveyed said they used before-and-after measures, while only 6% said they used quasi-experimental designs or control groups, and only 4% said they used randomized controlled trials.
My description of the state of social impact measurement doesn't sound so good, does it? I want to assure you that I believe that social impact analysis can have tremendous value, but it does have its limitations, and we need to be aware of them.

Social Impact Common Measurement Systems
Some want to be able to assess the performance of social investments the way they assess the performance of financial investments.
To date, these systems have developed good activity and output metrics that are standardized and comparable across programs.
But they are challenged to incorporate outcome measurement.
Examples of social impact common measurement systems:
Impact Reporting and Investment Standards (IRIS)
Global Impact Investing Rating System (GIIRS)
Charity Navigator
Now, let's change focus a bit and review efforts to create common measurement systems for social impact. I am not going to go over all these systems in detail right now. Maybe we can explore the details during the discussion that we'll have soon. For now, here are the points I want to make.
The tools of financial performance measurement serve us well when developing systems to compare activities and outputs, but what really matters is outcomes, and each of these systems could do a better job of incorporating outcome measurement.
Part of the reason the systems currently don't have a strong focus on outcome measurement is that outcome data in any form is not consistently available throughout the sector. The different systems deal with this challenge in different ways, and some are taking action to encourage social impact reporting throughout the social sector.

Comparability of Outcome Metrics
Even if social outcome data were readily available, it wouldn't be fully comparable:
Attribution methods (counterfactual estimates) differ.
Measurement tools like surveys have different degrees of error, and biases can vary by cultural group and context.
The systems do not prescribe study designs, nor should they. Even if the same methods are used, results may still not be fully comparable. For example, social desirability biases in surveys have been found to be higher in more collectivistic cultures. So, if surveying is used to estimate social impact, programs serving different ethnic groups might look like they have different levels of social impact when in fact they don't.

Where Do We Go From Here
Given that outcome measures are estimates and never fully comparable, what should we do with systems designed to compare social performance?
Abandon them?
Nothing, because they are fine the way they are?
Improve them? How?
Let's discuss!
Nothing: am I making a mountain out of a molehill? My position is that we have to improve them.

Recognizing Uncertainty
Program 1: SROI of 1:16 or GIIRS score of 140. Average level of evidence of effectiveness of program components: 6
Program 2: SROI of 1:4 or GIIRS score of 95. Average level of evidence of effectiveness of program components: 2.75
Levels of Evidence:
1 Systematic review of randomized controlled trials
2 At least one randomized controlled trial
3 Multiple well-designed quasi-experimental studies
4 At least one well-designed quasi-experimental study
5 Descriptive studies, such as correlation studies
6 Stakeholder/expert opinion
7 No good evidence
There are different approaches that we can take, but if we want to develop performance dashboards like those used in the financial sector, we need to incorporate measurement error into the dashboard. There needs to be an uncertainty metric. Here is an example of what it could look like if we recognized uncertainty in common performance systems for social impact.
This will:
Help both risk-taking and cautious investors make investing decisions.
Discourage the use of weak attribution methods that are more likely to overestimate impact.
At the same time, allow risk-taking investors to invest in innovative and promising approaches that have yet to be tested with more rigorous impact measurement approaches.
What do you think of this approach?
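As a rough illustration of how such an uncertainty metric might be computed, the Python sketch below averages the evidence level of each program component using the seven-level scale from the slide. The per-component breakdowns are assumptions chosen only so that the averages match the slide's figures (6 and 2.75); the slide does not say which components each program has, and the function name is hypothetical.

```python
from statistics import mean

# Levels of evidence as listed on the slide (1 = strongest, 7 = weakest).
EVIDENCE_LEVELS = {
    1: "Systematic review of randomized controlled trials",
    2: "At least one randomized controlled trial",
    3: "Multiple well-designed quasi-experimental studies",
    4: "At least one well-designed quasi-experimental study",
    5: "Descriptive studies, such as correlation studies",
    6: "Stakeholder/expert opinion",
    7: "No good evidence",
}


def average_evidence_level(component_levels):
    """Average the evidence level across a program's components.

    Lower is better: 1 means the strongest evidence, 7 means none.
    """
    for level in component_levels:
        if level not in EVIDENCE_LEVELS:
            raise ValueError(f"Unknown evidence level: {level}")
    return mean(component_levels)


# Hypothetical component breakdowns, assumed for illustration only.
program_1 = {"headline": "SROI of 1:16", "component_levels": [6, 6, 6]}
program_2 = {"headline": "SROI of 1:4", "component_levels": [2, 3, 3, 3]}

if __name__ == "__main__":
    for name, program in (("Program 1", program_1), ("Program 2", program_2)):
        score = average_evidence_level(program["component_levels"])
        print(f"{name}: {program['headline']}, average evidence level {score}")
```

Shown side by side, the program with the more impressive headline ratio (Program 1) rests on expert opinion alone, while the more modest one (Program 2) is backed mostly by experimental and quasi-experimental evidence. A simple average is only one possible design; a dashboard could instead report the weakest component or the full distribution.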