Integrating Ethics into Graduate Training in the Environmental Sciences Series. Unit 3: Ethical Aspects of Data Analysis. AUTHORS: KLAUS KELLER and LOUISE MILTICH, Department of Geosciences, The Pennsylvania State University. With input from Nancy Tuana, Ken Davis, Jim Shortle, Michelle Stickler, Don Brown, and Erich Schienke.

Guiding Questions: What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance? These guiding questions provide a foundation for recognizing how ethical issues can arise in the interpretation of data. We will not cover every question; rather, the goal is to help you recognize the most important issues.

What are potential ethical questions arising in data analysis?
- What are the impacts of potential errors in the data analysis on the outcome of decisions? (Type I errors, Type II errors, overconfident projections, biased projections)
- How to deal with the illusion of objectivity?
- How to communicate potential overconfidence?
- How to formulate the null hypothesis?
- When is the analysis “done” and ready for submission?
- What to do if the data are insufficient for a formal and robust hypothesis test?
Ethical questions arise in many aspects of data analysis. The most important point to recognize is that if the analysis informs decisions, a choice will eventually have to be made, and errors are unavoidable: random noise in the data will cause errors of interpretation, and a statistical analysis is never done to perfection. Error sources are compounded by the limited time available to generate and analyze the data. Observer biases enter in many ways; how the analytical question is posed, and how the model structure is chosen, limits how objectively the question can be answered. If you recognize possible sources of overconfidence in the support for your hypothesis, how should they be made explicit when you write up and communicate your findings? Formulating the null hypothesis raises the question of whether the analysis will result in a proper test of statistical significance (whether smoking causes cancer is an example where a clean null hypothesis is difficult to formulate); this is ultimately a judgment call, and in most cases it is a good idea to obtain guidance from your peers. Knowing when the analysis is rigorous enough to submit requires careful consideration of the questions above.

What can go wrong while testing a hypothesis?
- Type I error: the effect is noise, but we assign a significant connection. The null hypothesis is rejected when it is actually true: a “false positive”. Scientists typically design statistical tests with a low probability of a Type I error (e.g., “p < 0.05”).
- Type II error: the effect is real, but we do not assign a significant connection. The null hypothesis is accepted when it is actually false: a “false negative”.
Optimal (or Bayesian) decision theory suggests dealing with these possible errors by designing the strategy based on the relative costs of Type I and Type II errors: evaluate outcomes and risk tradeoffs, and maximize the utility of the decision consistent with your posterior. Example: a hurricane is predicted to arrive in Miami with p = 0.2. Should you take action? (A minimal expected-loss sketch of this example follows below.)
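To make the hurricane example concrete, here is a minimal expected-loss sketch. The cost figures are assumptions invented for illustration (they are not part of the original unit); the only point is that the optimal choice depends on the relative costs of the two error types, not just on whether a significance threshold is crossed.

```python
# Minimal expected-loss sketch of the hurricane decision example.
# The probability comes from the slide; the costs are assumed, illustrative numbers.

p_hurricane = 0.2        # forecast probability that the hurricane arrives in Miami
cost_protect = 1.0       # assumed cost of protective action (paid whether or not it hits)
cost_damage = 10.0       # assumed loss if the hurricane hits and no action was taken

expected_loss_act = cost_protect                  # protection cost is paid either way
expected_loss_wait = p_hurricane * cost_damage    # damage is incurred only if it hits

decision = "take protective action" if expected_loss_act < expected_loss_wait else "wait"
print(f"Expected loss if we act:  {expected_loss_act:.2f}")
print(f"Expected loss if we wait: {expected_loss_wait:.2f}")
print(f"Lower-expected-loss decision: {decision}")
```

With these assumed numbers, acting is optimal (expected loss 1 versus 2) even though the forecast says the hurricane will most likely not arrive; halve the damage estimate and the conclusion flips. This is the sense in which the strategy is designed around the relative costs of Type I and Type II errors rather than around a fixed significance level.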

Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance? Next, we will look at the “rules of the game,” or what could be considered appropriate conduct for the field.

American Statistical Association Ethical Guidelines for Statistical Practice. “Statisticians should: present their findings and interpretations honestly and objectively; avoid untrue, deceptive, or undocumented statements; disclose any financial or other interests that may affect, or appear to affect, their professional statements.” The main professional association of a field usually provides ethical guidelines for conducting research and analysis in that field; in this case, the American Statistical Association provides ethical guidelines for statistical practice. http://www.tcnj.edu/~asaethic/asagui.html

American Statistical Association Ethical Guidelines for Statistical Practice (continued). “Statisticians should: delineate the boundaries of the inquiry as well as the boundaries of the statistical inferences which can be derived from it; emphasize that statistical analysis may be an essential component of an inquiry and should be acknowledged in the same manner as other essential components; be prepared to document data sources used in an inquiry, known inaccuracies in the data, and steps taken to correct or refine the data, statistical procedures applied to the data, and the assumptions required for their application; make the data available for analysis by other responsible parties with appropriate safeguards for privacy concerns; recognize that the selection of a statistical procedure may to some extent be a matter of judgment and that other statisticians may select alternative procedures; direct any criticism of a statistical inquiry to the inquiry itself and not to the individuals conducting it.” http://www.tcnj.edu/~asaethic/asagui.html

Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance?

What is overconfidence in data analysis? [Figure: error in the recommended values for the electron mass plotted against year of publication; Henrion and Fischhoff (1986).] Estimates with artificially tight confidence bounds, i.e., bounds tighter than what the data can actually support, are overconfident. Overconfidence in subjective assessments and model predictions is common; checking your conclusions with others can help. (A small simulation illustrating overconfident confidence intervals follows below.)
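A minimal simulation sketch of what “artificially tight confidence bounds” means, using synthetic normal data and an assumed factor by which the reported uncertainty is understated (both are illustration-only assumptions):

```python
# Sketch: actual coverage of nominal 95% intervals when the reported standard error
# is understated by an assumed factor of 2. Synthetic data, illustrative numbers only.
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd, n, trials = 0.0, 1.0, 20, 10_000
z95 = 1.96
understatement = 2.0   # assumed: the analyst reports half the true standard error

hits_honest = hits_tight = 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_sd, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    err = abs(sample.mean() - true_mean)
    hits_honest += err <= z95 * se                   # interval with the honest SE
    hits_tight += err <= z95 * se / understatement   # interval with the understated SE

print(f"Honest 95% intervals cover the truth:         {hits_honest / trials:.1%}")
print(f"Overconfident (tightened) intervals cover it: {hits_tight / trials:.1%}")
```

The honest intervals cover the true value close to the nominal 95% of the time; the artificially tightened ones cover it only roughly two thirds of the time. The Henrion and Fischhoff (1986) record shows a similar signature in real data: later recommended values of physical constants frequently fell outside the error bars stated for earlier ones.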

What are key sources of overconfidence?
- Neglecting autocorrelation effects.
- Undersampling the unresolved variability (i.e., out-of-range projections, such as using 40 years of data to project 400 years into the future).
- Assuming unimodal probability density functions.
- Neglecting model representation errors.
- Considering only a subset of the parametric uncertainty.
- Neglecting structural model uncertainty.
The sketch below illustrates the first source, neglected autocorrelation. Let us then look at an example: are current climate projections overconfident?
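A minimal sketch of the first source on the list, assuming an AR(1) process with an illustrative autocorrelation coefficient of 0.8: treating autocorrelated observations as independent understates the standard error of the mean and therefore produces overconfident confidence bounds.

```python
# Sketch: neglecting autocorrelation understates the standard error of the mean.
# AR(1) with an assumed coefficient phi = 0.8; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
phi, n, trials = 0.8, 200, 5_000
stationary_sd = np.sqrt(1.0 / (1.0 - phi**2))    # sd of the AR(1) process itself

means = []
for _ in range(trials):
    x = np.empty(n)
    x[0] = rng.normal(scale=stationary_sd)       # start at the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()     # unit-variance innovations
    means.append(x.mean())

naive_se = stationary_sd / np.sqrt(n)            # standard error if the data were independent
actual_se = np.std(means)                        # sampling spread of the mean, from simulation

print(f"Naive SE (independence assumed): {naive_se:.3f}")
print(f"Actual SE (simulated):           {actual_se:.3f}")
print(f"Bounds too tight by a factor of ~{actual_se / naive_se:.1f}")
```

With these assumed values the naive standard error is roughly a factor of three too small, so nominal 95% bounds built from it would miss the truth far more often than 5% of the time.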

The fact that the range of CO2 emission projections has widened over time is consistent with the hypothesis that previous projections were overconfident (Morita et al., 2001). “The 40 scenarios cover the full range of GHG [..] emissions consistent with the underlying range of driving forces from scenario literature” [Nakicenovic et al., 2000, p. 46].

Further, past carbon-cycle projections that neglect the uncertainty in historic land-use CO2 emissions are likely overconfident (Miltich, Ricciuto, and Keller, 2007). [Figure: comparison of analyses adopting a single estimate of land-use CO2 emissions with analyses accounting for uncertainty about which estimate is most likely, given observational constraints.]

When might overconfidence result in biased decision-analyses? One relevant situation is the case of climate threshold responses. Designing risk management strategies in the face of threshold responses requires sound probabilistic information, and overconfident climate projections may underestimate the risks of low-probability, high-impact events.
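A small sketch of why this matters, using assumed numbers only: narrowing the spread of a projection barely changes the central estimate, but it can drastically understate the probability of exceeding a damaging threshold.

```python
# Sketch: probability of exceeding an assumed threshold under an honest versus an
# overconfident (artificially narrow) projection. All numbers are illustrative.
from math import erf, sqrt

def prob_exceed(threshold, mean, sd):
    """P(X > threshold) for a normally distributed projection X ~ N(mean, sd**2)."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

mean_warming = 3.0       # assumed central projection (e.g., degrees C)
honest_sd = 1.5          # assumed honest spread
overconfident_sd = 0.75  # assumed spread, understated by a factor of 2
threshold = 5.0          # assumed threshold for a damaging response

print(f"P(exceed threshold), honest spread:        {prob_exceed(threshold, mean_warming, honest_sd):.3f}")
print(f"P(exceed threshold), overconfident spread: {prob_exceed(threshold, mean_warming, overconfident_sd):.3f}")
```

Under these assumed numbers the overconfident projection puts the chance of crossing the threshold at well under one percent, while the honest one puts it near ten percent; a risk-management strategy tuned to the former would be badly miscalibrated.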

Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance?

Where to go for guidance? The ASA Ethical Guidelines for Statistical Practice, published by the American Statistical Association: http://www.tcnj.edu/~asaethic/asagui.html. The Online Ethics Center for Engineering and Science, which offers very helpful guidelines as well: http://onlineethics.org/index.html. Your mentors and peers: it takes wisdom and experience to do statistical analyses well, and a diversity of opinions and feedback helps.

Discussion Questions / Checklist. Use these questions as a checklist to help guide the construction and testing of a hypothesis.
- Should one submit a manuscript that may well be wrong and that could detrimentally affect the policy process? How do you define “detrimental”?
- When and how is it appropriate to exclude “data outliers”?
- Are the potential sources of biases clearly flagged?
- Is the sensitivity to the choice of analyzed data sufficiently explained?
- Does the discussion adopt a specific value judgment about what is a “significant” result?
- Are there ethical issues in performing a “classic” “p < 0.05” hypothesis test?
The sketch after this list shows how a seemingly innocuous choice, excluding a few “outliers,” interacts with the p < 0.05 convention.
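A minimal sketch of why the outlier and “p < 0.05” items on this checklist interact, using synthetic data and an assumed after-the-fact exclusion rule. Whether the p-value crosses 0.05 for any particular draw is beside the point; the point is that an analyst-chosen exclusion rule changes it.

```python
# Sketch: an after-the-fact outlier-exclusion rule changes the p-value of a classic
# two-sample t-test. Synthetic data; the 1.5-SD rule is an assumed illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, 30)   # synthetic control sample
b = rng.normal(0.4, 1.0, 30)   # synthetic treatment sample with a small true effect

def drop_outliers(x, k=1.5):
    """Assumed rule: drop points more than k sample SDs from the sample mean."""
    return x[np.abs(x - x.mean()) < k * x.std(ddof=1)]

t_all, p_all = stats.ttest_ind(a, b)
t_trim, p_trim = stats.ttest_ind(drop_outliers(a), drop_outliers(b))

print(f"All data:         p = {p_all:.3f}")
print(f"Outliers dropped: p = {p_trim:.3f}")
```

The ethical issue the checklist points to is not that trimming is always wrong, but that the rule is a judgment call that should be stated in advance where possible, flagged explicitly, and accompanied by a sensitivity analysis showing how the result changes with and without it.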

Reading Materials
Miltich, L. I., Ricciuto, D. M., and Keller, K.: Which estimate of historic land use CO2 emissions makes most sense given atmospheric and oceanic CO2 observations?, in preparation for Environmental Research Letters, http://www.geosc.psu.edu/~kkeller/wp/ (2007).
Keller, K., Miltich, L. I., Robinson, A., and Tol, R. S. J.: 2007, 'How overconfident are current projections of carbon dioxide emissions?', Working Paper Series, Research Unit Sustainability and Global Change, Hamburg University, FNU-124, http://ideas.repec.org/s/sgc/wpaper.html.
Berger, J. O., and D. A. Berry. 1988. Statistical Analysis and the Illusion of Objectivity. American Scientist 76 (2): 159-165.
Cohen, J. 1994. The Earth Is Round (p < .05). American Psychologist 49 (12): 997-1003.
Lipton, P. 2005. Testing hypotheses: Prediction and prejudice. Science 307 (5707): 219-221.