Integrating Ethics into Graduate Training in the Environmental Sciences Series
Unit 3: Ethical Aspects of Data Analysis
Authors: Klaus Keller and Louise Miltich, Department of Geosciences, The Pennsylvania State University
With input from Nancy Tuana, Ken Davis, Jim Shortle, Michelle Stickler, Don Brown, and Erich Schienke
Guiding Questions
What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance?
These guiding questions provide a foundation for recognizing how ethical issues can arise in the interpretation of data. We will not cover every question exhaustively; rather, the goal is to help you recognize the most salient issues.
What are potential ethical questions arising in data analysis?
What are the impacts of potential errors in the data analysis on the outcome of decisions? Type I error. Type II error. Overconfident projections. Biased projections. How to deal with the illusion of objectivity? How to communicate potential overconfidence? How to formulate the null hypothesis? When is the analysis “done” and ready for submission? What to do if the data are insufficient for a formal and robust hypothesis test?
Ethical questions arise in many aspects of data analysis. The most important thing to recognize is that when data analysis informs decisions, you will eventually have to choose a course of action, and you will make errors: random noise in the data guarantees some misreadings. You must try your best to avoid avoidable mistakes, while remembering that a statistical analysis is never done to perfection. Because you have only limited time to generate and analyze data, you will encounter a variety of error sources: Type I errors, Type II errors, overconfident projections, and biased projections.
How to deal with the illusion of objectivity? Observer biases arise in a variety of ways; how the analytical question is posed limits how objectively it can be answered, and once you adopt a model structure, the question must be formulated in that structure's specific terms.
How to communicate potential overconfidence? If you recognize possible sources of overconfidence in support of your hypothesis, such as a known caveat, how should they be made explicit when writing up and communicating your findings?
How to formulate the null hypothesis? Will the question result in a proper test of statistical significance? Whether smoking causes cancer is an example where it is difficult to frame a testable null hypothesis. This is ultimately a judgment call, and in most cases it is a good idea to seek guidance from your peers.
When is the analysis “done” and ready for submission? Knowing when the analysis has been rigorous enough requires thorough consideration of the questions above.
What to do if the data are insufficient for a formal and robust hypothesis test? All of these questions need to be considered when analyzing data.
What can go wrong while testing a hypothesis?
A variety of errors can occur when testing a hypothesis.
Type I error: the effect is noise, but we assign a significant connection; the null hypothesis is rejected when it is actually true (a “false positive”). Scientists typically design statistical tests with a low probability of a Type I error (e.g., “p < 0.05”).
Type II error: the effect is real, but we do not assign a significant connection; the null hypothesis is accepted when it is actually false (a “false negative”).
Optimal (or Bayesian) decision theory evaluates outcomes and risk tradeoffs. It suggests dealing with these possible errors by designing the strategy based on the relative costs of Type I and Type II errors, maximizing the utility of the decision consistent with your posterior. Example: a hurricane is predicted to arrive in Miami with p = 0.2. Should you take action?
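The hurricane example can be sketched numerically. Below is a minimal illustration of weighing the relative costs of the two error types; the slides give only the probability p = 0.2, so the cost numbers here are hypothetical assumptions, not values from the lecture:

```python
# Bayesian decision sketch for the hurricane example (p = 0.2).
# The cost figures are hypothetical illustrations, not from the slides.

def expected_costs(p_event, cost_of_acting, loss_if_unprepared):
    """Expected cost of acting now vs. waiting.

    Acting costs cost_of_acting regardless of the outcome (the loss if the
    hurricane misses, akin to a Type I error); waiting risks the full
    loss_if_unprepared with probability p_event (akin to a Type II error).
    """
    act = cost_of_acting
    wait = p_event * loss_if_unprepared
    return act, wait

def best_decision(p_event, cost_of_acting, loss_if_unprepared):
    """Choose the option with the lower expected cost."""
    act, wait = expected_costs(p_event, cost_of_acting, loss_if_unprepared)
    return "act" if act < wait else "wait"

# Suppose evacuating costs 1 unit, and riding out a strike unprepared costs 10.
print(best_decision(0.2, 1.0, 10.0))  # acting (cost 1.0) beats waiting (0.2 * 10 = 2.0)
```

The point of the sketch is that the decision threshold depends on the cost ratio, not on a fixed significance level: with a cheaper potential loss (say 3 units instead of 10), the same p = 0.2 forecast would favor waiting.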
Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance? Next, we will look at the “rules of the game,” or what could be considered appropriate conduct for the field.
American Statistical Association Ethical Guidelines for Statistical Practice
“Statisticians should: present their findings and interpretations honestly and objectively; avoid untrue, deceptive, or undocumented statements; disclose any financial or other interests that may affect, or appear to affect, their professional statements.”
The main association or organization of a field usually provides ethical guidelines for conducting research and analysis in that field. In this case, the American Statistical Association provides ethical guidelines for statistical practice.
American Statistical Association Ethical Guidelines for Statistical Practice
“Statisticians should: delineate the boundaries of the inquiry as well as the boundaries of the statistical inferences which can be derived from it; emphasize that statistical analysis may be an essential component of an inquiry and should be acknowledged in the same manner as other essential components; be prepared to document data sources used in an inquiry, known inaccuracies in the data, and steps taken to correct or refine the data, statistical procedures applied to the data, and the assumptions required for their application; make the data available for analysis by other responsible parties with appropriate safeguards for privacy concerns; recognize that the selection of a statistical procedure may to some extent be a matter of judgment and that other statisticians may select alternative procedures; direct any criticism of a statistical inquiry to the inquiry itself and not to the individuals conducting it.”
Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance?
What is overconfidence?
[Figure: error in the recommended values for the electron mass vs. year of publication; Henrion and Fischhoff (1986).]
What is overconfidence in data analysis? Estimates with artificially tight confidence bounds are overconfident: the reported confidence bounds are tighter than the data can actually support. Overconfidence in subjective assessments and model predictions is common; checking your conclusions with others can be helpful.
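“Artificially tight confidence bounds” can be made concrete with a small simulation. The sketch below uses a hypothetical setup (not the electron-mass data): it builds nominal 95% intervals for a mean, once with the correct noise level and once with an understated one, and checks how often each interval actually contains the truth:

```python
import math
import random

def coverage(n_trials, n, claimed_sigma, true_sigma=1.0, z=1.96):
    """Fraction of simulated nominal-95% intervals, built assuming
    claimed_sigma, that actually contain the true mean (0)."""
    rng = random.Random(42)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_trials):
        # Sample mean of n draws from the true noise distribution.
        xbar = sum(rng.gauss(0.0, true_sigma) for _ in range(n)) / n
        # Interval half-width as the analyst computes it (claimed sigma).
        half_width = z * claimed_sigma / math.sqrt(n)
        hits += abs(xbar) <= half_width
    return hits / n_trials

honest = coverage(2000, 30, claimed_sigma=1.0)         # close to the nominal 0.95
overconfident = coverage(2000, 30, claimed_sigma=0.5)  # far below nominal
print(honest, overconfident)
```

The overconfident analyst's intervals miss the truth far more often than the stated 5% of the time, which is exactly the failure mode the electron-mass history illustrates.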
What are key sources of overconfidence?
Neglecting autocorrelation effects. Undersampling the unresolved variability (i.e., out-of-range projections, such as using 40 years of data to project 400 years into the future). Assuming unimodal probability density functions. Neglecting model representation errors. Considering only a subset of the parametric uncertainty. Neglecting structural model uncertainty.
With these sources in mind, let us look at an example: are current climate projections overconfident?
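The first source, neglecting autocorrelation, has a simple quantitative face: positively correlated samples carry less independent information than their count suggests. A minimal sketch using the standard AR(1) effective-sample-size approximation (a textbook formula, not derived in the slides):

```python
import math

def effective_sample_size(n, rho):
    """AR(1) approximation: n samples with lag-1 autocorrelation rho
    carry roughly n * (1 - rho) / (1 + rho) independent observations'
    worth of information."""
    return n * (1 - rho) / (1 + rho)

def stderr_of_mean(sigma, n, rho=0.0):
    """Standard error of the sample mean, optionally corrected for
    lag-1 autocorrelation via the effective sample size."""
    return sigma / math.sqrt(effective_sample_size(n, rho))

naive = stderr_of_mean(1.0, 100)            # 0.10: treats samples as independent
honest = stderr_of_mean(1.0, 100, rho=0.6)  # 0.20: twice as wide
```

With rho = 0.6, the 100 correlated samples are worth only 25 independent ones, so ignoring the autocorrelation yields confidence bounds half as wide as the data justify.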
The fact that the range of CO2 emission projections has widened over time is consistent with the hypothesis that previous projections were overconfident. Morita et al. (2001): “The 40 scenarios cover the full range of GHG [..] emissions consistent with the underlying range of driving forces from scenario literature” [Nakicenovic et al., 2000, p. 46].
Miltich, Ricciuto, and Keller (2007)
Past carbon cycle projections that neglect the uncertainty in historic land-use CO2 emissions are likely overconfident. [Figure contrasts analyses adopting a single estimate of land-use CO2 emissions with analyses accounting for uncertainty about which estimate of land-use CO2 emissions is most likely, given observational constraints.]
When might overconfidence result in biased decision-analyses?
Designing risk management strategies in the face of threshold responses requires sound probabilistic information. Overconfident climate projections may underestimate the risks of low-probability, high-impact events. One relevant situation where overconfidence can bias a decision analysis is the case of climate threshold responses.
Guiding Questions What are potential ethical questions arising in data analysis? What are the “rules of the game”? Do research publications follow these rules? Where to go for guidance?
Where to go for guidance?
ASA Ethical Guidelines for Statistical Practice, published by the American Statistical Association. The Online Ethics Center for Engineering and Science. Your mentors and peers.
It takes wisdom and experience to do statistical analyses well, and a diversity of opinions and feedback helps: for further guidance, consult the ASA Ethical Guidelines, the Online Ethics Center for Engineering and Science, and your mentors and peers.
Discussion Questions / Checklist
Should one submit a manuscript that may well be wrong and that could detrimentally affect the policy process? How do you define “detrimental”? When and how is it appropriate to exclude “data outliers”? Are the potential sources of bias clearly flagged? Is the sensitivity to the choice of analyzed data sufficiently explained? Does the discussion adopt a specific value judgment about what is a “significant” result? Are there ethical issues in performing a “classic” “p < 0.05” hypothesis test?
Use these questions as a checklist to help guide the construction, testing, and reporting of a hypothesis.
Reading Materials
Miltich, L. I., D. M. Ricciuto, and K. Keller: Which estimate of historic land use CO2 emissions makes most sense given atmospheric and oceanic CO2 observations?, in preparation for Environmental Research Letters (2007).
Keller, K., Miltich, L. I., Robinson, A., and Tol, R. S. J. (2007): 'How overconfident are current projections of carbon dioxide emissions?', Working Paper Series, Research Unit Sustainability and Global Change, Hamburg University, FNU-124.
Berger, J. O., and D. A. Berry (1988): Statistical Analysis and the Illusion of Objectivity. American Scientist 76 (2).
Cohen, J. (1994): The Earth Is Round (p < .05). American Psychologist 49 (12).
Lipton, P. (2005): Testing hypotheses: Prediction and prejudice. Science 307 (5707).