How to communicate science clearly


How to communicate science clearly Gil Pontius 21 March 2018

Main Points
You should put the subject first, then the verb when you write a sentence. This will fix 85% of your writing problems.
Use the active voice.
If you are a real scientist, then you make your research question so that results are important regardless of how the results turn out. If you are hoping your results turn out in particular ways while you are doing the analysis, then you are not doing science.
Follow the recommendations in Pontius’ documents available as items 80 and 81 at www.clarku.edu/~rpontius

Detailed Recommendations for your notes
You should put the subject first, then the verb when you write a sentence. This will fix 85% of your writing problems.
Use the active voice.
Do not use pronouns.
Use “significant” in science if and only if you mean the p-value is less than the alpha-level in inferential statistics.
Use “random” when you used a random number generator to perform the selection. Do not use “random” to mean “haphazard” or to mean that you do not understand the process.
Do not use the words “good”, “bad” or similar value-laden words when you describe results.
If you are a real scientist, then you make your research question so that results are important regardless of how the results turn out. If you are hoping your results turn out in particular ways while you are doing the analysis, then you are not doing science.
If you want to compare predicted versus observed data, then use a scatter plot and show the 1 to 1 line where the axes have identical ranges. Report Mean Deviation and Mean Absolute Deviation to explain how the points lie with respect to the 1 to 1 line.
Use the term “fitted line” when using regression to fit a line to the data. Do not use the word “predicted”. The line comes after the data, not before the data.
Consult with a statistician BEFORE you collect the data.
Follow the recommendations in Pontius’ documents available as items 80 and 81 at www.clarku.edu/~rpontius
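The recommendation to report Mean Deviation and Mean Absolute Deviation can be sketched in a few lines of Python. This is a minimal illustration, not the author's own tool: the function names and the sample numbers are hypothetical, and the sign convention (predicted minus observed) is an assumption.

```python
from statistics import mean


def deviation_summary(predicted, observed):
    """Return (Mean Deviation, Mean Absolute Deviation).

    Assumption: each deviation is predicted minus observed, so a positive
    Mean Deviation means the points lie above the 1 to 1 line on average
    when predicted is on the vertical axis.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = [p - o for p, o in zip(predicted, observed)]
    mean_deviation = mean(deviations)
    mean_absolute_deviation = mean([abs(d) for d in deviations])
    return mean_deviation, mean_absolute_deviation


def shared_axis_range(predicted, observed):
    """Return one (low, high) range to use for BOTH axes of the scatter plot."""
    low = min(min(predicted), min(observed))
    high = max(max(predicted), max(observed))
    return low, high


# Hypothetical data, invented for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 8.0, 10.0]

md, mad = deviation_summary(predicted, observed)
axis_range = shared_axis_range(predicted, observed)
print("Mean Deviation:", md)
print("Mean Absolute Deviation:", mad)
print("Range for both axes:", axis_range)
```

Mean Deviation shows whether the points sit above or below the 1 to 1 line on average, while Mean Absolute Deviation shows how far the points sit from the line regardless of direction.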

Pixel Counts and Patterns

Category            Run A pixels   Run B pixels
Correct Rejection   694771         696677
Miss                34427          32521
False Alarm         (missing)      (missing)
Hit                 1646           3552
Developed in 2001   494870         494870

A clear pattern is present in these data: in both runs, the pixels counted as a ‘hit’ or a ‘miss’ sum to the same total, 36073. This is because they represent Reference Change, which was originally based on the ‘cheat’ we used when first running the GeoMod. Because the software had a set number of pixels it knew it had to change, it acted within those bounds.
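The pattern in the slide can be checked with simple arithmetic. The sketch below uses only the pixel counts given in the slide. The False Alarm counts do not appear in the transcript, so the sketch omits them, and the dictionary layout and function name are illustrative choices.

```python
# Pixel counts as given in the slide; False Alarm is not available.
run_a = {"Correct Rejection": 694771, "Miss": 34427, "Hit": 1646}
run_b = {"Correct Rejection": 696677, "Miss": 32521, "Hit": 3552}


def reference_change(run):
    """Reference change is every pixel that changed in the reference data,
    whether the simulation caught it (Hit) or not (Miss)."""
    return run["Hit"] + run["Miss"]


print("Run A reference change:", reference_change(run_a))
print("Run B reference change:", reference_change(run_b))
```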

You should write so that it is clear who did what. Passive voice fails to reveal who did what.

Land Change of PIE from 2001 to 2011
The change during 2006 and 2011 is not as significant as 2001, and all of the changes happened near developed areas.

The First Run

How to read a TOC plot:
The curve (red line) shows the validity of the suitability map. If the curve aligned with the Maximum line, then every higher value in the suitability map would be a Hit on the referenced development. If the curve aligned with the Minimum line, then every higher value in the suitability map would be a False Alarm on the referenced development. The Uniform line is the hypothetical random simulation process, which means the suitability map plays no role in the simulation. The nine numbers on the curve are the thresholds, which correspond to the suitability values when 10%, 20%, 30%, and so on to 90% of the total pixels are assigned as developed or undeveloped.

The first run TOC Curve:
Overall, the suitability map performed well, since the curve is on the left side (Maximum side) of the Uniform line. After the threshold value of 29.6, where the corresponding Hits are about 29 thousand square kilometers, the Correctness of Hit began to decrease quickly and the False Alarms began to increase quickly.
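The reading of the TOC plot described above can be made concrete with a short sketch. It assumes the usual TOC layout, in which the horizontal axis is Hits plus False Alarms and the vertical axis is Hits. It counts pixels rather than square kilometers, and all function names and sample values are hypothetical.

```python
def toc_points(suitability, reference, thresholds):
    """For each threshold, return (hits + false alarms, hits).

    Assumption: a pixel is diagnosed as developed when its suitability is
    greater than or equal to the threshold; reference is 1 for developed
    and 0 for undeveloped.
    """
    points = []
    for t in thresholds:
        hits = sum(1 for s, r in zip(suitability, reference) if s >= t and r == 1)
        false_alarms = sum(1 for s, r in zip(suitability, reference) if s >= t and r == 0)
        points.append((hits + false_alarms, hits))
    return points


def toc_bounds(x, n_total, n_presence):
    """Return the (Minimum, Uniform, Maximum) values of Hits at horizontal position x."""
    maximum = min(x, n_presence)                    # every diagnosed pixel is a Hit
    minimum = max(0, x - (n_total - n_presence))    # every diagnosed pixel is a False Alarm first
    uniform = x * n_presence / n_total              # suitability map plays no role
    return minimum, uniform, maximum


# Hypothetical ten-pixel example, invented for illustration only.
suitability = [90, 80, 70, 60, 50, 40, 30, 20, 10, 5]
reference = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

points = toc_points(suitability, reference, thresholds=[80, 60, 20])
for x, hits in points:
    print(x, hits, toc_bounds(x, n_total=10, n_presence=3))
```

A curve whose Hits stay near the Maximum value at each horizontal position indicates that the higher suitability values coincide with the reference development. A curve near the Uniform value indicates that the suitability map adds no information.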

The Second Run

The second run TOC Curve:
Overall, the suitability map performed well and better than the first run, since the curve is closer to the Maximum line. After the threshold value of 33.1, where the corresponding Hits are about 30 thousand square kilometers, the Correctness of Hit began to decrease quickly and the False Alarms began to increase quickly. This means the higher values of the suitability map agree more with the referenced developed area. This is because the Protected map, an important factor, was taken into consideration in the simulation.

Absolute Deviations are 4

Absolute Deviations are 1 or 7

Absolute Deviations are 1 or 7

Absolute Deviations are 1 or 7

Absolute Deviations are 4

Four measurements for nine plots

Datasaurus https://www.autodeskresearch.com/publications/samestats

To compare two variables that show the same set of categories …
Use PontiusMatrix41 available at www.clarku.edu/~rpontius
Do not use Kappa.

Pontius Jr, Robert Gilmore and Marco Millones. 2011. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing 32(15): 4407-4429.
Pontius Jr, Robert Gilmore and Ali Santacruz. 2014. Quantity, Exchange and Shift Components of Differences in a Square Contingency Table. International Journal of Remote Sensing 35(21): 7543-7554.

See videos at http://www2.clarku.edu/~rpontius/videos.html
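The quantity disagreement and allocation disagreement named in the 2011 reference can be sketched from a square contingency table. This is a minimal sketch, not the PontiusMatrix spreadsheet itself. It assumes the table already represents the whole study area (no correction for a stratified sample), and the function name and sample counts are hypothetical.

```python
def quantity_allocation_disagreement(matrix):
    """Return (quantity disagreement, allocation disagreement) as proportions.

    matrix[i][j] is the count (or proportion) of observations that the first
    variable places in category i and the second variable places in category j.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("the contingency table must be square")
    total = sum(sum(row) for row in matrix)
    p = [[cell / total for cell in row] for row in matrix]

    row_totals = [sum(p[g]) for g in range(n)]
    col_totals = [sum(p[i][g] for i in range(n)) for g in range(n)]

    # Quantity: mismatch between the two variables in the size of each category.
    quantity = sum(abs(row_totals[g] - col_totals[g]) for g in range(n)) / 2
    # Allocation: mismatch that could be resolved by swapping locations.
    allocation = sum(
        2 * min(row_totals[g] - p[g][g], col_totals[g] - p[g][g]) for g in range(n)
    ) / 2
    return quantity, allocation


# Hypothetical two-category table of counts, invented for illustration only.
table = [[40, 10],
         [20, 30]]
quantity, allocation = quantity_allocation_disagreement(table)
print("Quantity disagreement:", quantity)
print("Allocation disagreement:", allocation)
print("Total disagreement:", quantity + allocation)
```

The two components sum to the total disagreement, which is one minus the proportion on the diagonal of the table.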