Traditional Verification Scores Fake forecasts  5 geometric  7 perturbed subjective evaluation  expert scores from last year’s workshop  9 cases x.

Slides:



Advertisements
Similar presentations
From the homework: Distribution of DNA fragments generated by Micrococcal nuclease digestion mean(nucs) = bp median(nucs) = 110 bp sd(nucs+ = 17.3.
Advertisements

Validation of Satellite Precipitation Estimates for Weather and Hydrological Applications Beth Ebert BMRC, Melbourne, Australia 3 rd IPWG Workshop / 3.
Nonparametric Statistics Timothy C. Bates
Statistical Decision Making
Venugopal, Basu, and Foufoula-Georgiou, 2005: New metric for comparing precipitation patterns… Verification methods reading group April 4, 2008 D. Ahijevych.
Winslow Homer: “On The Stile” INFERENTIAL PROBLEM SOLVING Hypothesis Testing and t-tests Chapter 6:
STAT 135 LAB 14 TA: Dongmei Li. Hypothesis Testing Are the results of experimental data due to just random chance? Significance tests try to discover.
Method for Object-based Diagnostic Evaluation (MODE) Fake forecasts.
Daria Kluver Independent Study From Statistical Methods in the Atmospheric Sciences By Daniel Wilks.
Assessing and Comparing Classification Algorithms Introduction Resampling and Cross Validation Measuring Error Interval Estimation and Hypothesis Testing.
Chapter 14 Conducting & Reading Research Baumgartner et al Chapter 14 Inferential Data Analysis.
Inferential Stats for Two-Group Designs. Inferential Statistics Used to infer conclusions about the population based on data collected from sample Do.
Lecture 9: One Way ANOVA Between Subjects
Wilcoxon Tests What is the Purpose of Wilcoxon Tests? What are the Assumptions? How does the Wilcoxon Rank-Sum Test Work? How does the Wilcoxon Matched-
DEPENDENT SAMPLES t-TEST What is the Purpose?What Are the Assumptions?How Does it Work?
PSYC512: Research Methods PSYC512: Research Methods Lecture 9 Brian P. Dyre University of Idaho.
Significance Tests P-values and Q-values. Outline Statistical significance in multiple testing Statistical significance in multiple testing Empirical.
Lecture 9 Today: –Log transformation: interpretation for population inference (3.5) –Rank sum test (4.2) –Wilcoxon signed-rank test (4.4.2) Thursday: –Welch’s.
Independent Sample T-test Often used with experimental designs N subjects are randomly assigned to two groups (Control * Treatment). After treatment, the.
7/2/2015Basics of Significance Testing1 Chapter 15 Tests of Significance: The Basics.
15-1 Introduction Most of the hypothesis-testing and confidence interval procedures discussed in previous chapters are based on the assumption that.
Today Concepts underlying inferential statistics
The t Tests Independent Samples.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
General Linear Model & Classical Inference
 Mean: true average  Median: middle number once ranked  Mode: most repetitive  Range : difference between largest and smallest.
Fall 2013 Lecture 5: Chapter 5 Statistical Analysis of Data …yes the “S” word.
Statistical Significance R.Raveendran. Heart rate (bpm) Mean ± SEM n In men ± In women ± The difference between means.
How do Lawyers Set fees?. Learning Objectives 1.Model i.e. “Story” or question 2.Multiple regression review 3.Omitted variables (our first failure of.
Statistics & Biology Shelly’s Super Happy Fun Times February 7, 2012 Will Herrick.
Comparing Two Means Prof. Andy Field.
User Study Evaluation Human-Computer Interaction.
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation CIKM´07, November 2007.
Biostat 200 Lecture 7 1. Hypothesis tests so far T-test of one mean: Null hypothesis µ=µ 0 Test of one proportion: Null hypothesis p=p 0 Paired t-test:
Association between 2 variables
Basic Biostatistics Prof Paul Rheeder Division of Clinical Epidemiology.
Measuring forecast skill: is it real skill or is it the varying climatology? Tom Hamill NOAA Earth System Research Lab, Boulder, Colorado
Lecture 5: Chapter 5: Part I: pg Statistical Analysis of Data …yes the “S” word.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
CORRELATIONS: TESTING RELATIONSHIPS BETWEEN TWO METRIC VARIABLES Lecture 18:
A Test Paradigm for Detecting Changes in Transactional Data Streams Willie Ng and Manoranjan Dash DASFAA 2008.
STATISTICAL ANALYSIS FOR THE MATHEMATICALLY-CHALLENGED Associate Professor Phua Kai Lit School of Medicine & Health Sciences Monash University (Sunway.
Experimental Design and Statistics. Scientific Method
Power Analysis for Traditional and Modern Hypothesis Tests
Nonparamentric Stats –Distribution free tests –e.g., rank tests Sign test –H 0 : Median = 100 H a : Median > 100 if median = 100, then half above, half.
Bias Adjusted Precipitation Scores Fedor Mesinger NOAA/Environmental Modeling Center and Earth System Science Interdisciplinary Center (ESSIC), Univ. Maryland,
Testing Your Hypothesis In your previous assignments you were supposed to develop two hypotheses that examine a relationship between two variables. For.
What’s in the Bag? Lecture 4 Section Wed, Jan 25, 2006.
Digression - Hypotheses Many research designs involve statistical tests – involve accepting or rejecting a hypothesis Null (statistical) hypotheses assume.
T Test for Two Related Samples (Repeated Measures)
Inferential Statistics Significance Testing Chapter 4.
Introducing Communication Research 2e © 2014 SAGE Publications Chapter Seven Generalizing From Research Results: Inferential Statistics.
Testing Hypotheses II Lesson 10. A Directional Hypothesis (1-tailed) n Does reading to young children increase IQ scores?  = 100,  = 15, n = 25 l sample.
Operational verification system Rodica Dumitrache National Metorogical Administration ROMANIA.
Nonparametric Tests with Ordinal Data Chapter 18.
Gridded warning verification Harold E. Brooks NOAA/National Severe Storms Laboratory Norman, Oklahoma
NCAR, 15 April Fuzzy verification of fake cases Beth Ebert Center for Australian Weather and Climate Research Bureau of Meteorology.
Eidgenössisches Departement des Innern EDI Bundesamt für Meteorologie und Klimatologie MeteoSchweiz Weather type dependant fuzzy verification of precipitation.
Chapter 13 Understanding research results: statistical inference.
Hypothesis Tests u Structure of hypothesis tests 1. choose the appropriate test »based on: data characteristics, study objectives »parametric or nonparametric.
Hypothesis Testing and Statistical Significance
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
More about tests and intervals CHAPTER 21. Do not state your claim as the null hypothesis, instead make what you’re trying to prove the alternative. The.
Centro Nazionale di Meteorologia e Climatologia Aeronautica Common Verification Suite Zurich, Sep 2005 Alessandro GALLIANI, Patrizio EMILIANI, Adriano.
Introdcution to Epidemiology for Medical Students Université Paris-Descartes Babak Khoshnood INSERM U1153, Equipe EPOPé (Dir. Pierre-Yves Ancel) Obstetric,
Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA.
Slide 9-1 Copyright © 2012, 2008, 2005 Pearson Education, Inc. Chapter 9 Hypothesis Tests for One Population Mean.
Spatial Verification Intercomparison Meeting, 20 February 2007, NCAR
One Way ANOVAs One Way ANOVAs
Rest of lecture 4 (Chapter 5: pg ) Statistical Inferences
Presentation transcript:

Traditional Verification Scores Fake forecasts  5 geometric  7 perturbed subjective evaluation  expert scores from last year’s workshop  9 cases x 3 models

Geometric error/scores for first 4 cases  correlation coefficient =  prob of detection = 0.00  false alarm ratio = 1.00  Hanssen&Kuipers =  equitable threat = case 5  correlation coefficient = 0.2  prob of detection = 0.88  false alarm ratio = 0.89  Hanssen&Kuipers = 0.69  equitable threat = 0.08 THE WINNER

Perturbed fake cases – known errors 1.3 pts right, 5 pts down 2.6 pts right, 10 pts down 3.12 pts right, 20 pts down 4.24 pts right, 40 pts down 5.48 pts right, 80 pts down 6.12 pts right, 20 pts down, times pts right, 20 pts down, minus 0.05”

Perturbed fake cases 1-3 obs 1 23

multiplicative bias thresholds >0, >=0.01”, >=0.02”, >=0.03”

Gilbert skill score (ETS)

subjective evaluation A C B

histograms of expert scores 24 first-trial scores 22 second-trial scores

observation (truth) wrf4ncar wrf2caps wrf4ncep

observation (truth) wrf4ncar wrf2caps wrf4ncep

expert scores vs grid stats Equitable threat score (Gilbert Skill score) forecast area bias (thresh=0.07”) 95% conf

expert scores vs grid stats odds ratioPearson product-moment correlation coefficient regular bootstrap method

do the expert scores show significant differences among the models? Student's t-Test 2-tail, paired 2-trial mean p-value wrf2caps-wrf4ncar 0.04 wrf2caps-wrf4ncep 0.06 wrf4ncar-wrf4ncep Chance null hypothesis is true (i.e. no difference in means)

do the expert scores show significant differences among the models? Wilcoxon-Mann-Whitney rank-sum test (Wilks, p. 138)2-tail probability difference in ranks due to chance wrf2caps-wrf4ncar0.299 wrf2caps-wrf4ncep0.148 wrf4ncar-wrf4ncep0.018 Wilcoxon signed-rank test (Wilks, p. 142)2-tail wrf2caps-wrf4ncar0.737 wrf2caps-wrf4ncep0.177 wrf4ncar-wrf4ncep0.152