DANMARKS MILJØUNDERSØGELSER AARHUS UNIVERSITET September 21, 2010 Helge Rørdam Olesen Fairmode: Some considerations on the Delta tool and model performance evaluation.


Fairmode: Some considerations on the Delta tool and model performance evaluation
Helge Rørdam Olesen
National Environmental Research Institute (NERI), Aarhus Universitet

Two approaches to performance evaluation
› Rigorous statistical evaluation (or operational evaluation): computing statistical performance measures according to a specific protocol. The Delta tool is intended to be useful in this respect.
› Exploratory (or diagnostic) approach: modelled and observed data are plotted in various ways, in an insightful groping for clues to improve model performance. The Delta tool does have an exploration mode, but it is not as flexible as certain other tools.
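To make the first approach concrete, here is a minimal sketch of the kind of statistical performance measures an operational protocol might compute from paired observed and modelled concentrations. The function name and the particular metrics (bias, RMSE, correlation) are illustrative choices, not the Delta tool's actual definitions.

```python
import math

def performance_metrics(obs, mod):
    """Compute a few common model performance statistics from
    equal-length sequences of observed and modelled values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_m = sum(mod) / n
    bias = mean_m - mean_o  # mean modelled minus mean observed
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / n)
    # Pearson correlation coefficient
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod)) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in mod) / n)
    corr = cov / (sd_o * sd_m) if sd_o and sd_m else float("nan")
    return {"bias": bias, "rmse": rmse, "corr": corr}
```

A protocol then fixes which metrics are computed, over which stations and averaging times, and what counts as acceptable, which is exactly where the cautions on the following slides apply.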

Comments on the approaches
› The exploratory approach is indispensable. It is necessary for assuring that the model not only gives the right results, but gives them for the right reasons. Exploratory data analyses can detect potential errors in data and model setup, highlight notable features in the data, and reveal shortcomings of models.
› Ideally, statistical performance measures - such as those in the Delta tool - should place an administrator in a position to distinguish an adequate model from a less adequate one. However, a lot of caution is required: pure metrics can easily be misleading.

Target plots and performance metrics
› You may be deceived both by nice-looking plots and by awful-looking plots.
› The context is important!
› What do the underlying data represent? The challenge to the model system may be really severe, or it may be trivial.
› Defining a 'band of acceptance' for models requires much care to ensure that such criteria are not misleading. Performance statistics depend entirely on the challenge to which you expose a model!
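Target-style plots are commonly built on the identity RMSE² = bias² + CRMSE², which splits the total error into a systematic part (bias) and a centred, pattern-related part (CRMSE); the two become the axes of the plot. The sketch below shows only this decomposition, an assumption-free piece of arithmetic; the Delta tool's actual target plot additionally normalises these quantities (e.g. by an estimate of measurement uncertainty), which is not reproduced here.

```python
import math

def target_coordinates(obs, mod):
    """Decompose the RMSE of modelled vs. observed values as
    RMSE^2 = bias^2 + CRMSE^2 and return (bias, crmse),
    the two axes of a target-style diagram (unnormalised)."""
    n = len(obs)
    bias = sum(mod) / n - sum(obs) / n
    rmse2 = sum((m - o) ** 2 for o, m in zip(obs, mod)) / n
    # Guard against tiny negative values from floating-point rounding
    crmse = math.sqrt(max(rmse2 - bias ** 2, 0.0))
    return bias, crmse
```

The decomposition illustrates the caution above: a model can land close to the origin simply because the underlying data pose a trivial challenge, not because the model is good.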

"Model benchmarking"
› Note that the benchmarking results do not just address the performance of a dispersion model.
› The activity addresses the performance of an entire system: input data (e.g. traffic counts, car fleet characteristics, etc.), a dispersion model, and the user and the choices she makes among various options. All of this is tested against measurements representing a certain scenario, which may or may not correspond to the assumptions made.
› One example of the difficulties: road construction work may affect traffic characteristics at the point of interest, so that the assumed input data become obsolete.

90 percent criteria in the current Directive
› Perhaps too tough a challenge.
› Example: construction work close to a monitoring station implies severely raised levels of NOx due to heavy machinery. The model does not account for this.
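A criterion of this kind can be read as a requirement that some large fraction of modelled values fall within a relative tolerance of the observations. The sketch below shows that generic idea only; the function name, the 50 % tolerance, and the handling of zero observations are illustrative assumptions, and the Directive's actual data quality objectives are defined differently.

```python
def fraction_within(obs, mod, rel_tol=0.5):
    """Fraction of modelled values within a relative tolerance of the
    paired observations (illustrative '90 percent'-style check;
    zero observations are counted as failures to avoid division by zero)."""
    ok = sum(1 for o, m in zip(obs, mod)
             if o != 0 and abs(m - o) / abs(o) <= rel_tol)
    return ok / len(obs)
```

The construction-work example shows why such a check can be misleading: a few observations raised by unmodelled local activity can push the fraction below any threshold regardless of model quality.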

Which type of data should quality objectives refer to?
› Option A: Research-grade data obtained somewhere in Europe, used in a common exercise?
› Option B: Your own national data, obtained through national monitoring and modelling?

An important threshold for Delta users: how much effort is required to get started with the Delta tool?
› Understand the context (the Directive).
› To some extent, read background material (JRC papers).
› Download and install IDL and the tool.
› Get acquainted with the tool: understand the data format, explore the tool's potential, understand the way it works.
› Prepare your own data.
› Get closer to understanding the meaning of the metrics. What is good performance for the case at hand?

Delta tool from a user's point of view

The logic of the selection is not obvious. Give a hint like: "In the left pane, please select one or more models + scenarios. In the right pane, select one or more parameters and stations. Various filters (Type, Parameter, Zone) are available."

The explanation of the 'Multiple choice' info is hard to understand.

› The issue of understanding is central to an evaluation of air quality models. One wants to know if the model is adequate, or conservative or accurate enough for one's purposes. Data sets are limited and cannot possibly cover all possible conditions under which the models are expected to be used. Therefore, one is forced to extrapolate model behaviour well outside the range of veracity of the particular evaluation results.

› To properly make such extrapolations requires development of an understanding of the different causes contributing to bias, or even lack of bias, in a model's predictions and relating those causes to the model's parametrization of physical processes. One needs to know if the model is producing the right or wrong answer for the right reason.

› It appears that the goal of obtaining both reasonably objective and well-defined evaluations and adequate understanding is not attainable through the use of simple, rote approaches to the calculation of evaluation statistics. This conclusion comes from experience in carrying out performance evaluations.

›... Statistics alone cannot produce understanding nor discern the various causes of model behaviour. They are an aid to thinking, but no replacement for it. Not only that, statistical measures can provide misleading guidance if understanding is lacking. Robin Dennis, 1986