Automated QA/QC Technique for Climate Sensor Data
EPSCoR Hawaii HGDR Scientific Data Management Portal Development Team

TOC
QA/QC Requirements
Detecting Outliers
– Types of Outliers
– Detection Methods
– Statistical Correlation Functions
– QuaT Correlational Method
Data Mining for Further Automation

QA/QC Requirements
– Detect abnormal data and outliers
– Correct abnormal data and outliers where possible
– Find additional properties/correlations among variables, to catch changes over time

Detecting Outliers: Types of Outliers
– Correctable outliers
Caused by calibration, sensor cleaning, low battery voltage, erroneous sensor installation, etc. Outliers caused by these factors can be corrected.
– Error values
Missing or impossible values caused by sensor failure, such as physical damage or other irreversible effects. This type of outlier cannot be corrected and must be discarded.
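A minimal sketch of how the two outlier types might be handled, assuming (1) a correctable calibration problem shows up as an offset that grows linearly between two calibration checks, and (2) error values arrive as missing (`None`) or non-finite readings. Both function names are hypothetical; real corrections should follow the station's calibration records.

```python
import math
from typing import List, Optional, Sequence


def correct_linear_drift(
    values: Sequence[float],
    times: Sequence[float],
    t_start: float,
    t_end: float,
    offset_start: float,
    offset_end: float,
) -> List[float]:
    """Remove an offset assumed to grow linearly between two calibration checks.

    offset_start and offset_end are the measured errors (reading minus
    reference) at t_start and t_end. Readings outside [t_start, t_end]
    are returned unchanged.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    corrected = []
    for value, t in zip(values, times):
        if t_start <= t <= t_end:
            fraction = (t - t_start) / (t_end - t_start)
            offset = offset_start + fraction * (offset_end - offset_start)
            corrected.append(value - offset)
        else:
            corrected.append(value)
    return corrected


def discard_error_values(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace missing or non-finite readings with None so they are excluded."""
    return [None if v is None or not math.isfinite(v) else v for v in values]


# Hypothetical sensor: correct at hour 0, found reading 0.5 too high at hour 10.
print(correct_linear_drift([20.0, 20.3, 20.5], [0, 5, 10], 0, 10, 0.0, 0.5))
print(discard_error_values([21.4, float("nan"), None]))
```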

Detecting Outliers: Detection Methods
1. Normal value range check (single variable)
2. Diurnal pattern check (single variable)
3. Correlational pattern check (multiple variables)
4. Additional methods can be found by data mining

Normal Value Range Check
For example, a relative humidity above 100% does not make sense. Regional and seasonal factors must also be taken into account.
Knowledge required:
– Known/valid normal value ranges for all variables
– Subsets of those normal value ranges for each variable in different regions or seasons
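A minimal sketch of a range check that falls back from a region- and season-specific range to a general one. The variable names, the region and season labels, and every numeric limit below are hypothetical placeholders, except the physical 0 to 100% bound on relative humidity.

```python
from typing import Dict, Optional, Tuple

# Hypothetical limits for illustration only; real limits should come from
# station metadata and local climatology.
DEFAULT_RANGES: Dict[str, Tuple[float, float]] = {
    "relative_humidity": (0.0, 100.0),  # percent: physically bounded
    "air_temperature": (-10.0, 40.0),   # deg C: assumed regional limits
}

# Hypothetical narrower limits keyed by (variable, region, season).
REGIONAL_SEASONAL_RANGES: Dict[Tuple[str, Optional[str], Optional[str]], Tuple[float, float]] = {
    ("air_temperature", "summit", "winter"): (-10.0, 20.0),
}


def normal_range(variable: str, region: Optional[str] = None,
                 season: Optional[str] = None) -> Tuple[float, float]:
    """Return the most specific range available for the variable, region and season."""
    key = (variable, region, season)
    if key in REGIONAL_SEASONAL_RANGES:
        return REGIONAL_SEASONAL_RANGES[key]
    return DEFAULT_RANGES[variable]


def in_normal_range(variable: str, value: float, region: Optional[str] = None,
                    season: Optional[str] = None) -> bool:
    """True when the value lies inside the applicable normal range (inclusive)."""
    low, high = normal_range(variable, region, season)
    return low <= value <= high


print(in_normal_range("relative_humidity", 103.0))                # False
print(in_normal_range("air_temperature", 25.0, "summit", "winter"))  # False
```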

Diurnal Pattern Check
For example, radiation should be high during the day and low at night.
Knowledge required:
– Known/valid diurnal patterns
– Different diurnal patterns for each variable in different regions or seasons
Challenges:
– How to slice time
– What value ranges count as high, average, or low for each variable; is the standard deviation alone enough?
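One possible answer to the time-slicing and standard-deviation questions, sketched under assumptions: slice by hour of day, build each hour's mean and standard deviation from trusted reference data (for example, earlier days in the same season and region), and flag readings more than `k` standard deviations from their hour's mean. The hour binning, `k`, the `min_sd` floor, and the sample radiation values are illustrative choices, not values from this project.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def hourly_profile(hours: Sequence[int],
                   values: Sequence[float]) -> Dict[int, Tuple[float, float]]:
    """Mean and population standard deviation of the reference values in each hour of the day."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for hour, value in zip(hours, values):
        bins[hour].append(value)
    return {h: (statistics.mean(vs), statistics.pstdev(vs)) for h, vs in bins.items()}


def diurnal_outliers(hours: Sequence[int], values: Sequence[float],
                     profile: Dict[int, Tuple[float, float]],
                     k: float = 3.0, min_sd: float = 0.0) -> List[int]:
    """Indices of values more than k standard deviations from their hour's mean.

    min_sd is a floor on the spread, so that hours with almost no variation
    (e.g. night-time radiation) do not flag tiny deviations.
    """
    flagged = []
    for i, (hour, value) in enumerate(zip(hours, values)):
        mean, sd = profile[hour]
        if abs(value - mean) > k * max(sd, min_sd):
            flagged.append(i)
    return flagged


# Hypothetical reference radiation readings (W/m^2) at midnight and noon.
ref_hours = [0, 0, 0, 0, 12, 12, 12, 12]
ref_rad = [0.0, 0.0, 1.0, 0.0, 800.0, 820.0, 780.0, 810.0]
profile = hourly_profile(ref_hours, ref_rad)
print(diurnal_outliers([0, 12], [500.0, 805.0], profile, min_sd=5.0))  # [0]
```

Building the profile from reference data rather than from the day being checked matters: a single extreme reading inflates the standard deviation of its own bin and can hide itself.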

Correlational Pattern Check
For example, radiation and temperature should be correlated.
Knowledge required:
– Known correlations between the variables
How can we verify the correlations?
– Correlation functions from statistics will be useful
– A method called QuaT might also be useful for analyzing how similar the trends of two variables are along the timeline
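A simple correlational check can be sketched as a regression on trusted reference data: predict one variable from the other and flag readings whose residual is unusually large. This swaps in a linear-regression residual technique that the slides do not name; it assumes the relationship is roughly linear with no time lag, which may not hold for radiation and temperature. The function names and data are hypothetical.

```python
import statistics
from typing import List, NamedTuple, Sequence


class LinearModel(NamedTuple):
    slope: float
    intercept: float
    residual_sd: float


def fit_reference(x: Sequence[float], y: Sequence[float]) -> LinearModel:
    """Least-squares line y ~ slope * x + intercept, fitted on trusted reference data."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    return LinearModel(slope, intercept, statistics.pstdev(residuals))


def correlational_outliers(x: Sequence[float], y: Sequence[float],
                           model: LinearModel, k: float = 3.0) -> List[int]:
    """Indices where y departs from the value predicted from x by more than k residual SDs."""
    return [i for i, (a, b) in enumerate(zip(x, y))
            if abs(b - (model.slope * a + model.intercept)) > k * model.residual_sd]


# Hypothetical, unitless reference pairs (e.g. scaled radiation vs temperature).
model = fit_reference([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.1, 8.9])
print(correlational_outliers([2, 3], [5.0, 15.0], model))  # [1]
```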

Additional Analyses
4. Additional methods from data mining might be helpful:
– Finding additional correlations
– Detecting changes in value ranges over time (global climate change)
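For value-range changes over time, one simple starting point (an assumption, not a method named in the slides) is the least-squares trend of annual means. The function names and sample readings are hypothetical, and at least two distinct years are required.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def annual_means(years: Sequence[int], values: Sequence[float]) -> Dict[int, float]:
    """Average all readings that fall in each calendar year."""
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, value in zip(years, values):
        by_year[year].append(value)
    return {y: statistics.mean(v) for y, v in sorted(by_year.items())}


def trend_per_year(means: Dict[int, float]) -> float:
    """Ordinary least-squares slope of the annual means, in units per year."""
    xs = list(means.keys())
    ys = list(means.values())
    if len(xs) < 2:
        raise ValueError("need at least two years")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical air temperatures (deg C), two readings per year.
means = annual_means([2000, 2000, 2001, 2001, 2002, 2002],
                     [19.9, 20.1, 20.0, 20.2, 20.1, 20.3])
print(trend_per_year(means))
```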

Statistical Correlation Functions
– Pearson’s product-moment correlation
– Spearman’s rank correlation
– Kendall’s rank correlation
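For reference, the textbook definitions of the three coefficients for n paired observations (x_i, y_i) are below. The Spearman form assumes no tied ranks, and the Kendall form is the basic tau-a without a tie correction; n_c and n_d are the numbers of concordant and discordant pairs.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)

\tau_a = \frac{n_c - n_d}{n(n - 1)/2}
```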

Pearson’s Product Moment
– Pearson’s is a parametric method: the dataset needs to be tested for normality before it is analyzed
– Normality test: Shapiro-Wilk test
– If a dataset is determined to be non-normal (non-parametric), use Spearman’s, Kendall’s, or both instead
– Outliers also reduce the reliability of Pearson’s, because a single extreme value can dominate the coefficient
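A dependency-free sketch of Pearson's coefficient, with a demonstration of the outlier sensitivity noted above. The normality test itself is not reimplemented here; SciPy provides a Shapiro-Wilk test as `scipy.stats.shapiro`, which would be run on each series before relying on Pearson's. The sample values are made up.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8]
clean = [2 * v for v in x]        # perfectly linear
spiked = clean[:-1] + [-20]       # one bad reading replaces the last value
print(pearson_r(x, clean))        # 1.0
print(pearson_r(x, spiked))       # negative: a single outlier flips the sign
```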

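Spearman’s & Kendall’s Correlation
– If a dataset is not normally distributed, these correlation functions can be used instead
– Both work on ranks, so the values of each variable must first be sorted and ranked
– Spearman’s: compares the ranks that each observation receives in the two variables (the rank differences); it equals Pearson’s applied to the ranks
– Kendall’s: compares every pair of observations and measures the difference between the numbers of concordant pairs (ordered the same way in both variables) and discordant pairs, as a fraction of all pairs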
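A dependency-free sketch of both rank correlations. Spearman's is computed as Pearson's r on ranks (tied values get the average rank); Kendall's is the basic tau-a, which does not correct for ties, so a tie-corrected variant such as tau-b would be preferable for data with many repeated readings. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return _pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; no tie correction."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rho(x, y))   # 0.8
print(kendall_tau_a(x, y))  # 0.6
```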

QuaT
– An algorithm for determining the similarity of two trend curves
– Introduced by A. Okabe and A. Masuyama of the University of Tokyo in “A robust exploratory method for qualitative trend curve analysis”

QuaT: Basic Steps of the Algorithm
1. Find the peaks and bottoms of the curves being compared
2. Calculate the height of each peak
3. Choose a threshold ("distinct") height and extract the peaks whose height is greater than or equal to it; in other words, ignore less distinct peaks
4. Compare the extracted peaks and determine whether the peaks of the two variables' curves occur at the same times and in the same order of magnitude
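The four steps can be illustrated with a simplified sketch. This is an interpretation for illustration only, not Okabe and Masuyama's published algorithm: a peak is an interior point higher than both neighbors, its height is measured above the higher of the two lowest points separating it from the neighboring peaks, and two curves are judged similar when their distinct peaks occur within `time_tol` samples of each other and have the same height order. Because variables have different units, each curve gets its own threshold. All names and sample series are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_peaks(series: Sequence[float]) -> List[int]:
    """Step 1: interior points strictly higher than both neighbors (plateaus are ignored)."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] > series[i + 1]]


def peak_heights(series: Sequence[float], peaks: List[int]) -> List[float]:
    """Step 2: height above the higher of the two bottoms separating a peak
    from its neighboring peaks (or from the ends of the series)."""
    heights = []
    for k, p in enumerate(peaks):
        left = peaks[k - 1] if k > 0 else 0
        right = peaks[k + 1] if k + 1 < len(peaks) else len(series) - 1
        left_bottom = min(series[left:p])
        right_bottom = min(series[p + 1:right + 1])
        heights.append(series[p] - max(left_bottom, right_bottom))
    return heights


def distinct_peaks(series: Sequence[float], threshold: float) -> List[Tuple[int, float]]:
    """Step 3: keep (time index, height) of peaks at least `threshold` high."""
    peaks = find_peaks(series)
    return [(p, h) for p, h in zip(peaks, peak_heights(series, peaks)) if h >= threshold]


def similar_trends(a: Sequence[float], b: Sequence[float],
                   threshold_a: float, threshold_b: float, time_tol: int = 1) -> bool:
    """Step 4: same number of distinct peaks, at matching times, in the same height order."""
    pa, pb = distinct_peaks(a, threshold_a), distinct_peaks(b, threshold_b)
    if len(pa) != len(pb):
        return False
    if any(abs(ta - tb) > time_tol for (ta, _), (tb, _) in zip(pa, pb)):
        return False
    order_a = sorted(range(len(pa)), key=lambda i: pa[i][1])
    order_b = sorted(range(len(pb)), key=lambda i: pb[i][1])
    return order_a == order_b


radiation = [0, 5, 1, 9, 2, 3, 2.5, 0]
temperature = [20, 22, 20.5, 25, 21, 21.2, 21.1, 20]
shifted = [20, 20, 20.2, 22, 20.5, 25, 21, 20]
print(similar_trends(radiation, temperature, 2, 1))  # True
print(similar_trends(radiation, shifted, 2, 1))      # False
```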

Basic Relationships among and between the Variables
Variable categories:
– Radiation (short, long, net, PAR)
– Rainfall (humidity, soil moisture)
– Temperature (air, surface, body)
– Wind (speed, direction)

Affecting            | Relationship | Affected             | Specific variable
Radiation category   | direct       | Temperature category |
Radiation category   | affect       | Wind category        |
Radiation category   | inverse      | Rainfall category    | Soil moisture
Rainfall category    | inverse      | Radiation category   |
Rainfall category    | inverse      | Temperature category |
Rainfall category    | affect       | Wind category        |
                     | inverse      | Temperature category |
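The relationship table can also be encoded so that computed correlation coefficients are checked against it. In this sketch the specific variable is used as the key where the table names one, the row without an affecting category is left out, `affect` rows carry no expected sign and are skipped, and the `min_strength` cutoff is a hypothetical choice.

```python
from typing import Dict, Optional, Tuple

# (affecting, affected) -> relationship, transcribed from the relationship table.
# Where a specific variable is listed, it replaces the affected category.
EXPECTED: Dict[Tuple[str, str], str] = {
    ("radiation", "temperature"): "direct",
    ("radiation", "wind"): "affect",
    ("radiation", "soil_moisture"): "inverse",
    ("rainfall", "radiation"): "inverse",
    ("rainfall", "temperature"): "inverse",
    ("rainfall", "wind"): "affect",
}


def consistent_with_table(affecting: str, affected: str, coefficient: float,
                          min_strength: float = 0.3) -> Optional[bool]:
    """True/False if the coefficient matches the expected relationship,
    None if the table gives no signed expectation for this pair."""
    relationship = EXPECTED.get((affecting, affected))
    if relationship is None or relationship == "affect":
        return None
    if abs(coefficient) < min_strength:
        return False  # the expected relationship is too weak to be seen
    if relationship == "direct":
        return coefficient > 0
    return coefficient < 0  # "inverse"


print(consistent_with_table("radiation", "temperature", 0.72))  # True
print(consistent_with_table("rainfall", "temperature", 0.45))   # False
print(consistent_with_table("radiation", "wind", -0.60))        # None
```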