Bangladesh Short-term Discharge Forecasting

Presentation transcript:

Bangladesh Short-term Discharge Forecasting: time series forecasting. Tom Hopson. A project supported by USAID.

Forecasting Probabilities
[Figure: rainfall probability and discharge probability forecast plots. Axes: Rainfall [mm], Discharge [m^3/s]. Above-danger-level probability 36%. Greater than climatological seasonal risk?]

Data-Based Modeling: Linear Transfer Function Approach (used for the lumped model)

For a single linear store S, the mass balance is dS/dt = u - Q, and the storage relation is Q = S/T; combining the two gives T dQ/dt = u - Q.

For a catchment composed of linear stores in series and in parallel (using finite differences):

Q_t = a1*u_{t-1} + a2*u_{t-2} + ... + a_m*u_{t-m} + b1*Q_{t-1} + b2*Q_{t-2} + ... + b_n*Q_{t-n}

where u is effective catchment-averaged rainfall, derived from the non-linear rainfall filter u_t = (Q_t)^c * R_t.

Reference: Beven, 2000
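The single linear store above can be sketched numerically: discretising T dQ/dt = u - Q with a forward difference gives exactly the first-order transfer-function form Q_t = a1*u_{t-1} + b1*Q_{t-1}. A minimal illustration (the time constant T and input u below are made-up values, not catchment parameters):

```python
def linear_store_step(q_prev, u_prev, T, dt=1.0):
    """One finite-difference step of T dQ/dt = u - Q.
    Discretising gives Q_t = a1*u_{t-1} + b1*Q_{t-1} with
    a1 = dt/T and b1 = 1 - dt/T: a first-order transfer function."""
    a1 = dt / T
    b1 = 1.0 - dt / T
    return a1 * u_prev + b1 * q_prev

# With constant input u, the store relaxes toward Q = u (steady state
# of the mass balance). T = 4 days is an illustrative time constant.
q = 0.0
for _ in range(200):
    q = linear_store_step(q, u_prev=5.0, T=4.0)
print(q)  # approaches 5.0
```

Higher-order forms (more a's and b's) arise when several such stores are combined in series and parallel.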

Linear Transfer Function Approach (cont.)

For a 3-day forecast, say:

Q_{t+3} = a1*u_{t+2} + a2*u_{t+1} + ... + a_m*u_{t-m} + b1*Q_{t-1} + b2*Q_{t-2} + ... + b_n*Q_{t-n}

Our approach: for each day and forecast lead, use the AIC (Akaike information criterion) to optimize the a's, m, the b's, n, c, and the precipitation smoothing. Residuals (model biases) are then corrected using an ARMA (auto-regressive moving average) model, which is readily available in R.
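Once the coefficients are chosen, the 3-day forecast above is just a weighted sum of forecast rainfall and past discharge. A minimal sketch in Python; the series and coefficient values are hypothetical stand-ins, not AIC-fitted values:

```python
def forecast_3day(u, q, a, b, t):
    """Sketch of the 3-day-ahead transfer-function forecast
    Q_{t+3} = a1*u_{t+2} + a2*u_{t+1} + ... + b1*Q_{t-1} + ...
    u[t+1] and u[t+2] must come from a rainfall forecast."""
    rain = sum(a[k] * u[t + 2 - k] for k in range(len(a)))  # a1 pairs with u_{t+2}
    flow = sum(b[j] * q[t - 1 - j] for j in range(len(b)))  # b1 pairs with Q_{t-1}
    return rain + flow

# Illustrative (hypothetical) series and coefficients:
u = [1.0] * 10   # effective rainfall, including forecast values beyond t
q = [2.0] * 10   # observed discharge up to time t
print(forecast_3day(u, q, a=[0.5, 0.25], b=[0.3], t=5))  # 0.75 + 0.6 = 1.35
```

In practice the coefficients, the orders m and n, and the rainfall-filter exponent c would all be selected jointly by minimising the AIC, as the slide describes.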

Autoregressive Integrated Moving Average (ARIMA)

In time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalisation of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series. They are applied in cases where the data show evidence of non-stationarity, where an initial differencing step (corresponding to the "integrated" part of the model) can be applied to remove the non-stationarity. The model is generally referred to as ARIMA(p, d, q), where p, d, and q are non-negative integers giving the order of the autoregressive, integrated, and moving average parts of the model, respectively. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling.

Reference: Chatfield, C. (1996). The Analysis of Time Series: An Introduction. Texts in Statistical Science.
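The differencing step that turns an ARMA model into ARIMA can be shown in a few lines. A minimal sketch: one pass of differencing removes a linear trend (leaving a constant series), and a second pass removes that constant too:

```python
def difference(x, d=1):
    """Apply d-th order differencing (the 'I' in ARIMA(p, d, q)):
    each pass replaces the series with its successive differences
    x[i] - x[i-1], removing trends that cause non-stationarity."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x

trend = [2 * t for t in range(6)]   # [0, 2, 4, 6, 8, 10]: a linear trend
print(difference(trend, d=1))       # [2, 2, 2, 2, 2]: constant, stationary
print(difference(trend, d=2))       # [0, 0, 0, 0]
```

An ARMA(p, q) model is then fitted to the differenced series; forecasts are mapped back by undoing the differencing (cumulative summation).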

Semi-distributed Model

A 2-layer model for the soil moisture states S1 and S2. Parameters (time constants t_s1, tp, t_s2; reservoir depths r_s1, r_s2) are estimated from the FAO Soil Map of the World. The equations are solved on a 6-hour time step (for daily 0Z discharge) using a 4th-order Runge-Kutta semi-implicit scheme.
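For illustration, a plain explicit 4th-order Runge-Kutta step for a system of states is sketched below. The slide's solver is a semi-implicit variant and its right-hand side involves the t_s1, tp, t_s2, r_s1, r_s2 parameters, which are not specified here, so this shows only the integrator family, not the actual model equations:

```python
import math

def rk4_step(f, y, t, dt):
    """One classical explicit 4th-order Runge-Kutta step for the
    system dy/dt = f(t, y), where y is a list of states (e.g. the
    two soil-moisture stores S1, S2)."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# Check against linear decay dS/dt = -S/T, exact solution S0*exp(-t/T):
T = 4.0
(s,) = rk4_step(lambda t, y: [-y[0] / T], [1.0], 0.0, 0.25)
print(abs(s - math.exp(-0.25 / T)))  # tiny: RK4 local error ~ dt^5
```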

Model selection -- Akaike information criterion

Akaike's information criterion (AIC), developed by Hirotugu Akaike under the name "an information criterion" in 1971 and proposed in Akaike (1974), is a measure of the goodness of fit of an estimated statistical model. It is grounded in the concept of entropy, in effect offering a relative measure of the information lost when a given model is used to describe reality; it can be said to describe the trade-off between bias and variance in model construction, or, loosely speaking, between the precision and the complexity of the model.

The AIC is not a test of the model in the sense of hypothesis testing; rather, it is a tool for model selection. Given a data set, several competing models may be ranked according to their AIC, with the one having the lowest AIC being the best. From the AIC values one may infer, for example, that the top three models are in a near tie and the rest are far worse, but one should not assign a threshold above which a given model is 'rejected'.

Model selection -- AIC and BIC

Akaike information criterion: AIC = 2k - 2 ln(L)
where k = number of model parameters and L = the maximized value of the likelihood function (for a least-squares fit, computed from the sum of squared errors).

Bayesian information criterion: BIC = ln(n) k - 2 ln(L)
where n = number of data points.

The BIC penalty function is more demanding than the AIC's.
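Under the Gaussian-error assumption, both criteria can be computed directly from the squared-error sum, since -2 ln(L) then equals n ln(SSE/n) up to an additive constant that cancels when comparing models on the same data. A minimal sketch; the SSE, k, and n values are hypothetical:

```python
import math

def aic_bic(sse, k, n):
    """AIC = 2k - 2 ln(L) and BIC = ln(n) k - 2 ln(L), using the
    Gaussian least-squares identity -2 ln(L) = n ln(SSE/n) + const."""
    neg2_loglik = n * math.log(sse / n)
    return 2 * k + neg2_loglik, math.log(n) * k + neg2_loglik

# Hypothetical fits: the larger model lowers the SSE a little, but its
# 3 extra parameters cost 2*3 = 6 under AIC and 3*ln(100) ~ 13.8 under BIC.
aic2, bic2 = aic_bic(sse=10.0, k=2, n=100)
aic5, bic5 = aic_bic(sse=9.3, k=5, n=100)
print(aic5 < aic2, bic5 > bic2)  # True True: AIC keeps the extra terms, BIC does not
```

This is the sense in which the BIC penalty is "more demanding": for n >= 8, ln(n) > 2, so each added parameter costs more under BIC than under AIC.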

Model selection -- Cross-validation

Cross-validation is the most robust approach, but also the most computationally demanding. Set aside part of the data for testing and 'train' on the remainder; it is best to cycle through the folds so that all of the data are eventually used for testing. For example, dividing the data in halves (the minimum) already requires 2x the computation.
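The train/test cycling described above can be sketched as a k-fold index splitter (indices only; the model fitting itself is omitted):

```python
def kfold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation:
    each fold is held out once for testing while the model is
    'trained' on the remainder, so every point is used for testing."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Dividing 6 points into halves (k = 2, the minimum) means fitting the
# model twice -- hence roughly 2x the computation:
for train, test in kfold_splits(6, 2):
    print(train, test)
```

Note that for time series such as discharge records, contiguous blocks (or a rolling-origin scheme) are usually preferred over random folds, so the training set does not leak information from the future into the past.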