CEE 6410 Water Resources Systems Analysis

CEE 6410 Water Resources Systems Analysis: Data-Driven Modeling and Machine Learning Regression Approaches in Water Resource Systems

Data-driven Models

Data-driven Models Find relationships between the system state variables without explicit knowledge of the physical behavior of the system. Examples: The unit hydrograph method, statistical models (ARMA, ARIMA) and machine learning (ML) models.

Why data-driven modeling and machine learning in water resource systems?

Why data-driven modeling and machine learning in water resource systems? Some highly complex processes in water resource systems are difficult to understand and simulate using a physically based approach. Example: the Lower Sevier River Basin System, Utah.

Why data-driven modeling and machine learning in water resource systems? Physically based modeling is limited by the lack of required data and the expense of data acquisition. Data-driven (machine learning) models offer an alternative: they replicate the expected response of a system.

Example of ML Uses

Supervised vs. Unsupervised learning Supervised learning: relate attributes to a target by discovering patterns in the data; these patterns are used to predict values of the target in future data. Unsupervised learning: the data have no target attribute; explore the data to find intrinsic structures in them.

Supervised vs. Unsupervised learning

Procedure: (1) Objective; (2) Data Retrieval & Analysis; (3) Input-Output Selection; (4) Learning Machine Calibration; (5) Comparison & Robustness Analysis

Analysis – Supervised Learning. Machine learning approach considerations: input inclusion (curse of dimensionality); generalization (overfitting); impact of unseen data (robustness); performance comparison (vs. another similar algorithm).

Analysis - Regression

Nash coefficient of efficiency (η or E): similar to the coefficient of determination (r²); range -∞ to 1; dimensionless.

E = 1 - \frac{\sum_{n=1}^{N} (t_n - t^*_n)^2}{\sum_{n=1}^{N} (t_n - t_{av})^2}

Root mean square error (RMSE): same units as the model response.

RMSE = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (t_n - t^*_n)^2}

where: t = observed output, t* = predicted output, t_av = observed average output, N = number of observations.
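As a minimal sketch (not part of the original slides; the function names and sample flow values are illustrative), both regression metrics can be computed directly from arrays of observed and predicted outputs:

```python
import numpy as np

def nash_sutcliffe(t_obs, t_pred):
    """Nash coefficient of efficiency E: 1 is a perfect fit; values can fall to -inf."""
    t_obs, t_pred = np.asarray(t_obs, float), np.asarray(t_pred, float)
    return 1.0 - np.sum((t_obs - t_pred) ** 2) / np.sum((t_obs - t_obs.mean()) ** 2)

def rmse(t_obs, t_pred):
    """Root mean square error, in the same units as the model response."""
    t_obs, t_pred = np.asarray(t_obs, float), np.asarray(t_pred, float)
    return float(np.sqrt(np.mean((t_obs - t_pred) ** 2)))

# Illustrative observed vs. simulated flows
obs = [12.0, 15.0, 9.5, 20.1, 18.3]
sim = [11.2, 16.1, 10.0, 19.0, 17.5]
print(nash_sutcliffe(obs, sim), rmse(obs, sim))
```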

Analysis - Classification

Confusion matrix: helps evaluate classifier performance on a class-by-class basis.

Kappa coefficient: a robust measurement of classification accuracy:

\kappa = \frac{N \sum_{i=1}^{n} x_{ii} - \sum_{i=1}^{n} x_{i+} x_{+i}}{N^2 - \sum_{i=1}^{n} x_{i+} x_{+i}}

where: n = number of classes, x_ii = number of observations on the diagonal of the confusion matrix corresponding to row i and column i, x_i+ and x_+i = marginal totals of row i and column i respectively, N = number of instances.
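A similar sketch for the classification metric, assuming a confusion matrix whose rows are actual classes and whose columns are predicted classes (the helper name and example matrix are illustrative):

```python
import numpy as np

def kappa(confusion):
    """Kappa coefficient from an n x n confusion matrix (rows = actual, cols = predicted)."""
    cm = np.asarray(confusion, float)
    N = cm.sum()                                     # total number of instances
    diag = np.trace(cm)                              # sum of x_ii
    marg = np.sum(cm.sum(axis=1) * cm.sum(axis=0))   # sum of x_i+ * x_+i
    return (N * diag - marg) / (N ** 2 - marg)

# Illustrative 3-class confusion matrix
cm = [[50, 2, 3],
      [5, 40, 5],
      [2, 3, 45]]
print(kappa(cm))
```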

A Neural Network Model: Bayesian Multilayer Perceptron for Regression & Classification

Bayesian Multilayer Perceptron (MLP)

An ANN algorithm that uses the Bayesian inference method (b). The network maps the inputs to the outputs through a hidden layer:

y_k = \sum_j W^{II}_{kj} \tanh\left( \sum_i W^{I}_{ji} x_i + b^{I}_j \right) + b^{II}_k

where: y_1, y_2, ..., y_n = simultaneous results from the algorithm; W^I, W^II, b^I, b^II = model weights and biases; [x] = inputs.

The MLP has also been used with success in simulation and forecasting of soil moisture, reservoir management, groundwater conditions, etc. It offers a probabilistic approach, noise-effect minimization, error prediction bars, etc.

(b) Implemented by Nabney (2005)

Bayesian Multilayer Perceptron (BMLP)

Using a dataset D = [x^(n), t^(n)] with n = 1...N, the training of the parameters [W^I, W^II, b^I, b^II] is performed by minimizing the overall error function E (Bishop, 2007):

E = \beta E_D + \alpha E_W, \qquad E_D = \frac{1}{2} \sum_{n=1}^{N} \left( y(x^{(n)}) - t^{(n)} \right)^2, \qquad E_W = \frac{1}{2} \sum_{i=1}^{W} w_i^2

where: E_D = data error function, E_W = penalization term, W = number of weights and biases in the BMLP, and α and β = Bayesian hyper-parameters.
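A minimal numpy sketch, under the assumption of a tanh hidden layer and a linear output layer, of the mapping and the overall error function E = βE_D + αE_W described above (the variable names, network size, and fixed α and β values are illustrative, not the Netlab implementation):

```python
import numpy as np

def mlp_forward(X, W1, b1, W2, b2):
    """Two-layer perceptron: tanh hidden layer, linear output layer."""
    return np.tanh(X @ W1 + b1) @ W2 + b2

def overall_error(X, t, W1, b1, W2, b2, alpha=0.01, beta=50.0):
    """E = beta*E_D + alpha*E_W (data misfit plus weight penalization)."""
    y = mlp_forward(X, W1, b1, W2, b2)
    E_D = 0.5 * np.sum((y - t) ** 2)                            # data error function
    E_W = 0.5 * sum(np.sum(p ** 2) for p in (W1, b1, W2, b2))   # penalization term
    return beta * E_D + alpha * E_W

# Tiny example: 5 observations, 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
X, t = rng.normal(size=(5, 2)), rng.normal(size=(5, 1))
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
print(overall_error(X, t, W1, b1, W2, b2))
```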

Bayesian Multilayer Perceptron (BMLP)

For regression tasks, Bayesian inference provides the prediction y^(n) and the variance of the predictions σ_y², once the distribution of W has been estimated by maximizing the likelihood for α and β (Bishop, 2007). The output variance has two sources: the first arises from the intrinsic noise in the output values, and the second comes from the posterior distribution of the BMLP weights. The output standard deviation vector σ_y can be interpreted as the error bar for confidence interval estimation (Bishop, 2007).
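Under Bishop's (2007) Gaussian approximation to the weight posterior, this two-part variance for a single regression output can be written approximately as follows (the symbols g and A are the standard ones from that derivation and are not defined in the slides):

```latex
\sigma_y^2 \;\approx\; \underbrace{\beta^{-1}}_{\text{intrinsic output noise}}
\;+\; \underbrace{\mathbf{g}^{\mathsf T} \mathbf{A}^{-1} \mathbf{g}}_{\text{weight-posterior uncertainty}},
\qquad
\mathbf{g} = \left.\nabla_{\mathbf{w}}\, y(\mathbf{x};\mathbf{w})\right|_{\mathbf{w}_{\mathrm{MP}}},
\qquad
\mathbf{A} = \nabla\nabla_{\mathbf{w}} E(\mathbf{w}_{\mathrm{MP}})
```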

Bayesian Multilayer Perceptron (BMLP) For classification tasks, the Bayesian inference method allows the probability of a given class to be estimated from the input variables using a logistic sigmoid function (Nabney, 2002).
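For a two-class problem this amounts to passing the network output a(x) through the logistic sigmoid (a standard construction sketched here, not quoted from the slides):

```latex
P(C_1 \mid \mathbf{x}) = \sigma\big(a(\mathbf{x})\big) = \frac{1}{1 + e^{-a(\mathbf{x})}},
\qquad
P(C_2 \mid \mathbf{x}) = 1 - P(C_1 \mid \mathbf{x})
```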

BMLP classification - example

BMLP regression - example

Relevance Vector Machine for Regression

"Entities should not be multiplied unnecessarily" (William of Ockham)

"Models should be no more complex than is sufficient to explain the data" (Michael E. Tipping)

Relevance Vector Machine for Regression

Developed by Tipping [2001]. Given a training set of input-target pairs \{x_n, t_n\}, n = 1...N, where N is the number of observations, the target vector can be written as:

t = \Phi w + \varepsilon

where w is the weight vector, Φ is a "design" matrix whose columns are built from a kernel K(x_n, x_m), and the error ε is assumed to be zero-mean Gaussian with variance σ².

A likelihood distribution of the complete data set:

p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left( -\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2} \right)

There is a danger that the maximum-likelihood estimation of w and σ² will suffer from severe over-fitting, so an additional penalty term (a prior over the weights) is imposed on the likelihood:

p(w \mid \alpha) = \prod_{i} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})

This prior is ultimately responsible for the sparsity properties of the RVM.

The posterior parameter distribution conditioned on the data:

p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)}

The posterior probability is assigned to values which are both probable under the prior and "which explain the data" (Tipping, 2004).

For many of the weights the optimal values of the associated hyper-parameters α_i are infinite; the posterior distributions of those weights therefore concentrate at zero and the corresponding inputs (basis functions) are "irrelevant". The inputs associated with the remaining non-zero weights are "the relevance vectors".
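A compact sketch of the hyper-parameter re-estimation loop that produces this sparsity, in the spirit of Tipping (2001); the RBF kernel width, iteration count, and pruning threshold below are illustrative assumptions:

```python
import numpy as np

def rbf_design_matrix(X, centers, width=1.0):
    """Phi[n, m] = K(x_n, x_m) for an RBF kernel, plus a leading bias column."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / (2 * width ** 2))])

def rvm_regression(X, t, n_iter=200, prune=1e6):
    Phi = rbf_design_matrix(X, X)
    N, M = Phi.shape
    alpha, sigma2 = np.ones(M), np.var(t)
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)    # posterior covariance of w
        mu = Sigma @ Phi.T @ t / sigma2                     # posterior mean of w
        gamma = 1.0 - alpha * np.diag(Sigma)                # "well-determined" measure
        alpha = np.minimum(gamma / (mu ** 2 + 1e-12), 1e12) # re-estimate hyper-parameters
        sigma2 = np.sum((t - Phi @ mu) ** 2) / max(N - gamma.sum(), 1e-12)
    relevant = alpha < prune   # weights whose alpha stays finite mark the relevance vectors
    return mu, relevant

# Noisy sinc example, as in the figure on the next slide (np.sinc(x/pi) = sin(x)/x)
rng = np.random.default_rng(1)
X = np.linspace(-10, 10, 50)[:, None]
t = np.sinc(X[:, 0] / np.pi) + 0.1 * rng.normal(size=50)
mu, relevant = rvm_regression(X, t)
print("relevance vectors:", int(relevant[1:].sum()))
```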

RVM approximations to "sinc" function

Generalization and Robustness Analysis

Overfitting: calibrate the ML models with one training data set and evaluate their performance with a different, unseen test data set.

ML applications are often ill-posed problems: for t = f(x), a small variation in x may cause large changes in t.
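A minimal sketch of this calibrate-then-evaluate split (assuming scikit-learn is available; the model, synthetic data, and split fraction are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
t = np.sin(X[:, 0]) + 0.2 * rng.normal(size=200)

# Calibrate on the training set only; the test set stays unseen until evaluation.
X_tr, X_te, t_tr, t_te = train_test_split(X, t, test_size=0.3, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0).fit(X_tr, t_tr)

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# A large gap between training and test RMSE indicates over-fitting.
print("train RMSE:", rmse(t_tr, model.predict(X_tr)))
print("test  RMSE:", rmse(t_te, model.predict(X_te)))
```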

Model Robustness

Bootstrap method: each bootstrap sample is created by randomly sampling with replacement from the whole training data set.

A robust model is one that shows narrow confidence bounds in the bootstrap histogram of the performance metric; of the two models compared in the slide figure (Model A and Model B), Model B, with the narrower histogram, is the more robust.
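A sketch of the bootstrap procedure (the fitted model, metric, and replicate count are assumptions for illustration): each replicate re-fits the model on a resample of the training set and records a test-set score, and the spread of the resulting histogram is the robustness indicator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bootstrap_scores(X_train, t_train, X_test, t_test, n_boot=500, seed=0):
    """Distribution of test RMSE obtained by refitting on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, scores = len(t_train), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample with replacement
        model = LinearRegression().fit(X_train[idx], t_train[idx])
        resid = t_test - model.predict(X_test)
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(scores)

# Narrower spread (e.g., percentile range) of the scores => more robust model.
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 2))
t = X @ np.array([1.5, -0.7]) + 0.3 * rng.normal(size=120)
scores = bootstrap_scores(X[:80], t[:80], X[80:], t[80:])
print("RMSE 5th-95th percentile:", np.percentile(scores, [5, 95]))
```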