Part II: Model Class Selection


Part II: Model Class Selection
Given: dynamic data $D$ from the system and a set of candidate model classes $\mathcal{M}_1, \dots, \mathcal{M}_J$, where each model class defines a set of possible predictive models for the system
Find: the most plausible model class in the set
Goal: selection of the level of model complexity

Model Class Selection
Most plausible model class: maximize $P(\mathcal{M}_j \mid D, U)$ over all $j$, where $D$ denotes the dynamic data and $U$ the prior information.
Higher level of robustness: predictions can include all model classes, weighted by their posterior probabilities, but this may be unnecessary.

Model Class Selection: Evaluation of Model Class Probability
Use Bayes' Theorem:
$$P(\mathcal{M}_j \mid D, U) = \frac{p(D \mid \mathcal{M}_j)\, P(\mathcal{M}_j \mid U)}{p(D \mid U)}$$
where $p(D \mid \mathcal{M}_j)$ is the evidence for $\mathcal{M}_j$, $P(\mathcal{M}_j \mid U)$ is the prior, and $p(D \mid U)$ is the normalizing constant.
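
A minimal numerical sketch of this normalization, working in log space for stability; the three log-evidence values and the equal priors are hypothetical, not from the lecture:

```python
import numpy as np

def model_class_posteriors(log_evidences, log_priors):
    """Posterior probabilities P(M_j | D, U) via Bayes' theorem.

    Works in log space: the evidences p(D | M_j) are typically far too
    small to represent directly for long data records.
    """
    log_post = np.asarray(log_evidences, dtype=float) + np.asarray(log_priors, dtype=float)
    log_post -= log_post.max()        # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()          # normalization supplies p(D | U)

# Three hypothetical model classes, equally plausible a priori
log_ev = [-1520.3, -1498.7, -1501.2]
print(model_class_posteriors(log_ev, np.log([1/3, 1/3, 1/3])))
```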

Evaluation of Evidence
Total Probability Theorem:
$$p(D \mid \mathcal{M}_j) = \int p(D \mid \theta, \mathcal{M}_j)\, p(\theta \mid \mathcal{M}_j)\, d\theta$$
The integral is evaluated by an asymptotic (Laplace) expansion about the most probable parameter value (MPV) $\hat{\theta}$.
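
A sketch of the resulting Laplace approximation, assuming the model class is locally identifiable so that the Hessian at the MPV is positive definite (the function signature is illustrative):

```python
import numpy as np

def log_evidence_laplace(log_likelihood, log_prior, theta_mpv, hessian):
    """Laplace asymptotic approximation of ln p(D | M_j) about the MPV.

    hessian: Hessian of -ln[ p(D | theta, M_j) p(theta | M_j) ] at theta_mpv;
    must be positive definite (locally identifiable model class).
    """
    n = len(theta_mpv)
    sign, logdet = np.linalg.slogdet(hessian)
    if sign <= 0:
        raise ValueError("Hessian must be positive definite at the MPV")
    return (log_likelihood(theta_mpv) + log_prior(theta_mpv)
            + 0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet)
```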

Model Class Selection using Evidence
Assume all model classes are equally plausible a priori; then we may rank the plausibility of each model class by its log evidence:
$$\ln p(D \mid \mathcal{M}_j) \approx \ln p(D \mid \hat{\theta}, \mathcal{M}_j) + \ln(\text{Ockham factor})$$
= log likelihood + log Ockham factor (Gull, '88)
= data fit + bias against parameterization

Bias Against Parameterization
Log Ockham factor for $\mathcal{M}_j$ with $N_j$ parameters: for a large number $N$ of data points in $D$,
$$\ln(\text{Ockham factor}) \approx \sum_{i=1}^{N_j} \ln\frac{\sigma_i}{\rho_i} - \frac{1}{2}\sum_{i=1}^{N_j} \frac{(\hat{\theta}_i - \theta_i^0)^2}{\rho_i^2}$$
where $\rho_i^2$ and $\sigma_i^2$ are the prior and principal posterior variances for $\theta_i$, and $\theta_i^0$ and $\hat{\theta}_i$ are the prior and posterior most probable values of $\theta_i$. Since $\sigma_i = O(N^{-1/2})$, the log Ockham factor decreases with the number of model parameters (for large $N$).
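
A direct transcription of this large-$N$ expression, assuming independent Gaussian approximations for the prior and posterior (a sketch; argument names are illustrative):

```python
import numpy as np

def log_ockham_factor(prior_std, post_std, theta_prior, theta_post):
    """Large-N log Ockham factor under Gaussian prior/posterior approximations.

    prior_std, post_std: prior and principal posterior standard deviations
    theta_prior, theta_post: prior and posterior most probable values
    Each parameter contributes about -0.5*ln(N) since post_std = O(N**-0.5),
    which is the bias against parameterization.
    """
    rho = np.asarray(prior_std, dtype=float)
    sigma = np.asarray(post_std, dtype=float)
    shift = (np.asarray(theta_post, dtype=float) - np.asarray(theta_prior, dtype=float)) / rho
    return np.sum(np.log(sigma / rho)) - 0.5 * np.sum(shift**2)
```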

Interpretation using Information Theory
For a large amount of data $N$ (Beck and Yuen, J. Engng Mech., Feb. 2004):
$$\ln(\text{Ockham factor}) \approx -\int p(\theta \mid D, \mathcal{M}_j)\, \ln\frac{p(\theta \mid D, \mathcal{M}_j)}{p(\theta \mid \mathcal{M}_j)}\, d\theta$$
That is, the log Ockham factor is minus the (Kullback) information about $\theta$ gained from $D$. This gives:
Log evidence = data fit $-$ information about $\theta$ in $D$

Comparison with AIC and BIC
Bayesian model class selection criterion, maximize w.r.t. $\mathcal{M}_j$:
log evidence = log likelihood + log Ockham factor, i.e. $\ln p(D \mid \hat{\theta}, \mathcal{M}_j) + \ln(\text{Ockham factor})$
Akaike (1974), maximize: AIC $= \ln p(D \mid \hat{\theta}, \mathcal{M}_j) - N_j$
Akaike (1976), Schwarz (1978), maximize: BIC $= \ln p(D \mid \hat{\theta}, \mathcal{M}_j) - \tfrac{N_j}{2}\ln N$
(BIC agrees with the proposed criterion for large $N$, except for terms of $O(1)$)
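
In code the three criteria differ only in their penalty on the maximized log likelihood; the fits below (a 3-parameter vs. a 7-parameter model class, $N = 2000$) are hypothetical:

```python
import numpy as np

def aic(max_log_likelihood, n_params):
    """Akaike (1974), in the maximize-this form used on the slide."""
    return max_log_likelihood - n_params

def bic(max_log_likelihood, n_params, n_data):
    """Akaike (1976) / Schwarz (1978); agrees with the log-evidence
    criterion for large N except for O(1) terms."""
    return max_log_likelihood - 0.5 * n_params * np.log(n_data)

for log_like, k in [(-1210.4, 3), (-1198.9, 7)]:
    print(f"k={k}: AIC={aic(log_like, k):.1f}, BIC={bic(log_like, k, 2000):.1f}")
```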

Evaluation of Likelihood
Details for dynamical models with input-output measurements: e.g. Beck & Katafygiotis, "Updating models and their uncertainties. I: Bayesian statistical framework," J. Engng Mech., April 1998.
Details for output-only measurements: e.g. Yuen & Beck, "Updating properties of nonlinear dynamical systems with uncertain input," J. Engng Mech., Jan. 2003.

Example 1: SDOF Hysteretic Oscillator
Restoring force: bilinear hysteretic
Excitation: scaled 1940 El Centro earthquake record
Simulated measurement noise: 5% of the rms of the simulated displacement
Prediction error modeled as zero-mean Gaussian discrete white noise with variance $\sigma^2$; i.e. the measured displacement at time step $n$ is the model-predicted displacement plus an independent $N(0, \sigma^2)$ error.
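
Under this prediction-error model the likelihood factorizes over the time steps, giving the familiar Gaussian form (a sketch; variable names are illustrative):

```python
import numpy as np

def log_likelihood(measured, predicted, sigma):
    """ln p(D | theta, M) for zero-mean Gaussian discrete white-noise
    prediction error: each measured displacement is the model prediction
    plus an independent N(0, sigma**2) error, so terms factorize."""
    residual = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    n = residual.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(residual**2) / sigma**2)
```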

[Figure: Hysteretic force-displacement behavior]

Example 1: Choice of Model Classes
Model Class 1 ($\mathcal{M}_1$): linear oscillators with damping coefficient $c > 0$, stiffness $k_1 > 0$, and prediction-error variance $\sigma^2$
Model Class 2 ($\mathcal{M}_2$): elasto-plastic oscillators (i.e. $k_2 = 0$) with stiffness $k_1 > 0$, yield displacement, and prediction-error variance $\sigma^2$
Independent uniform prior distributions on all parameters

Example 1: Conclusions
The class of linear models ($\mathcal{M}_1$) is much more probable than the elasto-plastic models ($\mathcal{M}_2$) for lower-level excitation, but the other way around for higher levels.
This illustrates an important point: there is no exact class of models for a real system, and the most probable class may depend on the excitation level.

Example 2: Modal Model for 10-Story Linear Shear Building
Examine the most plausible number of modes based on measured accelerations at the roof during base excitation.
Excitation not measured; modeled as stationary Gaussian white noise with uncertain spectral intensity.
Other model parameters: modal frequencies, modal damping ratios, and prediction-error variance.

Example 2: Evidence for Model Classes
[Table: Number of modes | Log likelihood | Log Ockham factor | Log evidence]
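
A ranking like the one tabulated above can be generated by looping over candidate mode counts; here `fit_modal_model` is a hypothetical user-supplied routine (not part of the lecture) that returns the maximized log likelihood and log Ockham factor for an $n$-mode model class:

```python
def rank_modal_model_classes(fit_modal_model, data, max_modes=10):
    """Tabulate log evidence = log likelihood + log Ockham factor for
    model classes with 1..max_modes modes, and return the best row.

    fit_modal_model(n_modes, data) -> (log_likelihood, log_ockham_factor)
    """
    rows = []
    for n_modes in range(1, max_modes + 1):
        log_like, log_ockham = fit_modal_model(n_modes, data)
        rows.append((n_modes, log_like, log_ockham, log_like + log_ockham))
    best = max(rows, key=lambda row: row[3])   # largest log evidence wins
    return best, rows
```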

Example 2: Frequency Response Fit for the Most Probable 6-Mode Model
[Figure: frequency response fit]

Concluding Remarks
The Bayesian probabilistic approach to model class selection is generally applicable; it is illustrated here for linear and nonlinear dynamical systems with input-output or output-only dynamic data.
The most plausible class of models is the one with the maximum posterior probability (equivalently, for equal priors, the maximum evidence) based on the data.