Constraining the symmetry energy with heavy-ion collisions and Bayesian analysis
Chun Yuen Tsang

Models and Data Analysis Initiative (MADAI)
MADAI is a statistical package that contains a Gaussian emulator and a Markov chain Monte Carlo (MCMC) sampler.
Gaussian emulator: a surrogate model, i.e. a high-dimensional interpolator with error estimates. Full transport-model simulations (ImQMD) of heavy-ion collisions (e.g. 124Sn+124Sn) take weeks to calculate, so the emulator interpolates from 50 full ImQMD simulations while optimizing 4 model parameters (S0, L, ms, mv).
MCMC: the posterior distribution P_post(x_i | y_exp) from the Bayesian analysis is generated with the MCMC algorithm.
Workflow:
1. Generate 50 data points from ImQMD.
2. Use the emulator to emulate ImQMD.
3. Generate the posterior distribution.
[Figure: demonstration of using a Gaussian emulator with 1D data points]
Chun Yuen Tsang, November ISNET-5 workshop
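The MCMC step of this workflow can be sketched as a plain random-walk Metropolis sampler. This is a minimal illustration, not the MADAI implementation: the function names and the toy Gaussian log posterior (standing in for P_post(x_i | y_exp)) are assumptions for the example.

```python
import numpy as np

def metropolis(log_post, x0, n_steps=5000, step=0.5, seed=0):
    """Random-walk Metropolis: draws samples from exp(log_post)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)  # propose a move
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:         # accept/reject
            x, lp = prop, lp_prop
        chain.append(x.copy())
    return np.array(chain)

# Toy posterior: standard normal in 2 parameters
samples = metropolis(lambda x: -0.5 * np.sum(x**2), x0=[0.0, 0.0])
```

The chain of accepted (and repeated) points approximates the posterior; in the real analysis the log posterior would be evaluated through the emulator rather than a closed-form function.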

Gaussian process
A Gaussian process assumes that the function values are jointly distributed as a multivariate Gaussian:

( f , f* ) ~ N( 0 , [ K(X,X) , K(X,X*) ; K(X*,X) , K(X*,X*) ] )

where K is the kernel matrix, f is the vector of training outputs, f* is the vector of emulator outputs, X is the location of the training points and X* is the location where the output is predicted.

The kernel is written as

k(x1, x2) = θ0 exp( −(1/2) Σ_i (u_i − v_i)² / θ_{2+i}² )

where u_i and v_i are the components of x1 and x2, θ0 sets the amplitude and θ_{2+i} the scale in each dimension.

The training points are given, so we need the conditional probability:

f* | f ~ N( K(X*,X) K(X,X)⁻¹ f , K(X*,X*) − K(X*,X) K(X,X)⁻¹ K(X,X*) )

Notation: f = training points, f* = emulator output, X = training-point parameters, X* = parameters where predictions are made.

Gaussian process (noisy input)
With noisy training data, a nugget term θ1² I is added to the training-point covariance:

( f , f* ) ~ N( 0 , [ K(X,X) + θ1² I , K(X,X*) ; K(X*,X) , K(X*,X*) ] )

with the same kernel k(x1, x2) = θ0 exp( −(1/2) Σ_i (u_i − v_i)² / θ_{2+i}² ).

The conditional probability becomes:

f* | f ~ N( K(X*,X) [K(X,X) + θ1² I]⁻¹ f , K(X*,X*) − K(X*,X) [K(X,X) + θ1² I]⁻¹ K(X,X*) )

Notation: f = training points, f* = emulator output, X = training-point parameters, X* = parameters where predictions are made; θ1² is the nugget.
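These formulas can be sketched in a few lines of numpy. This is a minimal illustration, not the MADAI emulator: a single shared length scale stands in for the per-dimension θ_{2+i}, and all names (`sq_exp_kernel`, `gp_predict`, `amp`, `nugget`) are chosen for the example.

```python
import numpy as np

def sq_exp_kernel(A, B, amp=1.0, scale=1.0):
    """k(x1, x2) = amp * exp(-0.5 * sum_i (u_i - v_i)^2 / scale^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return amp * np.exp(-0.5 * d2 / scale**2)

def gp_predict(X, f, Xs, amp=1.0, scale=1.0, nugget=1e-2):
    """Posterior mean and covariance of f* given training data (X, f),
    with the nugget term added to the training covariance."""
    K = sq_exp_kernel(X, X, amp, scale) + nugget**2 * np.eye(len(X))
    Ks = sq_exp_kernel(Xs, X, amp, scale)
    Kss = sq_exp_kernel(Xs, Xs, amp, scale)
    mean = Ks @ np.linalg.solve(K, f)                 # K(X*,X) K^-1 f
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)         # posterior covariance
    return mean, cov

# 1D demo: interpolate sin(x) from 6 training points
X = np.linspace(0, 5, 6)[:, None]
f = np.sin(X).ravel()
mean, cov = gp_predict(X, f, X, nugget=1e-6)
```

With a tiny nugget the posterior mean passes through the training points; increasing the nugget lets the emulator smooth over noisy training outputs instead.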

Short summary (so far)
A Gaussian process models the function values as jointly multivariate Gaussian, and the emulator prediction is the conditional distribution

f* | f ~ N( K(X*,X) [K(X,X) + θ1² I]⁻¹ f , K(X*,X*) − K(X*,X) [K(X,X) + θ1² I]⁻¹ K(X,X*) )

with kernel k(x1, x2) = θ0 exp( −(1/2) Σ_i (u_i − v_i)² / θ_{2+i}² ).

Two hyperparameters, the scale (θ_{2+i}) and the nugget (θ1²), affect how the model interpolates.
Main question: how do we decide which values to use?

Correlation with all default values
Scale = 0.001, nugget = 0.01
Sn+Sn, E = 120 MeV, n/p and DR data set
Model parameters: S0, L, ms, mv
[Figure: posterior correlation (corner) plot over S0, L, mv, fi]

Cross validation
Leave a certain group of calculations out of the training set and compare the emulator's result to the left-out set.
Predictive log probability (excluding set v):

log p(f_v | X, f_{−v}, θ) = Σ_{i∈v} [ −(1/2) log σ_i² − (f*_i − f_i)² / (2σ_i²) − (1/2) log 2π ]

Total predictive probability:

L_CV(X, f, θ) = Σ_v log p(f_v | X, f_{−v}, θ)

Goal: maximize L_CV.
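The two formulas above translate directly into code. A minimal sketch (function names are illustrative; the predictive means f*_i and variances σ_i² would come from the emulator):

```python
import numpy as np

def predictive_log_prob(f_true, f_pred, var_pred):
    """log p(f_v | X, f_{-v}, theta) for one left-out group v:
    sum over i in v of the Gaussian log density of f_i under (f*_i, sigma_i^2)."""
    return np.sum(-0.5 * np.log(var_pred)
                  - (f_pred - f_true) ** 2 / (2 * var_pred)
                  - 0.5 * np.log(2 * np.pi))

def total_cv_log_prob(groups):
    """L_CV = sum over left-out groups of their predictive log probability.
    `groups` is a list of (f_true, f_pred, var_pred) triples."""
    return sum(predictive_log_prob(*g) for g in groups)
```

Maximizing `total_cv_log_prob` over the hyperparameters rewards predictions that are both accurate (small residuals) and honest about their uncertainty (σ_i² neither inflated nor overconfident).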

Segregate training data for cross validation
Testing sets: sets that are NEVER involved in training, i.e. the emulator is oblivious to them.
Goals: test the accuracy of the emulator's prediction, and test whether the emulator overfits the training data.
Validation sets: left out sequentially, comparing the emulator's result to the left-out sets. Groups of 5 sets are left out each time; repeat until every run in the validation set has been left out at least once. The emulator extrapolates to where the 5 left-out sets are supposed to be, and the log likelihood of those 5 sets given the emulator output is summed.
Loop: take 5 simulation sets out; train the emulator with the remaining sets; put the taken-out sets back in and choose another 5. Once all runs have been left out at least once, output the total log likelihood.
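The leave-out loop above can be sketched as follows. This is a schematic, not the MADAI code: `train_and_predict` is a hypothetical stand-in for retraining the emulator on the kept runs and predicting the left-out ones, and the grouping is a simple shuffled partition.

```python
import numpy as np

def leave_out_groups(n_runs, group_size=5, seed=0):
    """Partition run indices into shuffled groups of `group_size`,
    so every run is left out exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_runs)
    return [idx[i:i + group_size] for i in range(0, n_runs, group_size)]

def grouped_cv(f, train_and_predict, group_size=5):
    """For each group: train on the rest, predict the left-out runs,
    and accumulate their Gaussian log likelihood."""
    total = 0.0
    for v in leave_out_groups(len(f), group_size):
        keep = np.setdiff1d(np.arange(len(f)), v)
        mean, var = train_and_predict(keep, v)   # emulator stand-in
        total += np.sum(-0.5 * np.log(2 * np.pi * var)
                        - (f[v] - mean) ** 2 / (2 * var))
    return total
```

In the analysis described here, `f` would hold the 44 validation-run outputs and `train_and_predict` would wrap the Gaussian emulator at a given (scale, nugget) pair.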

Segregate training data for cross validation
What we have: 49 full ImQMD simulation sets.
Segregation of data: testing sets are sets 1–5; validation sets are any 5 sets from sets 6–49.
The log likelihood from all validation sets (sets 6–49) is shown.
The predicted highest likelihood is located at scale = 0.633, nugget = 0.248.

Emulator vs left-out training points
Default values (before optimization): scale = 0.01, nugget = 0.001.
After optimization: scale = 0.633, nugget = 0.248.
Judged by reduced chi-square, the intercept is (arguably) closer to 0 and the slope closer to 1 after optimization.

Log likelihood of testing set
Gradient descent to the maximum log likelihood using the validation sets, then plot the testing sets' log likelihood along the way.
Caution: the testing sets were not involved in the validation-set log likelihood. If the emulator overfits, the testing-set log likelihood may decrease even as the validation-set log likelihood increases.
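The hyperparameter search can be sketched as a finite-difference gradient ascent on (scale, nugget). This is illustrative only: the quadratic toy objective below (peaked at the quoted optimum, scale = 0.633 and nugget = 0.248, starting from the defaults) stands in for the validation-set L_CV, and the learning-rate and step values are assumptions.

```python
import numpy as np

def grad_ascent(objective, theta0, lr=0.1, eps=1e-5, n_iter=200):
    """Maximize objective(theta) by central-difference gradient ascent."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            step = np.zeros_like(theta)
            step[i] = eps
            grad[i] = (objective(theta + step) - objective(theta - step)) / (2 * eps)
        theta += lr * grad   # ascend, since we maximize the log likelihood
    return theta

# Toy stand-in for L_CV, peaked at the quoted optimum (0.633, 0.248)
opt = grad_ascent(lambda t: -np.sum((t - np.array([0.633, 0.248])) ** 2),
                  theta0=[0.01, 0.001])
```

Monitoring the held-out testing-set log likelihood along this trajectory, as the slide suggests, is what reveals whether the ascent on the validation sets is starting to overfit.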

New correlations
Scale = 0.91, nugget = 0.14
[Figure: posterior correlation (corner) plot over S0, L, mv, fi]

Comparison
Scale = 0.001, nugget = 0.01 versus scale = 0.91, nugget = 0.47.

Summary and Outlook
The correlation is sensitive to the emulator hyperparameters scale and nugget.
The outcome is inconsistent with expectations.
Open questions: further tests on the choice of hyperparameters? Ways to test whether the emulator does a good job? Validation against other software?

Acknowledgment
HiRA group: Corinne Anderson, Jon Barney, John Bromell, Kyle Brown, Giordano Cerizza, Jacob Crosby, Justin Estee, Genie Jhang, Bill Lynch, Juan Manfredi, Pierre Morfouace, Sean Sweany, Betty Tsang, Tommy C. Y. Tsang, Kuan Zhu