Emulator of Cosmological Simulation for Initial Parameters Study

Abstract

Cosmological models simulate the evolution of the universe from given initial conditions. Each initial condition consists of many initial physical parameters; physicists are usually interested in three to ten of them, which together span a huge parameter space. Each initial condition can generate up to ten quantities of interest, and each quantity can have up to 4096^3 grid points. Accessing such massive datasets to perform a parameter study on a post-analysis machine can be slow and hinder the analysis because of the limited I/O bandwidth of the supercomputer. Disk storage is also limited, so dumping all data from the simulation to disk is not feasible. Developing techniques that reduce the data size in situ, and with it the required I/O bandwidth and disk storage, is therefore critical for parameter studies in cosmological simulation. We develop two promising techniques to achieve this goal: the "GMM-based (Gaussian Mixture Model) Emulator" and "Statistical-based Super-resolution". The GMM-based emulator uses multivariate GMMs to compactly summarize the data over initial simulation parameters and quantities of interest, and can be built incrementally as the data of new initial parameters arrives. Statistical-based super-resolution statistically down-samples the data in situ and recovers the down-sampled data to full resolution when scientists analyze it in the post-processing stage. A one-time task that collects full-resolution data from a small set of initial conditions creates the prior knowledge for super-resolution; statistical down-sampling then reduces the size of the data from other initial conditions by summarizing the data values in each local block with a distribution, and the prior knowledge compensates for the lost spatial information so the down-sampled data can be recovered to full resolution.

Objectives

- Reduce the cosmological data size in situ to satisfy the I/O bandwidth and disk storage requirements.
- Reconstruct the cosmological data from our emulator with low error.
- Enhance emulators for in-situ analysis.

1. GMM Emulator

The GMM-based emulator uses multivariate GMMs to compactly summarize the simulation data in the joint space of the initial parameters and the quantity of interest. The data of a specific initial condition can be reconstructed from the emulator in the post-analysis stage, and the storage size of the GMM emulator is much smaller than the size of the simulation data. The emulator can be built incrementally as the data of new initial parameters arrives; when the size of the trained model approaches the memory constraint, we apply a re-sampling and re-training process to reduce it. Prediction: the GMM emulator can learn the data of sampled initial conditions and predict the data of untrained initial conditions with low error.

1.1 Reconstruction Results of the GMM Emulator

Reconstruction errors become low and stable when we use 5 or more Gaussians for each grid point.

[Figure: raw vs. reconstructed blocks for increasing Gaussian counts.]

Size reduction:

  Gaussian count   1      3      5      7      9
  Size             1.5%   4.5%   7.5%   10.5%  13.5%
  Size reduction   98.5%  95.5%  92.5%  89.5%  86.5%

1.2 Incremental Training of GMM

For in-situ data analysis, we train the GMM incrementally as the data of new initial parameters arrives; likewise, when the data is too large to hold in memory, we train the GMM progressively. The incremental training method we use keeps the reconstruction error around 0.64% for the density quantity.

1.3 Incremental Training with a Memory Constraint on Model Size

The GMM size increases linearly when training incrementally, so at some point memory may not be able to hold the entire model. We apply a re-sampling and re-training process to reduce the size of the GMM emulator; reconstruction errors become larger but remain acceptable.

[Figure: raw vs. reconstructed density and temperature with 5 Gaussians; reconstruction RMSE 0.597% (density) and 0.706% (temperature).]
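The poster does not include an implementation, so the following is a minimal sketch of the joint-space idea in Section 1, assuming scikit-learn's GaussianMixture. The sample layout, component counts, and the helper names (fit_emulator, reconstruct, retrain_with_new_run) are illustrative assumptions; the conditional-mean reconstruction is the standard GMM conditioning identity, not necessarily the authors' exact procedure.

```python
# Sketch: a GMM emulator over the joint (parameter, position, value) space.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_emulator(params, positions, values, n_components=5):
    """Fit one multivariate GMM to joint samples.

    params:    (N, P) initial simulation parameters per sample
    positions: (N, 3) grid coordinates
    values:    (N,)   quantity of interest at each coordinate
    """
    X = np.column_stack([params, positions, values])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(X)

def reconstruct(gmm, query):
    """E[value | parameters, position] under the mixture.

    query: (Q, D-1) conditioning part (parameters + position);
    the value is assumed to be the last joint dimension.
    """
    d = gmm.means_.shape[1] - 1          # index of the value dimension
    weights = np.zeros((len(query), gmm.n_components))
    cond_means = np.zeros((len(query), gmm.n_components))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        S_aa, S_ba = S[:d, :d], S[d, :d]
        gain = S_ba @ np.linalg.inv(S_aa)
        # Per-component conditional mean of the value given the query.
        cond_means[:, k] = mu[d] + (query - mu[:d]) @ gain
        # Responsibility of component k at each query point.
        weights[:, k] = gmm.weights_[k] * multivariate_normal.pdf(
            query, mean=mu[:d], cov=S_aa)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * cond_means).sum(axis=1)

def retrain_with_new_run(gmm, new_X, n_samples=100_000, n_components=5):
    """Re-sampling + re-training sketch (Secs. 1.2-1.3): draw samples from
    the current model, pool them with data from a new parameter run, and
    refit at a fixed size so the model does not grow without bound."""
    old_X, _ = gmm.sample(n_samples)
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(
        np.vstack([old_X, new_X]))
```

Under this sketch, the first run would call fit_emulator, each subsequent run retrain_with_new_run, and the post-analysis stage reconstruct on a grid of (untrained parameter, position) queries.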
2. Statistical-based Super-resolution

Super-resolution techniques in the computer vision community use information from a small set of training images to recover other down-sampled images to raw resolution. Two points shape how we transfer this idea to simulation data:

- Challenge: the data ranges of different initial simulation parameters can be very different.
- Observation: the locations of relatively higher and lower values in a local region are often similar across the data from different initial simulation parameters.

Compactly summarizing the data in each down-sampled block with a distribution preserves the absolute data values in the block, while a prior knowledge dictionary, created from the data generated by a few initial simulation parameters, compensates for the lack of spatial information and recovers the down-sampled distribution blocks to raw-resolution blocks.

2.1 System Workflow

A small set of initial simulation parameters is run once at full resolution, and the output is used to create the dictionary / prior knowledge. Every simulation run with a new parameter is statistically down-sampled in situ before it is written to disk. In the post-processing stage, low-resolution features are matched against the dictionary to recover the high-resolution data.

[Figure: system workflow — initial simulation parameters, in-situ statistical down-sampling, disk, dictionary / prior knowledge, and low-resolution-feature to high-resolution recovery.]

2.2 Experiment Results

- Tested on 3 parameters, 3 quantities of interest, and 64^3-resolution data.
- 5 initial simulation parameters are used to generate the full-resolution data and build the prior knowledge.
- The data generated by the other initial parameters is reduced in situ to statistical-based down-sampled data that is 2.92% of the full-resolution size.
- Reconstructions of different quantities, time steps, and initial parameters have slightly different errors; the average errors for density, temperature, and z-momentum are 0.6%, 2.1%, and 0.42%, respectively.

[Figure: reconstruction results with 6, 12, and 18 Gaussians.]

[Figure: statistical down-sampling reconstruction showcase — density at timestep 50, temperature at timestep 100, z-momentum at timestep 150.]
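As a rough illustration of Section 2's pipeline, the sketch below summarizes each local block with a simple quantile summary (standing in for the per-block distribution the poster describes) and uses the rank pattern of a prior full-resolution run as the spatial dictionary, exploiting the observation that high/low-value locations are similar across parameter settings. The block size, quantile count, and all function names are assumptions for illustration only.

```python
# Sketch: statistical down-sampling plus rank-pattern-based recovery.
import numpy as np

BLOCK = 4  # 4x4x4 blocks; an assumed setting, not the poster's

def blocks(field):
    """Yield (index, flattened block) pairs for a cubic field."""
    n = field.shape[0] // BLOCK
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield (i, j, k), field[i*BLOCK:(i+1)*BLOCK,
                                       j*BLOCK:(j+1)*BLOCK,
                                       k*BLOCK:(k+1)*BLOCK].ravel()

def build_dictionary(prior_field):
    """Prior knowledge: for each block location, remember the rank order
    of its values in a full-resolution run of a sampled initial condition."""
    return {idx: np.argsort(np.argsort(b)) for idx, b in blocks(prior_field)}

def downsample(field, n_quantiles=8):
    """In-situ reduction: keep only a small quantile summary per block,
    which preserves the block's absolute value range."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return {idx: np.quantile(b, qs) for idx, b in blocks(field)}

def recover(summary, dictionary, size):
    """Post hoc: expand each block's summary back to BLOCK**3 values and
    place them according to the prior run's spatial rank pattern."""
    out = np.empty((size, size, size))
    for (i, j, k), q in summary.items():
        # Interpolate the quantile summary to a full block of sorted values.
        vals = np.interp(np.linspace(0.0, 1.0, BLOCK**3),
                         np.linspace(0.0, 1.0, len(q)), q)
        block = vals[dictionary[(i, j, k)]].reshape(BLOCK, BLOCK, BLOCK)
        out[i*BLOCK:(i+1)*BLOCK,
            j*BLOCK:(j+1)*BLOCK,
            k*BLOCK:(k+1)*BLOCK] = block
    return out
```

In this sketch, build_dictionary would run once on one of the five full-resolution runs, downsample would run in situ on every other run, and recover would run in the post-processing stage.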