Slycat Ensemble Analysis

Patricia J. Crossno, Timothy M. Shead, Milosz A. Sielicki, Warren L. Hunt, Shawn Martin, and Ming-Yu Hsieh
Sandia National Laboratories

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL SAND P

Analysis Tasks:
Find strongest input/output correlations
Find inputs with least impact on outputs
Find anomalous simulation runs

CCA Visual Representations
Scatterplot: Each Simulation Relative to Ensemble. Distance off diagonal shows difference from ensemble as a whole, plus potential anomalies.
Bar chart: Ensemble-wide Relationships. Green = Inputs, Purple = Outputs.
Click CCA column header to select CCA component in views.
Click header triangle to sort variables (toggles from decreasing to increasing).

Viewing 1st CCA component in both views
250 simulations, each color-coded by its y1 output value
Selected simulation
Positive many-to-many correlation (bar color the same) between x25 & x14 and y2 & y1
Inputs x25 & x14 have the most impact on both outputs y1 and y2
Input x1 has the least impact on outputs y1 and y2

Viewing 2nd CCA component in both bar chart & scatterplot
Inputs and outputs sorted by correlation strength within CCA2 component
Inverse correlation (red vs. blue) between x23 & y4; CCA3 captures relationship between x8 & y3
250 simulations, each color-coded by its x23 input value
x23 selected for scatterplot color-coding (dark green row highlight)
Three distinct groups of input values
250 simulations, each color-coded by its y4 output value
Scatterplot color-coding changed by clicking on y4 row (darker purple highlight)
Three output value groups map to the 3 input groups

Viewing 3rd CCA component in both bar chart & scatterplot
Inverse correlation between x8 & y3; CCA2 captures relationship between x23 & y4
250 simulations, each color-coded by its x8 input value
x8 selected for scatterplot color-coding (dark green row highlight)
x8 inputs range from low (blue) to high (red)
250 simulations, each color-coded by its y3 output value
Scatterplot color-coding changed by clicking on y3 row (darker purple highlight)
Corresponding y3 outputs inversely range from high (red) to low (blue)

Approach: Canonical Correlation Analysis (CCA)
[Diagram: simulations s1 … sn by features (inputs i1 … ik, outputs o1 … om); CCA yields structure correlations between each input and output feature and the CCA components c1 … ck, plus input and output meta-features for each simulation (shown for CCA1).]

Slycat Sensitivity Analysis
Input parameters
Simulation Ensemble
Simple Regression (1-to-1)
Multiple Regression (Many-to-1)
Model Confidence
How About Many-to-Many Correlations?
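
Many-to-many correlations of this kind are what CCA computes. As a minimal sketch (not Slycat's own implementation), the NumPy code below derives canonical correlations, per-simulation input and output meta-features, and structure correlations using a standard QR-plus-SVD formulation. The function names `cca` and `column_correlations` and the synthetic 250-run data set are assumptions introduced here for illustration only.

```python
import numpy as np


def column_correlations(A, B):
    """Pearson correlation between every column of A and every column of B."""
    # Assumes no column is constant (a zero standard deviation would divide by zero).
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return A.T @ B / A.shape[0]


def cca(X, Y):
    """CCA for X (simulations x inputs) and Y (simulations x outputs).

    Assumes more simulations than variables and full column rank.
    Returns canonical correlations, input/output meta-features (one column
    per CCA component), and input/output structure correlations.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the centered input and output column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    U, r, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    input_meta = Qx @ U       # one input meta-feature value per simulation
    output_meta = Qy @ Vt.T   # one output meta-feature value per simulation
    # Structure correlations: each original variable vs. each meta-feature.
    input_structure = column_correlations(X, input_meta)
    output_structure = column_correlations(Y, output_meta)
    return r, input_meta, output_meta, input_structure, output_structure


# Synthetic ensemble (hypothetical): output 0 follows input 0, output 1
# inversely follows input 1, and input 2 affects nothing.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 3))
Y = np.column_stack([
    2.0 * X[:, 0] + 0.05 * rng.normal(size=250),
    -X[:, 1] + 0.5 * rng.normal(size=250),
])
r, in_meta, out_meta, in_struct, out_struct = cca(X, Y)
print("canonical correlations:", np.round(r, 3))
print("input structure correlations:\n", np.round(in_struct, 2))
print("output structure correlations:\n", np.round(out_struct, 2))
```

Within one component, an input and an output whose structure correlations share a sign are positively related, and opposite signs indicate an inverse relationship, which is the information the bar chart colors convey. The overall sign of each component is arbitrary, so only relative signs are meaningful.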
Problem: Electrical Circuit Simulation Sensitivity Analysis

Objectives:
Map Output Variability Back to Inputs
Reduce Number of Input Parameters
Reduce Number of Simulations to Run
Identify Anomalous Runs
Increase Model Confidence

Finding Most Significant Inputs
266 scrollable inputs
Note R² is increasing & P is decreasing with each CCA component

Finding Anomalous Simulations
All to all analysis: 2641 simulations, each color-coded by its y4 output value. 4 anomalous runs in y4 values.
Rerun CCA analysis between all inputs and y4 to find strongest correlations (all-to-1).
All to y4 analysis: 2641 simulations, each color-coded by its x248 input value (strongest). 4 anomalous runs share common x248 values.
All to y4 analysis: 2641 simulations, each color-coded by its x255 input value (2nd strongest). 4 anomalous runs share common x255 values.

Reduce Inputs & Simulations
In the 2641 run ensemble above, analysis allowed input parameters to be reduced from 266 to 21, decreasing simulation time ten-fold.

Available Open Source
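
The all-to-1 analysis and the search for anomalous runs can be illustrated with a small sketch. With a single output, CCA reduces to multiple linear regression: the input meta-feature is the least-squares fit, rescaled. The code below uses that equivalence to rank inputs by structure correlation and to measure each run's distance off the scatterplot diagonal. It is a hedged illustration, not Slycat's code: the function name `all_to_one`, the synthetic data, and the choice to flag the four largest distances are all assumptions made here. The poster identifies anomalies visually.

```python
import numpy as np


def all_to_one(X, y):
    """Single-output CCA via least squares.

    Returns the canonical correlation, the structure correlation of each
    input, the inputs ranked from strongest to weakest, and each run's
    distance off the diagonal of the meta-feature scatterplot.
    Assumes no input column is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    fitted = Xc @ coef
    a = fitted / fitted.std()   # standardized input meta-feature
    b = yc / yc.std()           # standardized output meta-feature
    r = float(np.mean(a * b))   # canonical correlation (multiple R)
    Xs = Xc / Xc.std(axis=0)
    structure = Xs.T @ a / len(a)
    ranking = np.argsort(-np.abs(structure))
    # Perpendicular distance of each run from the line a == b.
    off_diagonal = np.abs(a - b) / np.sqrt(2.0)
    return r, structure, ranking, off_diagonal


# Hypothetical ensemble: 500 runs, 10 inputs; only inputs 3 and 7 matter,
# and four runs receive an artificial offset in the output.
rng = np.random.default_rng(1)
n_runs, n_inputs = 500, 10
X = rng.normal(size=(n_runs, n_inputs))
y = 3.0 * X[:, 3] + 1.5 * X[:, 7] + 0.1 * rng.normal(size=n_runs)
anomalous = [10, 200, 333, 480]
y[anomalous] += 15.0

r, structure, ranking, off_diagonal = all_to_one(X, y)
flagged = np.argsort(-off_diagonal)[:4]
print("inputs ranked by |structure correlation|:", ranking)
print("runs farthest off the diagonal:", sorted(flagged.tolist()))
```

Keeping only the top-ranked inputs from such a ranking is one way to approach the kind of input reduction described above. How many to keep is a judgment call that this sketch does not address.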