Download presentation
Presentation is loading. Please wait.
Published byHarvey McCarthy Modified over 9 years ago
1
Presented by ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division
2
Beckerman_0611 2 Ostrouchov_SDS_0611 Early computational science relies largely on intuitive design and visual validation Computational experiments are expensive Petascale data sets are nearly as opaque as real systems—statistical analysis must select what to visualize Uncertainty analysis is in its infancy Statistics is a major partner in bringing computational science to the rigor and efficiency standards of experimental science Methods to see through, examine, and classify variability Uncertainty quantification Statistical design of experiments Common evolutionary steps: Experimental science and computational science
3
Beckerman_0611 3 Ostrouchov_SDS_0611 ORNL statistics and data sciences group: High-dimensional multivariate classification for chem-bio mass spectrometer Climate: Compact representation and insight in high-dimensional climate simulation space Network intrusion detection without content analysis High-dimensional density space classification Visualization of high-dimensional relationships Fusion: Mapping clustered heating coefficient space to tokamak cross-section space Cutting through the fog of high-dimensional variability
4
Beckerman_0611 4 Ostrouchov_SDS_0611 Transform to reduce non-linearity in distribution (often density-based) PCA computed via SVD (or ICA, FA, etc.) Construction of component movies Interpretation of spatial, time, and movie components Pairs of equal singular values indicate periodic motion Transform to reduce non-linearity in distribution (often density-based) PCA computed via SVD (or ICA, FA, etc.) Construction of component movies Interpretation of spatial, time, and movie components Pairs of equal singular values indicate periodic motion Statistical decomposition of time-varying simulation data EETG GTC Simulation Data (W. Lee, Z. Lin, and S. Klasky) Decomposition shows transient wave components in time ƒ (X t ) = S 0 + 1t S 1 + … + kt S k + E t S1S1 2t 1t1t
5
Beckerman_0611 5 Ostrouchov_SDS_0611 Statistical decomposition of time-varying simulation data Time dynamics of component pairs i j stand out in a small multiples plot Movies of any component combinations or residuals provide further views of dynamics ¿ ƒ(X t ) = S 0 + 1t S 1 + … + kt S k + E t
6
Beckerman_0611 6 Ostrouchov_SDS_0611 Conditional small multiples provide access to high-dimensional relationships Five-dimensional climate relationship involving surface heat exchange Latent heat flux: x Radiative heat flux: y Various precipitation regimes: Rows Time of day: 6 a.m. left columns, 6 p.m. right columns Across a century of increasing CO 2 ; century divided into three parts: Columns (1,4), (2,5), (3,6) Hexagonal binning on a log scale controls overplotting
7
Beckerman_0611 7 Ostrouchov_SDS_0611 RobustMap multiscale data decomposition Hierarchical data decomposition and dimension reduction with O(np) complexity Axis represents difference between data groups Internal nodes contain axes for data splits Leaf nodes contain axes for local dimension reduction n data points in p dimensions C2 b 1 c 1 b k c k. C4 NULL C1 b 1 c 1 C3 b 1 c 1 C5 b 1 c 1 b k c k. Each set of axes represents a local data group
8
Beckerman_0611 8 Ostrouchov_SDS_0611 A statistical framework for guiding visualization of massive data sets Browse a petabyte of data? To see 1% of a petabyte at 10 MB per second takes 35 work days! Statistical analysis must select views or reduce to quantities of interest in addition to fast rendering of data More statistical analysis Human bandwidth overload? Visualization scalability through guidance by statistical analysis Petabytes Terabytes Gigabytes Megabytes No analysis More data From local models to annotated data in global context through clustering in a high-dimensional feature space
9
Beckerman_0611 9 Ostrouchov_SDS_0611 Contact George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division (865) 574-3137 ostrouchovg@ornl.gov 9 Ostrouchov_SDS_0611
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.