
1 ORNL Statistics and Data Sciences: Understanding Variability and Bringing Rigor to Scientific Investigation
Presented by George Ostrouchov, Statistics and Data Sciences Group, Computer Science and Mathematics Division

2 Common evolutionary steps: Experimental science and computational science
• Early computational science relies largely on intuitive design and visual validation
• Computational experiments are expensive
• Petascale data sets are nearly as opaque as real systems; statistical analysis must select what to visualize
• Uncertainty analysis is in its infancy
• Statistics is a major partner in bringing computational science to the rigor and efficiency standards of experimental science
• Methods to see through, examine, and classify variability
• Uncertainty quantification
• Statistical design of experiments
Beckerman_0611 Ostrouchov_SDS_0611

3 Cutting through the fog of high-dimensional variability
ORNL statistics and data sciences group:
• High-dimensional multivariate classification for chem-bio mass spectrometer
• Climate: Compact representation and insight in high-dimensional climate simulation space
• Network intrusion detection without content analysis
• High-dimensional density space classification
• Visualization of high-dimensional relationships
• Fusion: Mapping clustered heating coefficient space to tokamak cross-section space

4 Statistical decomposition of time-varying simulation data
• Transform to reduce non-linearity in distribution (often density-based)
• PCA computed via SVD (or ICA, FA, etc.)
• Construction of component movies
• Interpretation of spatial, time, and movie components
• Pairs of equal singular values indicate periodic motion
Model: $f(X_t) = S_0 + \alpha_{1t} S_1 + \dots + \alpha_{kt} S_k + E_t$
[Figure: EETG GTC simulation data (W. Lee, Z. Lin, and S. Klasky). The decomposition shows transient wave components in time. Panel labels: $S_1$, $\alpha_{1t}$, $\alpha_{2t}$.]
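The decomposition on this slide can be sketched with NumPy. This is a minimal illustration, not the authors' code: it assumes the transform f is the identity, stores one flattened snapshot per row, and uses a synthetic travelling wave in place of the GTC simulation data. The function name `decompose` and all variable names are hypothetical.

```python
import numpy as np

def decompose(X):
    """PCA via SVD of a (T, n) array: T time steps, n grid points per snapshot.

    Returns the mean field S0, time coefficients (T, r), spatial
    components (r, n), and singular values (r,), with r = min(T, n).
    """
    S0 = X.mean(axis=0)                        # mean field, S_0 in the model
    U, s, Vt = np.linalg.svd(X - S0, full_matrices=False)
    coeffs = U * s                             # column i: time series of component i
    return S0, coeffs, Vt, s                   # Vt[i]: spatial pattern of component i

# Synthetic data (assumption): a travelling wave sin(2x - 5t) on a constant background.
T, n = 200, 64
t = np.linspace(0.0, 2.0 * np.pi, T, endpoint=False)
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X = 3.0 + np.sin(2.0 * x[None, :] - 5.0 * t[:, None])

S0, coeffs, Vt, s = decompose(X)

# Rank-k reconstruction: S_0 plus the first k components; the remainder is E_t.
k = 2
X_k = S0 + coeffs[:, :k] @ Vt[:k]
residual = X - X_k
```

A travelling wave splits into a sine and a cosine pattern of equal strength, so the first two singular values in `s` coincide and the remaining ones are numerically zero. This is the "pair of equal singular values" signature of periodic motion mentioned in the bullets.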

5 Statistical decomposition of time-varying simulation data
• Time dynamics of component pairs ($\alpha_i$, $\alpha_j$) stand out in a small multiples plot
• Movies of any component combinations or residuals provide further views of dynamics
Model: $f(X_t) = S_0 + \alpha_{1t} S_1 + \dots + \alpha_{kt} S_k + E_t$

6 Conditional small multiples provide access to high-dimensional relationships
• Five-dimensional climate relationship involving surface heat exchange
  • Latent heat flux: x
  • Radiative heat flux: y
  • Various precipitation regimes: rows
  • Time of day: 6 a.m. left columns, 6 p.m. right columns
  • Across a century of increasing CO2; century divided into three parts: columns (1,4), (2,5), (3,6)
• Hexagonal binning on a log scale controls overplotting
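A layout like the one described on this slide can be sketched with Matplotlib's `hexbin`. The data here are random stand-ins, not climate output, and the number of precipitation regimes (three) is an assumption, as are all variable names. The column order follows the slide: columns 1 to 3 hold the three century parts at 6 a.m., columns 4 to 6 the same parts at 6 p.m.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                     # stand-in for latent heat flux
y = 0.5 * x + rng.normal(size=n)           # stand-in for radiative heat flux
regime = rng.integers(0, 3, size=n)        # precipitation regime (3 assumed)
hour = rng.integers(0, 2, size=n)          # 0 = 6 a.m., 1 = 6 p.m.
part = rng.integers(0, 3, size=n)          # third of the century

# A shared extent keeps the hexagon grid identical in every panel.
extent = (x.min(), x.max(), y.min(), y.max())
fig, axes = plt.subplots(3, 6, sharex=True, sharey=True, figsize=(18, 9))

panels = {}
for r in range(3):
    for h in range(2):
        for p in range(3):
            col = 3 * h + p                # columns (1,4), (2,5), (3,6) pair up by part
            mask = (regime == r) & (hour == h) & (part == p)
            panels[r, col] = axes[r, col].hexbin(
                x[mask], y[mask], gridsize=30, extent=extent,
                bins="log", mincnt=1)      # log color scale tames dense regions
            axes[r, col].set_title(f"regime {r}, hour {h}, part {p}")
```

Calling `fig.savefig("small_multiples.png")` afterwards writes the grid to disk. Conditioning on three variables through the panel position leaves the x and y axes free for the two fluxes, which is how five dimensions fit on one page.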

7 RobustMap multiscale data decomposition
Hierarchical data decomposition and dimension reduction with O(np) complexity
• n data points in p dimensions
• Axis represents difference between data groups
• Internal nodes contain axes for data splits
• Leaf nodes contain axes for local dimension reduction
• Each set of axes represents a local data group
[Diagram: tree of nodes C1 to C5. C1, C3 hold an axis pair (b_1, c_1); C2 and C5 hold axis pairs (b_1, c_1) through (b_k, c_k); C4 is NULL.]

8 A statistical framework for guiding visualization of massive data sets
Browse a petabyte of data?
• To see 1% of a petabyte at 10 MB per second takes 35 work days!
• Statistical analysis must select views or reduce to quantities of interest in addition to fast rendering of data
[Figure: Visualization scalability through guidance by statistical analysis. Labels: No analysis, More statistical analysis, More data (Megabytes, Gigabytes, Terabytes, Petabytes), Human bandwidth overload?]
From local models to annotated data in global context through clustering in a high-dimensional feature space
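The 35-work-day figure on this slide can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and an 8-hour work day; neither assumption is stated on the slide.

```python
PETABYTE = 10**15            # bytes, decimal definition (assumed)
MEGABYTE = 10**6             # bytes, decimal definition (assumed)
WORK_DAY_HOURS = 8           # assumed length of a work day

bytes_to_view = PETABYTE // 100          # 1% of a petabyte
rate = 10 * MEGABYTE                     # viewing rate in bytes per second

seconds = bytes_to_view / rate           # 1,000,000 seconds
hours = seconds / 3600
work_days = hours / WORK_DAY_HOURS

print(f"{seconds:.0f} s = {hours:.1f} h = {work_days:.1f} work days")
```

Under these assumptions the result is a little under 35 work days, which matches the slide's rounded figure.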

9 Contact
George Ostrouchov
Statistics and Data Sciences Group
Computer Science and Mathematics Division
(865) 574-3137
ostrouchovg@ornl.gov

