An Architecture for Large-Scale Data
Dave Nadeau
SDSC Scientific Visualization Group
Motivation
[Images: CT, Cryosection, Classification]
Support analysis, filtering, and compositing
  – Larger-than-core (and larger-than-swap) data sets
  – Multi-modal and time-varying data
  – Multiple data sets simultaneously
And...
  – Do efficient data movement
  – Execute well on parallel architectures
  – Integrate easily with existing applications & toolkits
Support Alpha project applications
Layered Toolkit Architecture
  – Application
  – Expression Tree Toolkit: orchestrate filter execution
  – Mesh Toolkit: bind a coordinate system to data
  – Data Grid Toolkit: manage an N-space data grid; cache pages for lazy I/O
  – File Format Handling: support specific file formats
  – Data Management (SRB, ADR, etc.): manage file storage
Managing Data Grids (Data Grid Toolkit)
Manage a paged data grid (array-like)
  – An N-dimensional grid of cells
  – Spatial data & time series
  – Arbitrary cell data content
Handle larger-than-core data
  – Transparently page data in and out
  – Support from ADR & DataCutter
  – Compressed data (on disk & in memory)
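A minimal sketch of a paged grid, assuming fixed-shape pages, a caller-supplied page loader standing in for the file format and storage layers, and an LRU replacement policy. The names `PagedGrid`, `load_page`, `store_page`, and `max_pages` are hypothetical and are not the toolkit's API; compression and ADR/DataCutter support are left out.

```python
from collections import OrderedDict
from itertools import product
from typing import Callable, Dict, Set, Tuple

Index = Tuple[int, ...]
Page = Dict[Index, float]


class PagedGrid:
    """Hypothetical N-dimensional grid stored as fixed-shape pages.

    Pages are paged in lazily via `load_page` and held in an LRU cache of at
    most `max_pages` pages; modified ("dirty") pages are written back through
    `store_page` when they are evicted.
    """

    def __init__(self, shape: Index, page_shape: Index,
                 load_page: Callable[[Index], Page],
                 store_page: Callable[[Index, Page], None],
                 max_pages: int = 4) -> None:
        assert max_pages >= 1
        self.shape = shape
        self.page_shape = page_shape
        self.load_page = load_page
        self.store_page = store_page
        self.max_pages = max_pages
        self.cache: "OrderedDict[Index, Page]" = OrderedDict()
        self.dirty: Set[Index] = set()
        self.loads = 0  # count of page-ins, useful for measuring access patterns

    def page_of(self, idx: Index) -> Index:
        return tuple(i // p for i, p in zip(idx, self.page_shape))

    def _page(self, page: Index) -> Page:
        if page in self.cache:
            self.cache.move_to_end(page)  # mark as most recently used
            return self.cache[page]
        data = self.load_page(page)
        self.loads += 1
        self.cache[page] = data
        if len(self.cache) > self.max_pages:
            old, old_data = self.cache.popitem(last=False)  # evict the LRU page
            if old in self.dirty:
                self.store_page(old, old_data)  # write back only modified pages
                self.dirty.discard(old)
        return data

    def get(self, idx: Index) -> float:
        return self._page(self.page_of(idx))[idx]

    def set(self, idx: Index, value: float) -> None:
        page = self.page_of(idx)
        self._page(page)[idx] = value
        self.dirty.add(page)


# Usage: an 8x8x8 grid in 4x4x4 pages, backed by an in-memory "disk".
page_shape = (4, 4, 4)
disk: Dict[Index, Page] = {}
for page in product(range(2), repeat=3):
    ranges = [range(p * s, (p + 1) * s) for p, s in zip(page, page_shape)]
    disk[page] = {idx: float(sum(idx)) for idx in product(*ranges)}

grid = PagedGrid((8, 8, 8), page_shape,
                 load_page=lambda p: dict(disk[p]),
                 store_page=disk.__setitem__, max_pages=2)
grid.set((0, 0, 0), 42.0)
print(grid.get((5, 1, 7)))  # 13.0
```

Only two pages are resident at a time here, so touching a third page evicts one; because page `(0, 0, 0)` was modified, its new value survives the round trip to the backing store.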
Pre-fetching Intelligently (Data Grid Toolkit)
Random access (slow)
  – Get/set cells in any order
Structured access (faster)
  – Get/set cells in a pre-defined order
Data-order access (fastest)
  – Get/set cells in the data’s storage order
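The ranking above can be illustrated by replaying three access orders through a small LRU page cache and counting page-ins. This sketch assumes a 64x64 grid stored as 8x8 tiles (so "storage order" means tile by tile) and a cache that holds four pages; `count_page_loads` is a hypothetical helper, not part of the toolkit.

```python
import random
from collections import OrderedDict
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def count_page_loads(order: Iterable[Cell], page: Tuple[int, int], max_pages: int) -> int:
    """Replay an access order through an LRU page cache and count page-ins."""
    cache: "OrderedDict[Cell, None]" = OrderedDict()
    loads = 0
    for r, c in order:
        key = (r // page[0], c // page[1])  # which page holds this cell
        if key in cache:
            cache.move_to_end(key)
        else:
            loads += 1
            cache[key] = None
            if len(cache) > max_pages:
                cache.popitem(last=False)  # evict least recently used page
    return loads


N, P, CACHE = 64, 8, 4  # 64x64 grid, 8x8 pages, room for 4 resident pages

cells = [(r, c) for r in range(N) for c in range(N)]
random_order = cells[:]
random.Random(0).shuffle(random_order)   # random access
row_major = cells                        # structured access: a pre-defined order
data_order = [(pr * P + r, pc * P + c)   # data-order access: one tile at a time
              for pr in range(N // P) for pc in range(N // P)
              for r in range(P) for c in range(P)]

for name, order in [("random", random_order), ("row-major", row_major),
                    ("data-order", data_order)]:
    print(name, count_page_loads(order, (P, P), CACHE))
```

Data order loads each of the 64 pages exactly once. Row-major order loads 512 pages because each row spans 8 pages and the 4-page cache cannot hold them all, while random order misses on most accesses.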
Paging Intelligently (Data Grid Toolkit)
Neighborhood-aware paging
  – Page in nearby cells in N dimensions
  – Support convolution filtering, rendering, marching cubes, ...
[Figure: a filter window around the current center cell; neighboring cells are kept paged in as well]
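One way to decide which pages to keep resident is to compute every page that a filter window around the current center cell overlaps. The sketch below assumes a cubic window of half-width `radius` clipped to the grid bounds; `pages_for_window` is a hypothetical name.

```python
from itertools import product
from typing import Set, Tuple

Index = Tuple[int, ...]


def pages_for_window(center: Index, radius: int, shape: Index, page_shape: Index) -> Set[Index]:
    """Return the pages overlapped by a (2*radius+1)^N window around `center`.

    The window is clipped to the grid, then each axis range is converted to a
    range of page indices; their Cartesian product is the set of pages that
    should stay resident while a filter is evaluated at this cell.
    """
    per_axis = []
    for c, n, p in zip(center, shape, page_shape):
        lo = max(c - radius, 0)
        hi = min(c + radius, n - 1)
        per_axis.append(range(lo // p, hi // p + 1))
    return set(product(*per_axis))


# 16x16 grid in 4x4 pages with a 3x3 filter window (radius 1).
print(sorted(pages_for_window((5, 5), 1, (16, 16), (4, 4))))  # [(1, 1)]
print(sorted(pages_for_window((4, 4), 1, (16, 16), (4, 4))))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A center cell on a page corner needs all 2^N surrounding pages in N dimensions, which is why a pager that only fetches the center cell's page would thrash during convolution or marching cubes.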
Using Coordinate Systems (Mesh Toolkit)
Bind a coordinate system to a data grid
  – Euclidean, cylindrical, spherical, time-series, ...
  – Uniform, structured, unstructured
Handle coordinate-system-based operations
  – Resampling with interpolation
  – Lazy evaluation
Multiple file format handlers
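The simplest binding, a uniform Euclidean coordinate system, can be sketched as an origin plus per-axis spacing that maps world points to fractional grid indices. Resampling then becomes interpolation at those indices, and laziness comes from returning a sampler instead of a filled array. `UniformCoords2D`, `bilinear`, and `lazy_resample` are hypothetical names, and cylindrical, spherical, and unstructured systems are not covered by this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class UniformCoords2D:
    """Hypothetical binding of a uniform Euclidean coordinate system to a 2-D grid."""
    origin: Tuple[float, float]   # world position of cell (0, 0)
    spacing: Tuple[float, float]  # world distance between neighboring cells on each axis

    def to_index(self, x: float, y: float) -> Tuple[float, float]:
        """Map a world point to a fractional (row, column) grid index."""
        return ((x - self.origin[0]) / self.spacing[0],
                (y - self.origin[1]) / self.spacing[1])


def bilinear(grid: List[List[float]], i: float, j: float) -> float:
    """Bilinear interpolation at fractional index (i, j), clamped to a grid of at least 2x2."""
    ni, nj = len(grid), len(grid[0])
    i = min(max(i, 0.0), ni - 1.0)
    j = min(max(j, 0.0), nj - 1.0)
    i0, j0 = min(int(i), ni - 2), min(int(j), nj - 2)
    fi, fj = i - i0, j - j0
    top = grid[i0][j0] * (1 - fj) + grid[i0][j0 + 1] * fj
    bottom = grid[i0 + 1][j0] * (1 - fj) + grid[i0 + 1][j0 + 1] * fj
    return top * (1 - fi) + bottom * fi


def lazy_resample(grid: List[List[float]], coords: UniformCoords2D) -> Callable[[float, float], float]:
    """Return a sampler; no interpolation happens until a world point is requested."""
    def sample(x: float, y: float) -> float:
        return bilinear(grid, *coords.to_index(x, y))
    return sample


grid = [[10.0 * i + j for j in range(3)] for i in range(3)]  # value = 10*row + column
sampler = lazy_resample(grid, UniformCoords2D(origin=(100.0, 200.0), spacing=(0.5, 0.5)))
print(sampler(100.25, 200.75))  # 6.5: row 0.5, column 1.5
```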
Operating on Data (Expression Tree Toolkit)
Define an expression tree for data operations
  – Leaf nodes are data sets, functions, ...
  – Interior nodes are composite, filter, ... operators
  – Transforms align overlapping data sets
Execute it to generate samples
  – The client defines the expression
  – A server on big iron executes it
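A minimal sketch of such a tree, assuming every node answers one question: "what is your value at this point?" Leaves are a gridded data set and an analytic function, a translation stands in for the aligning transform, and interior nodes are a per-sample filter and an alpha composite. All class names are hypothetical. The tree is plain data, so in the client/server split above it could be serialized and evaluated remotely; here it is simply evaluated in-process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]


class Node:
    """Base class for hypothetical expression-tree nodes; each yields a sample at a point."""
    def sample(self, p: Point) -> float:
        raise NotImplementedError


@dataclass
class DataSet(Node):
    """Leaf: values stored at integer grid positions (nearest-cell lookup, 0 outside)."""
    values: Dict[Tuple[int, int], float]

    def sample(self, p: Point) -> float:
        return self.values.get((round(p[0]), round(p[1])), 0.0)


@dataclass
class Function(Node):
    """Leaf: an analytic function of position."""
    f: Callable[[Point], float]

    def sample(self, p: Point) -> float:
        return self.f(p)


@dataclass
class Translate(Node):
    """Transform: places a child data set at an offset in the shared frame."""
    child: Node
    offset: Point

    def sample(self, p: Point) -> float:
        return self.child.sample((p[0] - self.offset[0], p[1] - self.offset[1]))


@dataclass
class Filter(Node):
    """Interior: applies a per-sample operation to one child."""
    child: Node
    op: Callable[[float], float]

    def sample(self, p: Point) -> float:
        return self.op(self.child.sample(p))


@dataclass
class Composite(Node):
    """Interior: alpha-blends two children."""
    a: Node
    b: Node
    alpha: float

    def sample(self, p: Point) -> float:
        return self.alpha * self.a.sample(p) + (1 - self.alpha) * self.b.sample(p)


# The "client" builds the tree; an executor (here just a loop) generates samples.
ct = DataSet({(0, 0): 10.0, (1, 0): 20.0})
tree = Composite(Filter(Translate(ct, (1.0, 0.0)), lambda v: 2 * v),
                 Function(lambda p: p[0] + p[1]), alpha=0.5)
print([tree.sample((x, 0.0)) for x in (1.0, 2.0)])  # [10.5, 21.0]
```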
Operating on Expressions
Expressions can be optimized
  – Re-order operators
  – Similar to optimizing compilers & databases
Sample order can be optimized
  – Re-order data accesses for better cache efficiency
Data can be staged & intermediate results cached
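As one illustration of operator re-ordering, the sketch below applies a single rewrite rule, assumed here for illustration: a crop applied after a pointwise operator can be moved below it, so the operator runs only on the cropped samples. This is analogous to predicate pushdown in database query optimizers. The node types and `optimize` are hypothetical. The rule is valid only for pointwise operators; a neighborhood filter such as convolution would need the crop widened by the filter's radius.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Source:
    values: tuple                   # a small 1-D data set, for illustration only


@dataclass
class Pointwise:
    op: Callable[[float], float]    # applied to each sample independently
    child: "Expr"


@dataclass
class Crop:
    start: int
    stop: int
    child: "Expr"


Expr = Union[Source, Pointwise, Crop]


def optimize(e: Expr) -> Expr:
    """Rewrite Crop(Pointwise(f, x)) into Pointwise(f, Crop(x)), recursively."""
    if isinstance(e, Crop):
        child = optimize(e.child)
        if isinstance(child, Pointwise):
            # Valid only because `op` touches one sample at a time.
            return Pointwise(child.op, optimize(Crop(e.start, e.stop, child.child)))
        return Crop(e.start, e.stop, child)
    if isinstance(e, Pointwise):
        return Pointwise(e.op, optimize(e.child))
    return e


def evaluate(e: Expr) -> List[float]:
    if isinstance(e, Source):
        return list(e.values)
    if isinstance(e, Crop):
        return evaluate(e.child)[e.start:e.stop]
    return [e.op(v) for v in evaluate(e.child)]


calls = 0


def expensive(v: float) -> float:
    global calls
    calls += 1                      # count how many samples the filter touches
    return v * v


plan = Crop(10, 13, Pointwise(expensive, Source(tuple(float(i) for i in range(1000)))))
before = evaluate(plan)
calls_before, calls = calls, 0
after = evaluate(optimize(plan))
print(before, after, calls_before, calls)  # [100.0, 121.0, 144.0] twice, then 1000 3
```

Both plans produce the same samples, but the rewritten one calls the filter 3 times instead of 1000.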
Combining Brain Data Sets
[Figure: expression tree with nodes Scalar CT-scan, Scalar to RGB, Color Cryosection, RGB to HSI, Extract Hue, Mask by Hue, Color Segmentation, and Composite; size labels 512 x 512 x … and … x 710 x 672]
Combining Brain Data Sets
[Figure panels: CT, Cryosection, Composited]
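The brain example's node names suggest a per-voxel pipeline that converts cryosection color to HSI, extracts hue, masks by a hue range, and composites with the color-mapped CT scan. The sketch below is one plausible reading of that tree, not the authors' exact wiring. It assumes the two data sets are already aligned voxel for voxel, uses the common geometric RGB-to-HSI hue formula, uses a grey-scale map as a hypothetical stand-in for the real scalar-to-RGB table, and uses a hue interval that does not wrap around 360°. All function names are hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def hsi_hue(rgb: RGB) -> float:
    """Hue in degrees via the common geometric RGB-to-HSI formula; greys map to 0."""
    r, g, b = rgb
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0.0:
        return 0.0  # hue is undefined when r == g == b
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if b <= g else 360.0 - theta


def scalar_to_rgb(v: float) -> RGB:
    """Grey-scale transfer function (a stand-in for a real color table)."""
    return (v, v, v)


def composite(ct: List[float], cryo: List[RGB], hue_lo: float, hue_hi: float) -> List[RGB]:
    """Per voxel: keep the cryosection color where its hue is in range, else show the CT value."""
    out: List[RGB] = []
    for v, color in zip(ct, cryo):
        in_mask = hue_lo <= hsi_hue(color) <= hue_hi   # "Mask by Hue"
        out.append(color if in_mask else scalar_to_rgb(v))
    return out


result = composite([0.2, 0.8], [(0.9, 0.1, 0.1), (0.1, 0.1, 0.9)], 0.0, 60.0)
print(result)  # [(0.9, 0.1, 0.1), (0.8, 0.8, 0.8)]
```

The reddish voxel falls inside the 0° to 60° hue band and keeps its cryosection color; the bluish voxel (hue near 240°) is replaced by its CT value.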
Combining Stellar Data Sets
Complex expression trees
  – 60+ nodes in the Orion body
90+ separate expression trees
  – Orion, proplyds, shock fronts, ...
And more toolkits...
Interactive imaging with...
  – Mitsubishi VolumePro cards
  – Point clouds & 3D texture mapping with graphics pipelines
High-quality imaging with VISTA...
[Diagram: the layered architecture extended with other toolkits: VolumePro, Point Cloud, VISTA, 3D Texture]
Design Team
Scripps Research: Art Olson, Mike Pique, Michel Sanner
SDSC: Bernard Pailthorpe, Dave Nadeau, Jon Genetti, John Moreland, Mike Bailey, Rich Charles, Alex Decastro
U. Texas: Chandrajit Bajaj, Ariel Shamir
Data-Visualization Pipeline
Get data from disk efficiently
Manage data in memory efficiently
Compute on data efficiently
Visualize data efficiently
[Diagram: Data, Computation, and Visualization stages; components shown include SRB Server, MCAT (Metadata), ADR, DataCutter, KeLP, FloorPlan; Data Orchestration...]
Data-Visualization Pipeline (continued)
[Diagram: Data, Computation, and Visualization stages; components shown include Data-Vis Toolkits, Interaction Tools, VISTA Renderer; Data Orchestration...]