Published by Jadon Cheatwood. Modified over 9 years ago.
2
The Big Picture
  Scientific disciplines have developed a computational branch
    Models without closed-form solutions are solved numerically
    This has led to an explosion of data
  Simulation and analysis workloads are data-intensive
    Producing/scanning large amounts of data
  Management of these data represents a significant challenge
    Storage/archiving
    Query processing
    Visualization
3
Remote Immersive Analysis
  Formerly, analysis was performed during the computation
    No data were stored for subsequent examination
  Data-intensive computing breakthroughs have allowed for new interaction with scientific numerical simulations
  Turbulence Database Cluster
    Stores the entire space-time evolution of the simulation
    Provides public access to world-class simulations
    Implements the “immersive turbulence” * approach
    Introduces new challenges
  * E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
4
Goals
  Develop data-driven query processing techniques
  Reduce I/O and computation costs
  Reduce or eliminate storage overhead
  Exploit domain knowledge and structure
  Provide user interfaces that are efficient and flexible
  Streamline the process of data ingest
5
Turbulence Database Cluster
6
Processing a Batch Query
  [Figure: 4 x 4 grid of data atoms numbered 0-15, overlaid with the regions of query 1, query 2 and query 3]
    10 11 14 15
     8  9 12 13
     2  3  6  7
     0  1  4  5
  Atoms accessed per query:
    q1: 0 1 2 3 4 6 8 9 12
    q2: 9 11 12 14
    q3: 4 5 6 7
  Redundant I/O
  Multiple disk seeks
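
The atom numbering in the batch query figure (0, 1, 4, 5 along the bottom row, then 2, 3, 6, 7 above it, and so on) is consistent with a Z-order (Morton) layout, in which the bits of the two grid coordinates are interleaved. The deck does not name the ordering, so treat this as an assumption. The sketch below uses hypothetical helper names `morton_index` and `grid_rows` to reproduce the 4 x 4 numbering.

```python
def morton_index(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of x and y; x occupies the even bit positions."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)        # bit b of x -> position 2b
        code |= ((y >> b) & 1) << (2 * b + 1)    # bit b of y -> position 2b + 1
    return code


def grid_rows(size: int = 4):
    """Return the grid of atom ids, top row first, as drawn in the figure."""
    bits = max(1, (size - 1).bit_length())
    return [
        [morton_index(x, y, bits) for x in range(size)]
        for y in reversed(range(size))
    ]
```

Under this layout, atoms that are close in space tend to have nearby ids, so reading atoms in id order tends to keep disk access close to sequential.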
7
I/O Streaming Evaluation Method
  Linear data requirements of the computation allow for:
    Incremental evaluation
    Streaming over the data
    Concurrent evaluation of batch queries
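
The idea can be sketched as follows. Because each query result is a linear combination of the data it touches, a partial result can be updated whenever one of its atoms passes by, so a whole batch can be answered in a single ordered pass. This is a minimal illustration and not the database cluster's implementation: the query-to-atom assignment, the `CountingReader` stand-in for the storage layer, and the use of a plain sum as the kernel are all assumptions.

```python
from collections import defaultdict

# Illustrative batch: query name -> ids of the atoms it needs (assumed values).
QUERIES = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


class CountingReader:
    """Hypothetical storage layer that logs every atom read."""

    def __init__(self):
        self.log = []

    def __call__(self, atom_id):
        self.log.append(atom_id)
        return atom_id  # dummy payload: the atom's data is just its id


def evaluate_separately(queries, read_atom):
    """Baseline: each query reads its own atoms, so shared atoms are re-read."""
    return {name: sum(read_atom(a) for a in atoms)
            for name, atoms in queries.items()}


def evaluate_streaming(queries, read_atom):
    """Read each needed atom once, in id order, and update every query using it."""
    needed_by = defaultdict(list)            # atom id -> queries that need it
    for name, atoms in queries.items():
        for a in atoms:
            needed_by[a].append(name)

    partial = {name: 0 for name in queries}  # incrementally built results
    for atom_id in sorted(needed_by):        # single sequential pass
        data = read_atom(atom_id)            # one read per atom
        for name in needed_by[atom_id]:
            partial[name] += data            # linear kernel: add the contribution
    return partial
```

With this batch, the baseline issues 17 atom reads while the streaming pass issues 13, one per distinct atom, and both return the same results.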
8
Processing a Batch Query
  [Figure: 4 x 4 grid of data atoms numbered 0-15, overlaid with the regions of query 1, query 2 and query 3]
    10 11 14 15
     8  9 12 13
     2  3  6  7
     0  1  4  5
  I/O Streaming: atoms read in order
    0 1 2 3 4 5 6 7 8 9 11 12 14
  Atoms shared between queries:
    4 (q1, q3), 6 (q1, q3), 9 (q1, q2), 12 (q1, q2)
  Sequential I/O
  Single pass
9
Lagrange Polynomial Interpolation
  [Equation image; annotated terms: Lagrange coefficients, Data]
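
Since the slide's equation did not survive extraction, here is a minimal one-dimensional sketch of the standard Lagrange form, in which the interpolated value is a sum of Lagrange coefficients multiplied by data values. The function names and the 1D setting are assumptions for illustration; the deck does not give the interpolation order or dimensionality.

```python
def lagrange_coefficients(nodes, x):
    """Weights l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j) for distinct nodes."""
    coeffs = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        coeffs.append(w)
    return coeffs


def interpolate(nodes, values, x):
    """f(x) ~ sum_i l_i(x) * f(x_i): coefficients times data."""
    return sum(c * v for c, v in zip(lagrange_coefficients(nodes, x), values))
```

The coefficients depend only on the node positions and the target point, and the result is linear in the data values, so each data value's contribution can be added independently of the others.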
10
Spatial Differentiation
11
Derivative Interpolation
12
128 Workload
  Over an order of magnitude improvement
  Sorting leads to more sequential access
  Join/Order By executes the entire batch as a join
  I/O Streaming
    Each atom is read only once
    Effective cache usage
13
I/O Streaming alleviates the I/O bottleneck
  Computation emerges as the more costly operation
14
Particle Tracking
  [Diagram: Web Server/Mediator connected to DB Node 1 through DB Node N; each DB node contains a Computational Module and a Storage Layer]
  Web Server/Mediator: Distribute Points based on
  DB nodes: Retrieve
  Positions shown: x_p(t_m), x*_p(t_m)
15
Particle Tracking
  [Diagram: Web Server/Mediator connected to DB Node 1 through DB Node N; each DB node contains a Computational Module and a Storage Layer]
  Web Server/Mediator: Distribute Points based on
  DB nodes: Retrieve
  Positions shown: x*_p(t_m), x_p(t_{m+1})
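
The two particle tracking slides show an intermediate position x*_p between x_p(t_m) and x_p(t_{m+1}), which suggests a two-stage update. The deck does not name the integration scheme, so the sketch below assumes a second-order predictor-corrector (Heun) step purely for illustration. `velocity` is a hypothetical callable standing in for a velocity lookup that would, in the real system, be evaluated on the DB nodes.

```python
def heun_step(x, t, dt, velocity):
    """One assumed predictor-corrector step for dx/dt = velocity(x, t).

    Returns (x_star, x_next): the predicted position and the corrected
    position at time t + dt.
    """
    u0 = velocity(x, t)                  # velocity at the current position
    x_star = x + dt * u0                 # predictor: forward Euler estimate
    u1 = velocity(x_star, t + dt)        # velocity at the predicted position
    x_next = x + 0.5 * dt * (u0 + u1)    # corrector: average the two velocities
    return x_star, x_next
```

Each stage needs a velocity evaluation at a new position, which is one reason a step of this kind involves repeated exchanges between the mediator and the DB nodes.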
16
Summary and Future Work
  Extend the I/O streaming technique to different decomposable kernel computations:
    Differentiation
    Spatial interpolation
    Temporal interpolation
    Filtering and coarse-graining
  Provide a flexible user interface
    Allow for different filter functions
    Allow for new kernel computations
  Improve particle tracking routine
    Reduce communication between mediator and DB nodes
    Asynchronous processing
    Caching and pre-fetching
17
Questions
  Images courtesy of Kai Buerger (buerger@tum.de)