1
Distributed and Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov
Department of Computer Science, Johns Hopkins University
2
Streaming Evaluation Method
Linear data requirements of the computation allow for:
– Incremental evaluation
– Streaming over the data
– Concurrent evaluation of batch queries
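One way to read "linear data requirements": each interpolated value is a weighted sum of stored data values, so a result can be built up incrementally. Every value that streams past adds weight × value to the running sum of each query point that needs it, and the arrival order does not matter. The sketch below assumes a toy setting where each query already knows which grid values it needs and with what weights; `Query` and `stream_evaluate` are hypothetical names, not the system's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical query: grid index -> weight (e.g. a Lagrange coefficient).
    weights: dict[int, float]
    # Running partial sum, updated as data values stream by.
    partial: float = 0.0


def stream_evaluate(queries: list[Query], stream) -> list[float]:
    """Accumulate weight * value for every query that needs each streamed value.

    `stream` yields (grid_index, value) pairs in any order; each value is read
    once and shared by all queries in the batch that depend on it.
    """
    # Invert the dependencies: grid index -> list of (query, weight).
    needs: dict[int, list[tuple[Query, float]]] = {}
    for q in queries:
        for idx, w in q.weights.items():
            needs.setdefault(idx, []).append((q, w))

    for idx, value in stream:
        for q, w in needs.get(idx, []):
            q.partial += w * value  # linear: contributions simply add up
    return [q.partial for q in queries]
```

Feeding the same data forward or in reverse yields the same results, which is what allows a batch to be served by a single pass over the data in whatever order it is cheapest to read.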
3
Motivation
– Heavy database usage slows down the service by a factor of 10 to 20
– Query evaluation techniques adapted from simulation code do not access data coherently
– Substantial storage overhead is incurred to localize each computation
– 95% of queries perform Lagrange polynomial interpolation
4
Turbulence Database Cluster
5
MHD Database
Stores the velocity, magnetic field, magnetic vector potential and pressure fields:
– 10 attributes, 4 bytes each
– 1024 time-steps over a 1024³ grid
– 40 TB total size
To reduce the total amount of I/O:
– Smaller atoms (4³ voxels)
– No replication
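The 40 TB figure follows directly from the numbers above. A quick check, assuming every attribute is stored exactly once per grid point (no replication) and counting TB as 2^40 bytes; the atom count assumes the grid is split into non-overlapping 4³ blocks.

```python
GRID = 1024            # grid points per dimension (1024^3 grid)
TIMESTEPS = 1024
ATTRIBUTES = 10        # presumably velocity (3) + magnetic field (3) + vector potential (3) + pressure (1)
BYTES_PER_VALUE = 4    # 4-byte values
ATOM_EDGE = 4          # atoms are 4^3 voxel blocks

total_bytes = ATTRIBUTES * BYTES_PER_VALUE * GRID**3 * TIMESTEPS
print(total_bytes / 2**40)  # size in TB (2^40 bytes)

atoms_per_timestep = (GRID // ATOM_EDGE) ** 3
print(atoms_per_timestep)   # 256^3 atoms per time-step for each stored field
```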
6
Lagrange Polynomial Interpolation
[Figure: annotated with "Lagrange coefficients" and "Data"]
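The figure's two labels plausibly correspond to the two factors in the standard tensor-product Lagrange formula, $f(x,y,z) = \sum_{i,j,k} l_i(x)\,l_j(y)\,l_k(z)\,u_{ijk}$ with $l_i(x) = \prod_{m \ne i} (x - x_m)/(x_i - x_m)$: the $l$ terms are the Lagrange coefficients and $u_{ijk}$ is the stored data. A minimal sketch, assuming a uniform grid, an even stencil width (`order`) per dimension, and a stencil that lies inside the array (no periodic wrap-around); `lagrange_weights` and `lagrange_interp_3d` are hypothetical names.

```python
import numpy as np


def lagrange_weights(x: float, nodes: np.ndarray) -> np.ndarray:
    """1-D Lagrange coefficients l_i(x) for the given interpolation nodes."""
    n = len(nodes)
    w = np.ones(n)
    for i in range(n):
        for m in range(n):
            if m != i:
                w[i] *= (x - nodes[m]) / (nodes[i] - nodes[m])
    return w


def lagrange_interp_3d(point, field: np.ndarray, order: int = 4, dx: float = 1.0) -> float:
    """Interpolate `field` (uniform grid, spacing dx) at `point` using an
    order^3 stencil. Assumes even order and an in-bounds stencil."""
    half = order // 2
    weights, starts = [], []
    for coord in point:
        base = int(np.floor(coord / dx))   # grid index just below the point
        start = base - half + 1            # stencil covers start .. start+order-1
        nodes = (start + np.arange(order)) * dx
        weights.append(lagrange_weights(coord, nodes))
        starts.append(start)
    i0, j0, k0 = starts
    block = field[i0:i0 + order, j0:j0 + order, k0:k0 + order]  # the "data"
    # Sum over i, j, k of l_i(x) * l_j(y) * l_k(z) * u_ijk.
    return float(np.einsum("i,j,k,ijk->", weights[0], weights[1], weights[2], block))


# Check: a 4-point stencil reproduces polynomials of degree <= 3 per axis exactly.
g = np.arange(16.0)
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
u = X**2 + 2 * Y - Z**3
approx = lagrange_interp_3d((5.3, 6.7, 7.25), u)
exact = 5.3**2 + 2 * 6.7 - 7.25**3
print(approx, exact)
```

Each query point touches order³ data values (64 for a 4-point stencil), which is why the access pattern of the data, rather than the arithmetic, dominates cost.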
7
Processing a Batch Query
8
Additional Optimizations
– Concurrently process the computations of values that are stored together
– Iterate over the data in the appropriate order
– Compute the Lagrange coefficients with the procedure described by Purser and Leslie*
*R. J. Purser and L. M. Leslie. An Efficient Interpolation Procedure for High-Order Three-Dimensional Semi-Lagrangian Models. Monthly Weather Review, 119:2492ff., 1991.
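The first two optimizations can be pictured as reorganizing a batch by storage unit: group query points by the atoms their stencils touch, then visit atoms in on-disk order so each atom is read once and every computation that needs it is done together. A minimal sketch, assuming 4³ atoms addressed by a Morton (z-order) key of their atom coordinates; the key layout, `morton3`, `stencil_atoms` and `plan_batch` are all hypothetical here, and the Purser and Leslie coefficient procedure is not reproduced.

```python
import math
from collections import defaultdict

ATOM = 4  # atom edge length in voxels (4^3 atoms)


def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of (x, y, z) into a z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key


def stencil_atoms(point, order: int = 4) -> set:
    """Atom coordinates touched by an order^3 stencil around `point`.
    Assumes unit grid spacing and points away from the domain boundary."""
    half = order // 2
    ranges = []
    for c in point:
        start = math.floor(c) - half + 1
        ranges.append(range(start // ATOM, (start + order - 1) // ATOM + 1))
    return {(ax, ay, az) for ax in ranges[0] for ay in ranges[1] for az in ranges[2]}


def plan_batch(points, order: int = 4):
    """Group query ids by atom; return (atom, query ids) pairs in Morton order."""
    by_atom = defaultdict(list)
    for qid, p in enumerate(points):
        for atom in stencil_atoms(p, order):
            by_atom[atom].append(qid)
    return sorted(by_atom.items(), key=lambda kv: morton3(*kv[0]))


plan = plan_batch([(5.3, 6.7, 7.25), (6.1, 6.2, 6.9)])
for atom, qids in plan:
    print(atom, qids)  # each atom would be read once and shared by these queries
```

Even this two-point batch shows why grouping pays off: both stencils straddle atom boundaries, four atoms are shared, and each of them is fetched once instead of twice.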
9
Experimental Evaluation
Random workloads:
– across the entire cube space
– over a 128³ subset of the entire space
Workload derived from the usage log of the Turbulence Database cluster
Compare with:
– Direct methods of evaluation
10
Setup
Experimental version of the MHD database:
– ~300 time-steps of the velocity field from the MHD DNS (direct numerical simulation)
– Two servers, each with dual 2.33 GHz quad-core processors and 8 GB of memory, running Windows Server 2003 and SQL Server 2008
– Data tables striped across 7 disks
12
Questions/Comments