Database Methods for Scientific Computing David R. O’Hallaron Associate Professor of CS and ECE Carnegie Mellon University (joint work with Tiankai Tu.

1 Database Methods for Scientific Computing David R. O’Hallaron Associate Professor of CS and ECE Carnegie Mellon University (joint work with Tiankai Tu and Julio Lopez)

2 The Scientific Computing Process
[Figure: physical model → mesh generation → mesh → solver → simulation results (t) → visualization]

3 The Euclid Project
Goal: Run large-scale physical simulations on PCs with limited physical memory.
Approach: Index and store the input and output datasets in databases, and compute on the databases directly.
Requires research at the intersection of scientific computing, algorithms, databases, and systems.
[Figure: physical model DB → mesh generation → mesh DBs → solver → simulation results DB (t) → visualization]

4 David O’Hallaron, Jacobo Bielak, Omar Ghattas (Carnegie Mellon) Jonathan Shewchuk (UC Berkeley) Steven Day (SD State)

5 Teora, Italy 1980

6 San Fernando Valley
[Map: epicenter marked with x; labeled coordinates lat. 34.32, long. -118.48; lat. 34.08, long. -118.75; lat. 34.38, long. -118.16]

7 San Fernando Valley (Top View) Soft soil Hard rock x epicenter

8 San Fernando Valley (Side View) Soft soil Hard rock

9 Node Distribution

10 Partitioned Unstructured Mesh element nodes

11 Simulation and Visualization

12 Scientific Computing with Euclid
Represent the physical model, mesh, and simulation results on disk in spatial database structures called etrees (Euclid trees):
– Linear octree indexed by standard Morton-based locational codes.
– Disk pages indexed by a standard B-tree indexing structure.
Perform the entire process out-of-core by querying and updating the etrees.
[Figure: physical model etree → mesh generation → mesh node and element etrees → solver → simulation results etree (t) → visualization]

13 Octrees
Octree mesh generation. Balance requirement for meshes (2-to-1 constraint).
[Figure: an octree whose leaves a through m are the elements/octants of the mesh; nodes h1 through h4 are highlighted, with master nodes and slave nodes marked]
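The 2-to-1 constraint can be illustrated with a small check. This is a minimal sketch in 2D (quadrants instead of octants), not code from the etree library. It assumes the constraint applies to leaves that share part of an edge and that levels may differ by at most one between such neighbors. The names quad, edge_adjacent, and is_balanced are hypothetical.

```c
#include <stddef.h>

/* A leaf quadrant; a hypothetical stand-in for an octant in 2D. */
typedef struct {
    unsigned x, y;    /* left-lower corner, in finest-level grid units */
    unsigned level;   /* 0 = root; edge length is 2^(max_level - level) */
} quad;

/* Return 1 if the two leaves share a piece of an edge (not just a corner). */
static int edge_adjacent(const quad *a, const quad *b, unsigned max_level)
{
    unsigned sa = 1u << (max_level - a->level);
    unsigned sb = 1u << (max_level - b->level);
    int touch_x = (a->x + sa == b->x) || (b->x + sb == a->x);
    int touch_y = (a->y + sa == b->y) || (b->y + sb == a->y);
    int over_x  = (a->x < b->x + sb) && (b->x < a->x + sa);
    int over_y  = (a->y < b->y + sb) && (b->y < a->y + sa);
    return (touch_x && over_y) || (touch_y && over_x);
}

/* Return 1 if every pair of edge-adjacent leaves differs by at most one
 * level. Brute force over all pairs, for illustration only. */
int is_balanced(const quad *leaves, size_t n, unsigned max_level)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            unsigned li = leaves[i].level, lj = leaves[j].level;
            unsigned diff = li > lj ? li - lj : lj - li;
            if (diff > 1 && edge_adjacent(&leaves[i], &leaves[j], max_level))
                return 0;
        }
    }
    return 1;
}
```

On an 8 x 8 grid (max_level 3), a level-1 leaf at (0, 0) next to a level-3 leaf at (4, 0) violates the constraint, while the same leaf next to a level-2 leaf at (4, 0) does not.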

14 Linear Octrees
[Figure: an octree with leaf octants a through m; the same octants drawn on an x-y grid with axis ticks 0 to 8; the leaves stored in the order a, b, c, ..., m in B-tree pages under a B-tree index]

15 Addressing Linear Octree Elements
Morton code: Maps n-dimensional points to one-dimensional scalars.
Locational code: Appends an octant's level to the Morton code of its left-lower corner.
Example for octant d: left-lower corner (2, 2); binary form (010, 010); interleave the bits to obtain the Morton code 001100; append the level of d (11) to obtain the locational code 001100_11.
[Figure: octants a through m on an x-y grid with axis ticks 0 to 8]
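The bit interleaving on this slide can be sketched in a few lines of C. This is an illustrative 2D version, not the etree library's code. The function names morton2d and locational_code are hypothetical, and the bit order (the x bit is the more significant bit of each pair) is inferred from the node keys on slide 20, where (0, 2) maps to 000100.

```c
#include <stdint.h>

/* Interleave the low `bits` bits of x and y into one Morton code.
 * Assumption: within each bit pair, the x bit is the more significant. */
uint32_t morton2d(uint32_t x, uint32_t y, unsigned bits)
{
    uint32_t code = 0;
    for (unsigned i = bits; i-- > 0; ) {
        code = (code << 1) | ((x >> i) & 1u);
        code = (code << 1) | ((y >> i) & 1u);
    }
    return code;
}

/* Append the octant level, stored in `level_bits` low-order bits,
 * to the Morton code of the octant's left-lower corner. */
uint32_t locational_code(uint32_t x, uint32_t y, unsigned bits,
                         unsigned level, unsigned level_bits)
{
    return (morton2d(x, y, bits) << level_bits) | level;
}
```

With 3 bits per coordinate, morton2d(2, 2, 3) gives binary 001100, and locational_code(2, 2, 3, 3, 2) gives binary 00110011, which is the slide's 001100_11.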

16 Nice Properties of Linear Octrees
– An addressing scheme that clusters nearby octants
– Finding an octant without knowing its locational code
– The order imposed by the locational code is the same as the preorder traversal of the leaves in the octree
[Figure: octree with leaves a through m and the corresponding octants on an x-y grid]

17 Etree Mesh Generator
[Figure: application-specific input → construct (etree library) → unbalanced octree → balance (etree library) → balanced octree → transform (etree library) → element database and node database]

18 Etree Library: A Framework in C for Manipulating Etrees on Disk
[Figure: Application (e.g., construct, balance); Etree API; Etree Library containing Linear Octree, Auto Navigation, Local Balancing, and B-Tree]
– Etree API: Octant (insert) and octree (balance) level operations.
– Linear octree: Well-known coding scheme to assign keys to octants.
– Auto navigation: New algorithm for constructing an octree automatically.
– Local balancing: New algorithm to speed up the balancing operation.
– B-tree: Well-known DB indexing structure.

19 Mesh Element Etree
[Figure: octree with root children 00, 01, 10, 11; A, F, and G are leaves directly under the root, and B, C, D, E are the children of the 01 octant]
B-tree page (locational code keys): 0000_01 A, 0100_10 B, 0101_10 C, 0110_10 D, 0111_10 E, 1000_01 F, 1100_01 G
Queries: X: 0101_10 is an exact hit; Y: 1010_10 is an aggregate hit.
KEY FACT: Leaf nodes and aggregated nodes can be located within a B-tree page with a fast binary search, without traversing the edges of the octree.
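The page lookup on this slide can be sketched as a binary search over sorted keys. This is a minimal 2D illustration, not the etree library's search routine. It reads "aggregate hit" as "the query falls inside a coarser leaf stored on the page", which is an interpretation of the slide. The names oct_key, hit_kind, and page_search are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t morton;  /* Morton code of the octant's left-lower corner */
    unsigned level;   /* octant level; the root is level 0 */
} oct_key;

typedef enum { MISS, EXACT_HIT, AGGREGATE_HIT } hit_kind;

/* Search one page sorted by Morton code. On a hit, *pos is the index
 * of the stored key. Assumes every stored level is <= max_level. */
hit_kind page_search(const oct_key *page, size_t n, oct_key q,
                     unsigned max_level, size_t *pos)
{
    size_t lo = 0, hi = n;  /* find the first key with morton > q.morton */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (page[mid].morton <= q.morton)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return MISS;        /* every stored key is greater than the query */

    const oct_key *k = &page[lo - 1];
    *pos = lo - 1;
    if (k->morton == q.morton && k->level == q.level)
        return EXACT_HIT;

    /* In 2D, a level-l leaf spans 4^(max_level - l) consecutive codes. */
    uint32_t span = 1u << (2 * (max_level - k->level));
    if (k->level < q.level && q.morton - k->morton < span)
        return AGGREGATE_HIT;
    return MISS;
}
```

With the seven keys A through G from the slide and max_level 2, the query X (0101, level 2) lands exactly on C, and the query Y (1010, level 2) lands inside F (1000, level 1).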

20 Mesh Node Etree
B-tree leaf pages 1 and 2 (Morton code keys):
Node  (x, y)  Morton code
a     (0,0)   000000
b     (0,2)   000100
c     (0,3)   000101
d     (1,2)   000110
e     (1,3)   000111
f     (2,0)   001000
g     (2,2)   001100
h     (2,3)   001101
i     (0,4)   010000
j     (1,4)   010010
k     (2,4)   011000
l     (4,0)   100000
m     (4,2)   100100
n     (4,4)   110000

21 Auto Navigation
Navigation octree:
– Guided by an application function
– An in-memory pointer-based octree
– Dynamically grows in depth-first fashion
– Leaf octants are pruned and flushed to disk in preorder (in increasing locational code order)
– Appends the octants to the etree database to avoid database search
[Figure legend: octants not yet processed (in memory); non-leaf octants being decomposed (in memory); leaf octants (flushed to database)]
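The depth-first construction can be sketched with plain recursion. This is a simplified 2D illustration, not the auto navigation algorithm itself: it uses the call stack instead of an in-memory pointer-based octree with pruning, and an array stands in for the etree database. All names (leaf_log, navigate, refine_near_origin) are hypothetical, and the x-before-y bit order is an assumption taken from the node keys on slide 20.

```c
#include <stddef.h>
#include <stdint.h>

/* Application function: return nonzero to decompose the octant further. */
typedef int (*refine_fn)(uint32_t x, uint32_t y, unsigned level);

typedef struct {
    uint32_t codes[256];  /* stand-in for the on-disk etree */
    size_t count;
} leaf_log;

/* Morton code with the x bit as the more significant bit of each pair. */
static uint32_t interleave_xy(uint32_t x, uint32_t y, unsigned bits)
{
    uint32_t code = 0;
    for (unsigned i = bits; i-- > 0; ) {
        code = (code << 1) | ((x >> i) & 1u);
        code = (code << 1) | ((y >> i) & 1u);
    }
    return code;
}

/* Grow the tree depth-first; append each leaf as soon as it is final.
 * Codes use 4 low bits for the level, so max_level must be <= 14. */
void navigate(uint32_t x, uint32_t y, unsigned level, unsigned max_level,
              refine_fn refine, leaf_log *out)
{
    if (level < max_level && refine(x, y, level)) {
        uint32_t half = 1u << (max_level - level - 1);
        /* Visit children in Morton order: (0,0), (0,1), (1,0), (1,1). */
        for (unsigned c = 0; c < 4; c++)
            navigate(x + (c >> 1) * half, y + (c & 1u) * half,
                     level + 1, max_level, refine, out);
        return;
    }
    if (out->count < 256)
        out->codes[out->count++] =
            (interleave_xy(x, y, max_level) << 4) | level;
}

/* Example application function: refine only the octant at the origin. */
int refine_near_origin(uint32_t x, uint32_t y, unsigned level)
{
    (void)level;
    return x == 0 && y == 0;
}
```

Because children are visited in Morton order, the leaves arrive in increasing locational code order, so each one can be appended at the end of the database without a search.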

22 Local Balancing
Operational steps:
1. Partition the entire domain into equal-size blocks
2. Perform internal balancing to enforce the 2-to-1 constraint within each block (in a memory-resident blocking array)
3. Perform boundary balancing to resolve interactions between adjacent blocks
Key Fact: Interactions between adjacent blocks are always absorbed by boundary octants and will not be propagated into the blocks.

23 Some Evaluation Questions
– Is etree mesh generation feasible?
– How does running time vary with the physical memory size?
– What is the performance impact of auto navigation?
– What is the performance impact of local balancing?

24 Evaluation Methodology
Used the etree mesh generator to build a family of finite element meshes for San Fernando Valley earthquake ground motion simulations.
Mesh   Elements     Nodes        Slave nodes
SF10   7,940        12,118       4,432
SF5    76,330       105,886      34,858
SF2    1,838,524    2,213,035    407,336
SF1    13,579,124   15,097,365   1,649,855
SFx: A mesh of the 50 km x 50 km x 12 km San Fernando Valley that resolves seismic waves with periods of at most x seconds.

25 Evaluation Setup
– All experiments were conducted on a PIII 1 GHz machine running Linux 2.4.17.
– The machine's physical memory for the experiments ranged from 128 MB to 880 MB.
– Before each experiment, two 1.5 GB files were sequentially scanned to ensure that the operating system's buffer cache was flushed.

26 Etree Feasibility
Mesh   Elements     DB size (MB)   Time (sec)   Thruput (elem/s)
SF10   7,940        2.5            40           199
SF5    76,330       24             186          410
SF2    1,838,524    583            1,637        1,123
SF1    13,579,124   4,300          9,449        1,439
All experiments performed with 128 MB physical memory.
– Generating a mesh with 13.6 million elements and of size 4.3 GB in 2.6 hours seems reasonable
– The overall throughput increases with mesh size

27 Impact of Physical Memory Size
– Memory size does not have a significant impact on the running time
– The etree method does not rely on the operating system's internal caching mechanism to achieve its performance

28 Impact of Auto Navigation
– Reducing the B-tree buffer size does not increase the construction time
– Auto navigation is not sensitive to the B-tree buffer size

29 Impact of Local Balancing
– Achieves speedups ranging from 8 (SF1) to 28 (SF10)
– Benefits from the one-time scan of the database and the efficient array-based neighbor-finding algorithm

30 Some Related Work
General octree algorithms: Samet 90
Octree mesh: Shephard & Georges 91, Bern et al. 90, Young et al. 91, Wang 99
Out-of-core octree solver method: Salmon 97
Linear quadtree: Gargantini 82, Morton 66
Space filling curve: Orenstein 84, Orenstein 86, Faloutsos & Roseman 89
Large dataset processing: Freitag & Loy 99, Seamons & Winslett 96, Ferreira et al. 99, Kurc et al. 01, Choudhary et al. 99, Parashar & Browne 97

31 Summary and Conclusions
The Euclid project aims to recast the entire scientific computing process in terms of database ops.
Incorporating existing database techniques (linear octree and B-tree) with new algorithms (auto navigation and local balancing) in a unified framework (the etree) can deliver new capabilities.
On the horizon:
– Caching and prefetching for the etree solver
– Remote access and derived value caching for visualization
– Parallel visualization system based on etrees
– Unstructured tetrahedral mesh generation using R-trees

32 Etree API
Unix file I/O style, three levels of abstraction:
– Initialization and cleanup, e.g., etree_t *etree_open(const char *path, int flag, …);
– Octant-level operations, e.g., int etree_insert(etree_t *ep, location_t loc, void *value);
– Octree-level operations, e.g., int etree_balance(etree_t *ep, decom_t *baldecom);

