Etree: A Database-oriented Method for Generating Large Octree Meshes


1 Etree: A Database-oriented Method for Generating Large Octree Meshes
David R. O’Hallaron (1, 2), Tiankai Tu (1), Julio C. López (2)
(1) School of Computer Science, (2) Electrical and Computer Engineering Department, Carnegie Mellon University

2 Motivation
Goal: make it possible to run large-scale physical simulations on PCs with limited physical memory.
Figure: simulation pipeline (physical model, mesh generation, mesh, solver, simulation results, visualization).
Approach: index and store the datasets in databases and compute on the databases directly.
This requires research at the intersection of computer systems, scientific computing, and databases.

3 Octree mesh generation
- Octree meshes are a compromise between structure and modeling power.
- Balance requirement (2-to-1 constraint): adjacent octants may differ in edge length by at most a factor of 2.
- Can be implemented in-core or out-of-core.
Figure: a quadtree mesh whose elements (octants) are labeled a to m, with nodes h1 to h4 marking master nodes and slave nodes.
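The 2-to-1 constraint can be checked mechanically. The C sketch below is a minimal illustration, not the etree implementation. It assumes a 2-D quadtree (matching the slide's figure) whose leaves are axis-aligned squares given by a left-lower corner and an edge length in integer coordinates, and it assumes "adjacent" means sharing an edge segment of positive length (some formulations also count corner neighbors). The names `leaf_t`, `edge_adjacent`, and `is_balanced` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical leaf record: left-lower corner (x, y) and edge length. */
typedef struct {
    int x, y, size;
} leaf_t;

/* Length of the overlap of [a0, a1) and [b0, b1), or 0 if they are disjoint. */
static int overlap(int a0, int a1, int b0, int b1) {
    int lo = a0 > b0 ? a0 : b0;
    int hi = a1 < b1 ? a1 : b1;
    return hi > lo ? hi - lo : 0;
}

/* True if a and b share an edge segment of positive length
   (touching only at a corner does not count). */
static bool edge_adjacent(leaf_t a, leaf_t b) {
    bool touch_x = (a.x + a.size == b.x) || (b.x + b.size == a.x);
    bool touch_y = (a.y + a.size == b.y) || (b.y + b.size == a.y);
    if (touch_x && overlap(a.y, a.y + a.size, b.y, b.y + b.size) > 0) return true;
    if (touch_y && overlap(a.x, a.x + a.size, b.x, b.x + b.size) > 0) return true;
    return false;
}

/* 2-to-1 constraint: edge-adjacent leaves differ in size by at most 2x.
   Pairwise O(n^2) check, adequate only for small illustrative inputs. */
bool is_balanced(const leaf_t *leaves, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(leaves[i].size > 0);
        for (size_t j = i + 1; j < n; j++) {
            if (!edge_adjacent(leaves[i], leaves[j])) continue;
            int big = leaves[i].size > leaves[j].size ? leaves[i].size : leaves[j].size;
            int small = leaves[i].size < leaves[j].size ? leaves[i].size : leaves[j].size;
            if (big > 2 * small) return false;
        }
    }
    return true;
}
```

For example, a 4×4 domain tiled by 2×2 leaves, one of which is split into four 1×1 leaves, passes the check, while a 4×4 leaf sharing an edge with a 1×1 leaf fails it.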

4 Etree mesh generation overview
A general method for manipulating large out-of-core octrees by querying databases.
Figure: application-specific input → construct (etree library) → unbalanced octree → balance (etree library) → balanced octree → transform (etree library) → element database and node database.

5 Etree library components
Figure: the application (e.g., construct, balance) calls the Etree API; underneath, the etree library comprises a B-tree, the linear quadtree, auto navigation, and local balancing.
- Etree API: a simple application programming interface
- Linear quadtree: an encoding scheme that assigns keys to octants
- Auto navigation: a mechanism for constructing an octree automatically
- Local balancing: a technique that speeds up the balancing operation
- B-tree: a database index structure for storing and accessing octants on disk

6 Etree API
Unix file I/O style, with three classes of operations:
- Initialization and cleanup; example: etree_t *etree_open(const char *path, int flag, …);
- Octant-level operations; example: int etree_insert(etree_t *ep, location_t loc, void *value);
- Octree-level operations; example: int etree_balance(etree_t *ep, decom_t *baldecom);
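To illustrate what "Unix file I/O style" means in practice, here is a toy C sketch of the open / operate / close pattern: calls take an opaque handle and return 0 on success or -1 on failure, as POSIX `close` does. Everything here is hypothetical: `store_t`, `store_open`, `store_insert`, `store_find`, and `store_close` are not etree library functions, and the store is a sorted in-memory array rather than the on-disk B-tree the etree library uses. Octree-level operations such as balancing are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle, analogous to a FILE* or a file descriptor. */
typedef struct {
    uint64_t *keys;  /* locational codes, kept in sorted order */
    int *values;     /* one payload per octant (an int for simplicity) */
    size_t count, cap;
} store_t;

/* Initialization: returns NULL on failure, like fopen. */
store_t *store_open(size_t cap) {
    assert(cap > 0);
    store_t *sp = calloc(1, sizeof *sp);
    if (!sp) return NULL;
    sp->keys = malloc(cap * sizeof *sp->keys);
    sp->values = malloc(cap * sizeof *sp->values);
    if (!sp->keys || !sp->values) {
        free(sp->keys);
        free(sp->values);
        free(sp);
        return NULL;
    }
    sp->cap = cap;
    return sp;
}

/* Octant-level insert: 0 on success, -1 if full or the key already exists. */
int store_insert(store_t *sp, uint64_t key, int value) {
    if (sp->count == sp->cap) return -1;
    size_t i = sp->count;
    while (i > 0 && sp->keys[i - 1] > key) i--;      /* linear scan for insertion point */
    if (i > 0 && sp->keys[i - 1] == key) return -1;  /* duplicate key */
    memmove(&sp->keys[i + 1], &sp->keys[i], (sp->count - i) * sizeof *sp->keys);
    memmove(&sp->values[i + 1], &sp->values[i], (sp->count - i) * sizeof *sp->values);
    sp->keys[i] = key;
    sp->values[i] = value;
    sp->count++;
    return 0;
}

/* Octant-level lookup by binary search: 0 and *value set if found, else -1. */
int store_find(const store_t *sp, uint64_t key, int *value) {
    size_t lo = 0, hi = sp->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sp->keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    if (lo < sp->count && sp->keys[lo] == key) {
        *value = sp->values[lo];
        return 0;
    }
    return -1;
}

/* Cleanup: releases everything the handle owns. */
void store_close(store_t *sp) {
    if (!sp) return;
    free(sp->keys);
    free(sp->values);
    free(sp);
}
```

A caller opens a handle, issues octant-level calls against it, and closes it, checking each return code the way it would check `read` or `write`.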

7 Linear quadtree: how it works
- Morton code: maps n-dimensional points to one-dimensional scalars.
- Locational code: appends an octant’s level to the Morton code of its left-lower corner.
Figure: a quadtree with octants a to m drawn on x and y axes, and the same octants listed in locational-code order: a b c d e f g h i j k l m.
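A minimal C sketch of both encodings for the 2-D case shown in the figure. It assumes coordinates fit in 16 bits and places bit i of x at bit 2i and bit i of y at bit 2i+1 (conventions differ on which coordinate comes first). The packing of the locational code, Morton code in the high bits and level in the low 5 bits, is a hypothetical choice; it makes an octant sort before any descendant that shares its left-lower corner, which is what a preorder ordering requires.

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the low 16 bits of x and y: bit i of x goes to bit 2i,
   bit i of y goes to bit 2i+1. */
uint32_t morton2d(uint32_t x, uint32_t y) {
    assert(x < (1u << 16) && y < (1u << 16));
    uint32_t code = 0;
    for (int i = 0; i < 16; i++) {
        code |= ((x >> i) & 1u) << (2 * i);
        code |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return code;
}

/* Hypothetical locational-code packing: Morton code of the left-lower
   corner in the high bits, level (0 = root) in the low 5 bits. */
uint64_t locational_code(uint32_t x, uint32_t y, unsigned level) {
    assert(level < 32);
    return ((uint64_t)morton2d(x, y) << 5) | level;
}
```

With this convention the corner (2, 2) maps to Morton code 12, and at a shared corner a coarser octant (smaller level) always gets the smaller key.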

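8 Linear quadtree: how it works (cont’d)
Example for octant d:
- d’s left-lower corner: (2, 2)
- binary form: (010, 010)
- interleave the bits to obtain the Morton code
- append the level of d to obtain its locational code
Figure: the quadtree from the previous slide, with octants a to m drawn on x and y axes.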
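The example for d can be worked through explicitly. Because x and y are equal here, the result does not depend on which coordinate's bit is placed first in each pair. The level of d is not recoverable from the transcript, so it is left symbolic as $\ell_d$.

```latex
\begin{aligned}
(x_d,\, y_d) &= (2,\, 2) = (010_2,\; 010_2) \\
\mathrm{Morton}(d) &= y_2 x_2\; y_1 x_1\; y_0 x_0 = 00\;11\;00 = 001100_2 = 12 \\
\mathrm{loc}(d) &= \big(\mathrm{Morton}(d),\ \ell_d\big) = (12,\ \ell_d)
\end{aligned}
```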

9 Linear quadtree: applications
- An addressing scheme that clusters nearby octants.
- Finding an octant without knowing its locational code.
- The order imposed by the locational code is the same as the preorder traversal of the leaves of the octree.
Figure: the quadtree with octants a to m and its preorder leaf sequence.
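Both applications can be seen in a small C sketch under stated assumptions: a 2-D quadtree whose leaves tile the domain exactly, keys taken as the Morton code of each leaf's left-lower corner (x bits in even positions), and a sorted array standing in for the B-tree. Sorting by key reproduces the preorder leaf sequence. To find the leaf containing a point without knowing that leaf's code, the sketch searches for the last leaf whose key is at most the point's own Morton code; this works because each aligned leaf covers a contiguous range of Morton codes. This is one way to realize the idea, assumed here for illustration; `sort_leaves` and `locate` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical leaf record: corner, edge length, and its key. */
typedef struct {
    uint32_t x, y, size;
    uint32_t code;
} leaf_t;

/* 2-D Morton code: bit i of x -> bit 2i, bit i of y -> bit 2i+1. */
static uint32_t morton2d(uint32_t x, uint32_t y) {
    uint32_t code = 0;
    for (int i = 0; i < 16; i++) {
        code |= ((x >> i) & 1u) << (2 * i);
        code |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return code;
}

static int cmp_code(const void *a, const void *b) {
    uint32_t ca = ((const leaf_t *)a)->code, cb = ((const leaf_t *)b)->code;
    return (ca > cb) - (ca < cb);
}

/* Compute each leaf's key and sort by it: the result is preorder. */
void sort_leaves(leaf_t *leaves, size_t n) {
    for (size_t i = 0; i < n; i++)
        leaves[i].code = morton2d(leaves[i].x, leaves[i].y);
    qsort(leaves, n, sizeof *leaves, cmp_code);
}

/* Index of the leaf containing point (px, py): the last leaf whose key is
   <= the point's Morton code. Requires sorted leaves that tile the domain. */
size_t locate(const leaf_t *leaves, size_t n, uint32_t px, uint32_t py) {
    uint32_t key = morton2d(px, py);
    size_t lo = 0, hi = n;  /* find the first leaf with code > key */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (leaves[mid].code <= key) lo = mid + 1; else hi = mid;
    }
    assert(lo > 0);  /* a full tiling always has a leaf with code 0 */
    return lo - 1;
}
```

Given the seven leaves of a 4×4 domain in any order (a 2×2 leaf at the origin, four 1×1 leaves at (2..3, 0..1), and 2×2 leaves at (0, 2) and (2, 2)), sorting yields exactly the preorder sequence, and the point (1, 3) is located in the 2×2 leaf at (0, 2).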

10 Auto navigation: how it works
Navigation octree:
- guided by an application function
- an in-memory, pointer-based octree
- grows dynamically in depth-first order
- leaf octants are pruned and flushed to disk in preorder (increasing locational code order)
- each octant is appended to the database, which avoids a database search
Figure legend: octants not yet processed (in memory); non-leaf octants being decomposed (in memory); leaf octants (flushed to database).
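The navigation idea can be sketched in C as a depth-first recursion. Assumptions: a 2-D domain, a hypothetical application callback `refine` that decides whether to subdivide an octant, and a fixed-size array `appendlog_t` standing in for the database. For brevity the recursion stack plays the role of the in-memory, pointer-based navigation octree: only the path from the root to the current octant is held, and each finished leaf is flushed immediately. Visiting children in Z order (SW, SE, NW, NE) makes leaves arrive in increasing Morton order, so every insert is a plain append; the `assert` in `db_append` checks exactly that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application callback: should octant (x, y, size) be subdivided? */
typedef bool (*refine_fn)(uint32_t x, uint32_t y, uint32_t size);

/* Stand-in for the database: keys are only ever appended. */
typedef struct {
    uint32_t keys[256];
    size_t count;
} appendlog_t;

/* 2-D Morton code: bit i of x -> bit 2i, bit i of y -> bit 2i+1. */
static uint32_t morton2d(uint32_t x, uint32_t y) {
    uint32_t code = 0;
    for (int i = 0; i < 16; i++) {
        code |= ((x >> i) & 1u) << (2 * i);
        code |= ((y >> i) & 1u) << (2 * i + 1);
    }
    return code;
}

/* Append without searching; valid only because keys arrive in increasing order. */
static void db_append(appendlog_t *db, uint32_t key) {
    assert(db->count < 256);
    assert(db->count == 0 || db->keys[db->count - 1] < key);
    db->keys[db->count++] = key;
}

/* Depth-first navigation. Children are visited in Z order (SW, SE, NW, NE),
   so leaves are emitted in preorder, i.e., in increasing Morton order. */
void navigate(uint32_t x, uint32_t y, uint32_t size, refine_fn refine, appendlog_t *db) {
    if (size > 1 && refine(x, y, size)) {
        uint32_t h = size / 2;
        navigate(x,     y,     h, refine, db);  /* SW */
        navigate(x + h, y,     h, refine, db);  /* SE */
        navigate(x,     y + h, h, refine, db);  /* NW */
        navigate(x + h, y + h, h, refine, db);  /* NE */
    } else {
        db_append(db, morton2d(x, y));          /* leaf: flush immediately */
    }
}

/* Hypothetical example callback: refine every octant whose corner is the origin. */
bool refine_corner(uint32_t x, uint32_t y, uint32_t size) {
    (void)size;
    return x == 0 && y == 0;
}
```

Navigating an 8×8 domain with `refine_corner` produces 10 leaves (three 4×4, three 2×2, four 1×1) whose keys 0, 1, 2, 3, 4, 8, 12, 16, 32, 48 are appended strictly in order.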

11 Local balancing: how it works
Operational steps:
1. Partition the whole domain into equal-size blocks.
2. Conduct internal balancing to enforce the 2-to-1 constraint within each block (in a memory-resident blocking array).
3. Perform boundary balancing to resolve interactions between adjacent blocks.
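Step 2 can be sketched on a memory-resident array. The C sketch below is an illustration under assumptions, not the authors' algorithm: the block is an N×N array of finest-level cells, each storing the edge length of the leaf that covers it (leaves are aligned power-of-two squares); the 2-to-1 constraint is checked across edge neighbors only; and violations are fixed by repeatedly splitting the larger leaf until nothing changes. Neighbor finding reduces to an array lookup, the kind of access a resident blocking array makes cheap. Boundary balancing between blocks (step 3) is not shown. `N`, `split_leaf`, `balance_block`, and `count_leaves` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define N 8  /* hypothetical block edge length, in finest-level cells */

/* size[y][x] holds the edge length of the leaf covering cell (x, y). */

/* Split the aligned leaf containing cell (x, y) into four children. */
static void split_leaf(int size[N][N], int x, int y) {
    int s = size[y][x];
    assert(s >= 2);
    int ox = x - x % s, oy = y - y % s;  /* origin of the aligned leaf */
    for (int j = oy; j < oy + s; j++)
        for (int i = ox; i < ox + s; i++)
            size[j][i] = s / 2;
}

/* Repeatedly split any leaf more than twice the size of an edge
   neighbor until the block is balanced. Returns the number of splits. */
int balance_block(int size[N][N]) {
    static const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    int splits = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int y = 0; y < N; y++)
            for (int x = 0; x < N; x++)
                for (int k = 0; k < 4; k++) {
                    int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || nx >= N || ny < 0 || ny >= N) continue;
                    if (size[y][x] > 2 * size[ny][nx]) {
                        split_leaf(size, x, y);
                        splits++;
                        changed = true;
                    }
                }
    }
    return splits;
}

/* A cell is a leaf's origin iff both coordinates are multiples of its size. */
int count_leaves(int size[N][N]) {
    int n = 0;
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            if (x % size[y][x] == 0 && y % size[y][x] == 0) n++;
    return n;
}
```

In an 8×8 block where a 4×4 leaf shares an edge with 1×1 leaves, a single split of the 4×4 leaf restores balance and raises the leaf count from 10 to 13.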

12 Correctness of local balancing
Claim: interactions between adjacent blocks are always absorbed by boundary octants and are not propagated into the blocks.
Proof: see the paper.

13 Evaluation: questions
- Is the etree method feasible?
- How does the running time vary with the physical memory size?
- What is the impact of auto navigation?
- What is the impact of local balancing?

14 Evaluation: methodology
We implemented an etree-based mesh generator to generate a family of finite element meshes for San Fernando Valley earthquake wave propagation simulations.
Mesh | Elements | Nodes | Slave nodes
SF10 | 7,940 | 12,118 | 4,432
SF5 | 76,330 | 105,886 | 34,858
SF2 | 1,838,524 | 2,213,035 | 407,336
SF1 | 13,579,124 | 15,097,365 | 1,649,855

15 Evaluation: setup
- All experiments are conducted on a 1 GHz Pentium III machine running Linux.
- The machine’s physical memory for the experiments ranges from 128 MB to 880 MB.
- Before each experiment, two 1.5 GB files are scanned sequentially to ensure that the operating system’s buffer cache is flushed.

16 Evaluation: etree feasibility
- Generating a mesh with 13.6 million elements and a size of 4.3 GB in 2.6 hours seems reasonable.
- The overall throughput increases with mesh size.
Etree-based mesh generator running time and throughput (all experiments performed with 128 MB of physical memory):
Mesh | Elements | DB size (MB) | Time (s) | Throughput (elements/s)
SF10 | 7,940 | ? | ? | ?
SF5 | 76,330 | ? | ? | ?
SF2 | 1,838,524 | ? | 1,636.7 | 1,123
SF1 | 13,579,124 | 4,300 | 9,448.8 | 1,439

17 Evaluation: impact of memory size
- Memory size does not have a significant impact on the running time.
- The etree method does not rely on the operating system’s internal caching mechanism to achieve its performance.

18 Evaluation: impact of auto navigation
- Reducing the B-tree buffer size does not increase the construction time.
- Auto navigation is not sensitive to the B-tree buffer size.

19 Evaluation: impact of local balancing
- Local balancing achieves a speed-up factor ranging from 8 (SF1) to 28 (SF10).
- It benefits from a one-time scan of the database and an efficient array-based neighbor-finding algorithm.

20 Related work
- General octree algorithms: Samet 90
- Octree meshes: Shephard & Georges 91, Bern et al. 90, Young et al. 91, Wang 99
- Out-of-core octree methods: Salmon 97
- Linear quadtree: Gargantini 82, Morton 66
- Space-filling curves: Orenstein 84, Orenstein 86, Faloutsos & Roseman 89
- Large dataset processing: Freitag & Loy 99, Seamons & Winslett 96, Ferreira et al. 99, Kurc et al. 01, Choudhary et al. 99, Parashar & Browne 97

21 Conclusion and future work
- Experimental results suggest that the etree method can generate large octree meshes on memory-limited machines in a reasonable amount of time.
- Combining existing database techniques (linear quadtree and B-tree) with new algorithms (auto navigation and local balancing) in a unified design (the etree) can deliver new capabilities.
- We are porting the etree method to commercial database systems such as IBM DB2.