
Ohio State University, Department of Computer Science and Engineering
Data-Centric Transformations on Non-Integer Iteration Spaces
Swarup Kumar Sahoo, Gagan Agrawal
The Ohio State University

Roadmap
  Motivation
  Background
  System Overview
  XQuery, Low and High Level schema and Mapping schema
  Compiler Analysis and Algorithm
  Parallelization
  Experiment
  Summary and Future Work

Motivation
  Declarative and application-specific languages
    – Use high-level abstractions
    – Simplify development of applications
  Use of restructuring transformations
    – Difficult due to these abstractions
  Goal: apply data-centric transformations
    – On integer and non-integer based iteration spaces, while providing high-level abstractions (a virtual view) of the underlying datasets

Background
  Data-centric transformation:
    – Input data is brought into memory/cache in chunks or shackles, and then the program fragments or loop iterations that require access to these data are executed.
    – Helps improve data locality.
  Integer based iteration space
    – The loop variable takes integer values with a constant step size between a lower and an upper bound.
  Non-integer based iteration space
    – The loop variable takes values from a sequence or set of real numbers, strings, or any other data type.
    – Easily expressible in declarative languages

Example: Data-centric transformation
  for i := 1 to 3 {
    Count the number of occurrences of i in a list of digits
  }

Naïve Strategy
  [Figure: dataset of digits, a single counter, and the output]
  Requires 3 scans

Data Centric Strategy
  [Figure: dataset of digits, three counters (Counter1, Counter2, Counter3), and the output]
  Requires just one scan
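The contrast between the two strategies can be sketched in a few lines of Python. This is only an illustration of the counting example: the function names and the sample list of digits are hypothetical and are not taken from the slides.

```python
def count_naive(dataset, lo, hi):
    """Naive strategy: one full scan of the dataset per loop iteration."""
    counts = {}
    scans = 0
    for i in range(lo, hi + 1):
        scans += 1  # every iteration re-reads the whole dataset
        counts[i] = sum(1 for d in dataset if d == i)
    return counts, scans


def count_data_centric(dataset, lo, hi):
    """Data-centric strategy: one scan, one counter per loop iteration."""
    counts = {i: 0 for i in range(lo, hi + 1)}
    scans = 1
    for d in dataset:
        # Map the data element back to the iteration that needs it.
        if lo <= d <= hi:
            counts[d] += 1
    return counts, scans


# Hypothetical dataset; the digits shown on the original slide are not recoverable.
digits = [2, 1, 3, 3, 1, 2, 3, 0, 4]
naive_counts, naive_scans = count_naive(digits, 1, 3)
dc_counts, dc_scans = count_data_centric(digits, 1, 3)
```

Both functions return the same counts; only the number of scans differs (3 versus 1 for the loop `i := 1 to 3`).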

Example: Data-centric transformation with non-integer iteration space
  for each distinct color (green, blue, pink) {
    with that color
  }

Naïve Strategy
  [Figure: dataset, a single counter, and the output]
  Requires 3 scans

Data Centric Strategy
  [Figure: datasets, a mapping to three counters (Counter1, Counter2, Counter3), and the output]
  Requires just one scan

Previous work and Contributions
  Related Work
    – Data-centric multilevel blocking (Pingali et al., PLDI 1997)
    – Sparse matrix code synthesis from high-level specifications (Pingali et al., SC 2000)
    – Supporting XML-based high-level abstractions on flat-file datasets (LCPC 2003, XIME-P 2004)
  Contributions of this paper
    – An improved data-centric transformation algorithm that works on both integer and non-integer based iteration spaces.
    – Handling of out-of-core computations involving multi-dimensional datasets, without limiting the organization of low-level datasets.
    – Automatic parallelization of the considered class of applications.

System Overview
  [Figure: system architecture. Components: XQuery source code, high-level XML schema, mapping schema, low-level XML schema, compiler, mapping service, low-level library, dataset, cluster with disk]

XQuery and XML Schemas
  High-level declarative languages ease application development
    – XQuery is a high-level language for processing XML datasets
    – Derived from database, declarative, and functional languages!
  High-level schema
    – XML is used to provide a virtual view of the dataset
  Low-level schema
    – Reflects the actual physical layout
  Mapping schema
    – Describes the mapping between each element of the high-level schema and the low-level schema

Oil Reservoir Management
  Oil reservoir simulation
    – Supports cost-effective oil production
    – Simulations on a 3-D grid
    – 17 variables and cell locations in the 3-D grid at each time step
  Computation of bypassed regions
    – Expression to determine if a cell is bypassed for a time step
    – Within a spatial region and range of time steps
    – Grid cells that are bypassed for every time step in the range

High-Level Schema

High-Level XQuery Code of Oil Reservoir Management
  unordered(
    for $i in ($x1 to $x2)
    for $j in ($y1 to $y2)
    for $k in ($z1 to $z2)
    let $p := document("OilRes.xml")/data
    where ($p/x = $i) and ($p/y = $j) and ($p/z = $k)
      and ($p/time >= $tmin) and ($p/time <= $tmax)
    return {$i, $j, $k} { analyze($p) }
  )

Low-Level Schema
  [Schema listing; surviving fragments: integer 1 [1], float 1 [x], ...]

Mapping Schema
  High-level element      Low-level element
  //high/data/velocity    //low/info/data/velocity
  //high/data/time        //low/info/data/time
  //high/data/mom         //low/info/data/mom [index(//low/info/data/velocity, 1)]
  //high/data/x           //low/coord/x [index(//low/info/data/velocity, 1)]

Modified Oil Reservoir management with non-integer iteration space
  let $src := document("Oil.xml")//data/x,y,z
  let $coord := distinct-values($src)
  unordered(
    for $C in $coord
    let $p := document("OilRes.xml")/data
    where ($p/x = $C/x) and ($p/y = $C/y) and ($p/z = $C/z)
      and ($p/time >= $tmin) and ($p/time <= $tmax)
    return {$C/x, $C/y, $C/z} { analyze($p) }
  )

Basic steps in our Data Centric Transformation algorithm
  Mapping function T : Iteration space → High-level data
  Mapping function C : High-level data → Low-level data
  Mapping function M = C · T : Iteration space → Low-level data
  Our goal is to compute M^-1 and use the following steps
    – Iterate over each data element in actual storage.
    – Find the iterations of the original loop in which it is accessed, using M^-1.
    – Access the required elements of other datasets.
    – Execute the computation corresponding to those iterations.
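The steps above can be sketched for a single dataset as follows. This is a minimal illustration, not the compiler's generated code: the record layout `(x, y, z, time, value)`, the function names, and the use of a running sum as the reduction are all assumptions made for the example. With only one dataset, the step that accesses other datasets is omitted.

```python
def inverse_map(record, bounds):
    """Hypothetical M^-1: return the iteration (i, j, k) that accesses this
    stored record, or None if no iteration of the loop nest touches it."""
    (x1, x2), (y1, y2), (z1, z2), (tmin, tmax) = bounds
    x, y, z, t, _value = record
    inside = (x1 <= x <= x2 and y1 <= y <= y2 and z1 <= z <= z2
              and tmin <= t <= tmax)
    return (x, y, z) if inside else None


def data_centric(storage, bounds, reduce_fn, init):
    """Visit records in storage order and update the output of the
    iteration each record belongs to."""
    output = {}
    for record in storage:                      # step 1: iterate over stored data
        iteration = inverse_map(record, bounds)  # step 2: find the iteration
        if iteration is None:
            continue
        # step 4: run the computation for that iteration on this element
        output[iteration] = reduce_fn(output.get(iteration, init), record[4])
    return output


# Hypothetical records stored in arbitrary order: (x, y, z, time, value).
storage = [
    (1, 1, 1, 0, 2.0),
    (2, 1, 1, 0, 1.0),
    (1, 1, 1, 1, 3.0),
    (1, 1, 1, 5, 9.0),   # outside the time range
    (3, 1, 1, 0, 4.0),   # outside the x range
]
bounds = ((1, 2), (1, 1), (1, 1), (0, 1))
result = data_centric(storage, bounds, lambda acc, v: acc + v, 0.0)
```

The storage is read exactly once, regardless of how many loop iterations the original query has.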

Handling non-integer based iteration space with hash-table
  Abstract integer iteration space:
    – Based on the unique sequence number of each element in the actual iteration space.
    – One-to-one correspondence between the actual and the abstract iteration space
      » A hash table can be used to create this mapping
      » The sequence number in the hash table indicates the iteration instance in the abstract iteration space
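One way to picture the abstract iteration space is a small hash-table wrapper that hands out sequence numbers on first sight. The class below is a sketch under stated assumptions: the name `AbstractIterationSpace`, its methods, and the choice to start numbering at 0 are hypothetical and not taken from the slides.

```python
class AbstractIterationSpace:
    """Maps each distinct element of the actual iteration space to a
    unique sequence number, and back."""

    def __init__(self):
        self._seq_of = {}   # hash table: actual value -> sequence number
        self._values = []   # inverse: sequence number -> actual value

    def sequence_number(self, value):
        """Return the sequence number of value, assigning the next free
        number the first time the value is seen."""
        seq = self._seq_of.get(value)
        if seq is None:
            seq = len(self._values)
            self._seq_of[value] = seq
            self._values.append(value)
        return seq

    def value(self, seq):
        """Return the actual iteration value for a sequence number."""
        return self._values[seq]

    def __len__(self):
        return len(self._values)


space = AbstractIterationSpace()
numbers = [space.sequence_number(c) for c in ["green", "blue", "green", "pink"]]
```

Because every distinct value gets exactly one number, the mapping is one-to-one, and a loop over `range(len(space))` plays the role of the abstract integer iteration space.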

Template for Generated Code using hash table
  Generated_Query {
    Go through the datasets and create a list of tuples, each denoting an iteration
    Foreach i in the list of tuples {
      apply hash function on i
      If i is not present in hash table,
        enter i into hash table and store its sequence number and the corresponding output element
    }
    For k = 1, …, NO_OF_CHUNKS {
      Read k-th chunk of dataset S1 using HDF5 functions.
      Foreach of the other datasets S2, …, Sn
        access the required chunk of the dataset.
      Foreach data element in the chunks of data {
        compute the iteration instance i.
        apply the hash function and determine the corresponding output element.
        apply the reduction computation and update the output.
      }
    }
  }
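The template can be approximated in Python for a single dataset. This is a sketch, not the generated code: chunks are plain in-memory lists standing in for HDF5 reads, a Python `dict` stands in for the hash table, and the names `generated_query`, `key_of`, and `reduce_fn` are hypothetical.

```python
def generated_query(chunks, key_of, reduce_fn, init):
    """Two-phase sketch of the hash-table template.

    chunks    -- list of lists; each inner list stands in for one chunk
    key_of    -- maps a data element to its iteration instance
    reduce_fn -- reduction applied as reduce_fn(current_output, element)
    init      -- initial value of every output element
    """
    # Phase 1: collect the iteration tuples and assign sequence numbers.
    seq_of = {}   # hash table: iteration instance -> sequence number
    output = []   # output[seq] is the output element of that iteration
    for chunk in chunks:
        for element in chunk:
            iteration = key_of(element)
            if iteration not in seq_of:
                seq_of[iteration] = len(output)
                output.append(init)

    # Phase 2: process the data chunk by chunk and update the outputs.
    for chunk in chunks:
        for element in chunk:
            seq = seq_of[key_of(element)]
            output[seq] = reduce_fn(output[seq], element)
    return seq_of, output


# Hypothetical data: (color, payload) pairs split into three chunks.
chunks = [
    [("green", 1), ("blue", 1)],
    [("green", 1), ("pink", 1), ("blue", 1)],
    [("green", 1)],
]
seq_of, counts = generated_query(
    chunks,
    key_of=lambda e: e[0],
    reduce_fn=lambda acc, e: acc + 1,   # count elements per color
    init=0,
)
```

As in the template, the hash table is built before the chunk loop, so the second phase only performs lookups and reductions.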

Handling non-integer based iteration space without hash-table
  Find out the two choices required for construction of the actual iteration space
  Determine the procedure to construct the actual iteration space
  From the high-level schema, select the attributes forming a unique set of tuples (V)
  Consider the set of attributes forming the iteration space as P.
    – If P is not a subset of V, we use a hash table.
    – Else if P = V, the transformation is done without a hash table.
    – Else if P is a proper subset of V, the choice depends on the presence of duplicate tuples.
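The three-way rule on P and V can be written down directly. The function below is only a sketch: its name and the returned labels are hypothetical, and because the slide does not say which way the duplicate-tuple check resolves, the third case is reported as undecided rather than guessed.

```python
def choose_strategy(p_attrs, v_attrs):
    """Classify the transformation strategy from the attribute sets.

    p_attrs -- attributes forming the iteration space (P)
    v_attrs -- attributes forming a unique set of tuples (V)
    """
    p, v = frozenset(p_attrs), frozenset(v_attrs)
    if not p <= v:
        return "hash table"
    if p == v:
        return "no hash table"
    # P is a proper subset of V: the slide only states that the choice
    # depends on the presence of duplicate tuples.
    return "depends on duplicate tuples"


# Hypothetical attribute sets, loosely modeled on the oil reservoir example.
case_subset = choose_strategy({"x", "y", "z"}, {"x", "y", "z", "time"})
case_equal = choose_strategy({"x", "y", "z", "time"}, {"x", "y", "z", "time"})
case_other = choose_strategy({"color"}, {"x", "y", "z", "time"})
```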

Parallelization
  Two obvious ways to parallelize
    – The first is to parallelize the for loop going through different chunks
    – The second is to parallelize the for loop going through the data in each chunk
  Choose the method depending on the number of chunks and the chunk size.
  A reduction operation is required to combine values from different processors.
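The first option (parallelizing the loop over chunks) followed by a reduction can be sketched as below. This is an illustration only: Python threads stand in for the processors of a cluster, the per-chunk work is a simple count, and the names `process_chunk` and `parallel_over_chunks` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Compute a private partial result for one chunk."""
    local = Counter()
    for element in chunk:
        local[element] += 1
    return local


def parallel_over_chunks(chunks, workers=2):
    """Process chunks concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Reduction step: merge the per-worker counters into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data split into three chunks.
totals = parallel_over_chunks(
    [["green", "blue"], ["green", "pink"], ["blue", "green"]]
)
```

Each worker updates only its own counter, so no locking is needed until the final merge; this is the reason a separate reduction step is required.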

Experimental test bed
  HDF5 version 1.6.3 (uses MPI-I/O for parallel I/O)
  Sequential experiments: 700 MHz PIII machine, 1 GB memory, Linux version 2.4.18
  Parallel experiments: Itanium 2 cluster with dual 1.3 GHz Itanium 2 processor nodes, 4 GB RAM, 80 GB hard drive
  Four applications
    – Transaction database analysis
    – Original oil reservoir simulation
    – Modified oil reservoir simulation
    – Virtual microscope

Experimental result
  Execution time (sec.) using different versions of the transformation algorithm

                                Virtual      Oil Reservoir   Modified Oil Reservoir   Transaction Database
                                Microscope   Simulation      Simulation               Analysis
  With DCT without hash table   1.32         2.64            2.08                     -
  With DCT using hash table     -            -               2.97                     7.57
  Without DCT                   10.65        27.13           23.69                    96.11
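To read the timings on this slide as speedups over the untransformed version, the ratio can be computed directly. The snippet below only restates the execution times reported above; the dictionary layout and the function name `speedup` are hypothetical.

```python
# Execution times in seconds, as reported on the slide.
TIMES = {
    "With DCT without hash table": {
        "Virtual Microscope": 1.32,
        "Oil Reservoir Simulation": 2.64,
        "Modified Oil Reservoir Simulation": 2.08,
    },
    "With DCT using hash table": {
        "Modified Oil Reservoir Simulation": 2.97,
        "Transaction Database Analysis": 7.57,
    },
    "Without DCT": {
        "Virtual Microscope": 10.65,
        "Oil Reservoir Simulation": 27.13,
        "Modified Oil Reservoir Simulation": 23.69,
        "Transaction Database Analysis": 96.11,
    },
}


def speedup(version, application):
    """Return (time without DCT) / (time with the given version), or None
    when the slide reports no timing for that combination."""
    t = TIMES[version].get(application)
    if t is None:
        return None
    return TIMES["Without DCT"][application] / t


for version in ("With DCT without hash table", "With DCT using hash table"):
    for app in TIMES["Without DCT"]:
        s = speedup(version, app)
        if s is not None:
            print(f"{version:30s} {app:36s} {s:5.1f}x")
```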

Experimental result
  [Figure]

Summary
  Compiler techniques
    – Perform data-centric transformations automatically on integer and non-integer based iteration spaces.
    – A more efficient method, without using a hash table, for data-centric transformation on non-integer based iteration spaces.
    – Support high-level abstractions on complex low-level data formats.
    – Parallelization of the considered class of queries.
  Future Work
    – Experimental results on more applications.
    – Compare performance with manual implementations.
    – Formalize the mapping schema.
    – Extend the applicability of the algorithm to a more general class of queries.

Canonical structure of source code where the transformation is legal
  [ let $SrcList = document("Document Name")//Set Of Variables ]
  [ let $UniqueList = distinct-values( $SrcList ) ]
  Unordered (
    for $loopindex in $UniqueList { for $loopindex in $UniqueList }
    | for $loopindex in ($LL to $UL) { for $loopindex in ($LL to $UL) }
    let $sequence := document("Document Name")//Set Of Variables
    where ( Conditional-expr( $sequence/Var, $loopindex) )
      { and ( Conditional-expr( $sequence/Var, $loopindex) ) }
    let $ReturnValue = ReductionComputation ($sequence)
    [ where Conditional-expr( $ReturnValue ) ]
    return { expr($loopindex) } { { expr($loopindex) } } { expr($ReturnValue) }
  )
