Compiler (and Runtime) Support for CyberInfrastructure Gagan Agrawal (joint work with Wei Du, Xiaogang Li, Ruoming Jin, Li Weng)
What is CyberInfrastructure? How computing is done is changing with advances in the internet and the emergence of the web: we access web pages, data, and web services from the internet. What does this mean for large-scale computing? Supercomputers are no longer stand-alone resources, and large data repositories are common.
What is CyberInfrastructure? Infrastructures we are familiar with: transportation infrastructure, telecommunication infrastructure, power supply/distribution infrastructure. CyberInfrastructure means large-scale computing infrastructure on the internet: it enables sharing of resources and large-scale web services. Access and process a 1-terabyte file as a web service; run a job on a large supercomputer using your web browser!
CyberInfrastructure CyberInfrastructure is also a new division within the CISE directorate of the National Science Foundation, which shows its importance. It needs new research at all levels: networking and parallel computing hardware, system software, and applications.
Why is Compiler Support Needed for CyberInfrastructure? Compilers have often simplified application development, and application development for CyberInfrastructure is a hard problem! We need transparency across different resources, transparency across different dataset sources and formats, and applications that adapt to resource availability, …
Outline Compiler-supported coarse-grained pipelined parallelism: Why? How? XML-based front-ends to scientific datasets. Compiler support for application self-adaptation. A SQL front-end to a grid data management system.
General Motivation Language and compiler support for many forms of parallelism has been explored: shared-memory parallelism, instruction-level parallelism, distributed-memory parallelism, multithreaded execution. Application and technology trends are making another form of parallelism desirable and feasible: coarse-grained pipelined parallelism.
Coarse-Grained Pipelined Parallelism (CGPP) Definition: the computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units. Example: K-nearest neighbors. Given a 3-D range R and a point p = (a, b, c), we want to find the K nearest neighbors of p within R. The pipeline has two stages: Range_query, then Find the K-nearest neighbors.
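The two stages above can be sketched as a two-unit pipeline in plain Java (a minimal illustration, not the talk's dialect or the DataCutter implementation): stage 1 streams the points that fall inside R over a queue, and stage 2 consumes them while maintaining the K nearest neighbors of p. All class and method names here are invented for this sketch.

```java
import java.util.*;
import java.util.concurrent.*;

public class KnnPipeline {
    public static List<double[]> run(List<double[]> data, double[] lo, double[] hi,
                                     double[] p, int k) throws InterruptedException {
        BlockingQueue<Optional<double[]>> q = new ArrayBlockingQueue<>(16);

        // Stage 1: range query -- forwards only points inside the box [lo, hi]
        Thread rangeQuery = new Thread(() -> {
            try {
                for (double[] pt : data) {
                    boolean inside = true;
                    for (int d = 0; d < 3; d++)
                        inside = inside && lo[d] <= pt[d] && pt[d] <= hi[d];
                    if (inside) q.put(Optional.of(pt));
                }
                q.put(Optional.empty()); // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        rangeQuery.start();

        // Stage 2: keep the K points closest to p (max-heap ordered by distance)
        PriorityQueue<double[]> best = new PriorityQueue<>(
                (a, b) -> Double.compare(dist(b, p), dist(a, p)));
        for (Optional<double[]> item = q.take(); item.isPresent(); item = q.take()) {
            best.add(item.get());
            if (best.size() > k) best.poll(); // drop the farthest point
        }
        rangeQuery.join();
        return new ArrayList<>(best);
    }

    static double dist(double[] a, double[] b) { // squared Euclidean distance
        double s = 0;
        for (int d = 0; d < 3; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }
}
```

Because the stages communicate only through the queue, they can be placed on different machines along the path from the data repository to the user, which is the point of the coarse-grained pipelined model.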
Coarse-Grained Pipelined Parallelism is Desirable & Feasible Application scenarios: internet data.
Coarse-Grained Pipelined Parallelism is Desirable & Feasible A new class of data-intensive applications: scientific data analysis, data mining, data visualization, image analysis. There are two direct ways to implement such applications: downloading all the data to the user's machine (often not feasible), or computing at the data repository (usually too slow).
Coarse-Grained Pipelined Parallelism is Desirable & Feasible Our belief: a coarse-grained pipelined execution model is a good match for processing internet data.
Coarse-Grained Pipelined Parallelism Needs Compiler Support The computation needs to be decomposed into stages, and decomposition decisions depend on the execution environment: how many computing sites are available, how many computing cycles are available on each site, what communication links are available, and what the bandwidth of each link is. The code for each stage follows the same processing pattern, so it can be generated by the compiler. Shared- or distributed-memory parallelism also needs to be exploited. High-level language and compiler support are therefore necessary.
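To make the decomposition decision concrete, here is a hedged sketch of the kind of cost model involved: given per-stage work, inter-stage data volumes, site speeds, and link bandwidths, choose the contiguous split of pipeline stages across sites that minimizes the pipeline bottleneck. The model and all numbers are illustrative assumptions, not the compiler's actual algorithm.

```java
public class Decompose {
    // work[i]: compute cost of stage i; out[i]: data volume leaving stage i
    // speed[j]: computing power of site j; bw[j]: bandwidth of link j -> j+1
    public static double bestBottleneck(double[] work, double[] out,
                                        double[] speed, double[] bw) {
        return search(work, out, speed, bw, 0, 0);
    }

    // Assign stages[from..] to sites[site..]; return the minimal bottleneck,
    // i.e. the cost of the slowest stage-block or link in the best split.
    private static double search(double[] work, double[] out, double[] speed,
                                 double[] bw, int from, int site) {
        int n = work.length, m = speed.length;
        if (site == m - 1) { // last site takes all remaining stages
            double w = 0;
            for (int i = from; i < n; i++) w += work[i];
            return w / speed[site];
        }
        double best = Double.POSITIVE_INFINITY;
        double w = 0;
        for (int cut = from; cut < n; cut++) { // this site runs stages [from, cut]
            w += work[cut];
            double here = Math.max(w / speed[site], out[cut] / bw[site]);
            double rest = search(work, out, speed, bw, cut + 1, site + 1);
            best = Math.min(best, Math.max(here, rest));
        }
        return best;
    }
}
```

In the example below, keeping an early filtering stage at the data-hosting site avoids shipping a large volume over the slow link, which is exactly the trade-off the compiler must weigh.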
An Entire Picture A Java dialect is processed by the compiler support (decomposition and code generation), which targets the DataCutter runtime system.
Language Dialect Goal: to give the compiler information about independent collections of objects, parallel loops, reduction operations, and pipelined parallelism. Extensions of Java: Pipelined_loop, Domain & Rectdomain, the Foreach loop, and reduction variables.
ISO-Surface Extraction Example Code
public class isosurface {
  public static void main(String arg[]) {
    float iso_value;
    RectDomain CubeRange = [min:max];
    CUBE[1d] InputData = new CUBE[CubeRange];
    Point p, b;
    RectDomain PacketRange = [1:runtime_def_num_packets];
    RectDomain EachRange = [1:(max-min)/runtime_def_num_packets];
    Pipelined_loop (b in PacketRange) {
      Foreach (p in EachRange) {
        InputData[p].ISO_SurfaceTriangles(iso_value, …);
      }
      … …
    }
  }
}
The Foreach corresponds to the sequential loop:
for (int i = min; i < max-1; i++) {
  // operate on InputData[i]
}
A Pipelined_loop whose body contains stages 0 through n-1 (each a foreach or a statement S) executes those stages as a pipeline over the packets in PacketRange (for example, PacketRange = [1:4]), with merge steps combining the per-packet results.
Experimental Results Versions: (1) Default version: the site hosting the data only reads and transmits data, with no processing at all; the user's desktop only views the results, with no processing at all; all the work is done by the compute nodes, so their workload is heavy and the communication volume is high. (2) Compiler-generated version: intelligent decomposition is done by the compiler, and more computation is performed on the end nodes; the workload is balanced across nodes and the communication volume is reduced. (3) Manual version: hand-written DataCutter filters with a decomposition similar to the compiler-generated version.
Experimental Results: ISO-Surface Rendering (Z-Buffer Based) (charts: speedup and % improvement over the default version versus width of the pipeline, for a small dataset of 150 MB and a large dataset of 600 MB)
Outline Compiler-supported coarse-grained pipelined parallelism: Why? How? XML-based front-ends to scientific datasets. Compiler support for application self-adaptation. A SQL front-end to a grid data management system.
Motivation The need Analysis of datasets is becoming crucial for scientific advances Emergence of X-Informatics Complex data formats complicate processing Need for applications that are easily portable – compatibility with web/grid services The opportunity The emergence of XML and related technologies developed by W3C XML is already extensively used as part of Grid/Distributed Computing Can XML help in scientific data processing?
The Big Picture (diagram: data stored in many physical formats, such as TEXT, NetCDF, RDBMS, and HDF5, viewed as XML and queried with XQuery; the open question is how to bridge the two)
Programming/Query Language High-level declarative languages ease application development, as the popularity of Matlab for scientific computations shows, but they pose new challenges in compiling them for efficient execution. XQuery is a high-level language for processing XML datasets, derived from database, declarative, and functional languages! XPath (a subset of XQuery) embedded in an imperative language is another option.
Approach / Contributions Use of XML Schemas to provide high-level abstractions on complex datasets Using XQuery with these Schemas to specify processing Issues in Translation High-level to low-level code Data-centric transformations for locality in low-level codes Issues specific to XQuery Recognizing recursive reductions Type inferencing and translation
System Architecture (diagram: XQuery sources written against the external schema, a logical XML schema, are processed by the compiler, which uses the XML mapping service to relate the logical XML schema to the physical XML schema and generates C/C++ code)
Satellite Data Processing The data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at some time t. The entire dataset comprises multiple pixels for each point on the earth at different times, but not for all times. Typical processing is a reduction along the time dimension, which is hard to write against the raw data format.
Using a High-level Schema The high-level view of the dataset is a simple collection of pixels, with latitude, longitude, and time explicitly stored with each pixel. This makes processing easy to specify: the user need not worry about locality or unnecessary scans. It carries at least one order of magnitude overhead in storage, so it is suitable as a logical format only.
XQuery Overview XQuery: a language for querying and processing XML documents; a functional, single-assignment, strongly typed language. XQuery expressions: for-let-where-return (FLWR), unordered, path expressions. Example:
unordered(
  for $d in document("depts.xml")//deptno
  let $e := document("emps.xml")//emp[deptno = $d]
  where count($e) >= 10
  return
    <big_dept>
      { $d,
        <headcount>{ count($e) }</headcount>,
        <avgsal>{ avg($e/salary) }</avgsal> }
    </big_dept>
)
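For readers more familiar with imperative languages, the FLWR query above corresponds roughly to this Java-streams sketch: group employees by department, keep departments with at least 10 members, and report the count and average salary. The Emp record and method names are invented for illustration; "unordered" corresponds to the grouping imposing no particular order.

```java
import java.util.*;
import java.util.stream.*;

public class FlwrAnalogue {
    public record Emp(String deptno, double salary) {}

    // returns deptno -> {count, average salary} for departments with >= 10 employees
    public static Map<String, double[]> bigDepts(List<Emp> emps) {
        return emps.stream()
            .collect(Collectors.groupingBy(Emp::deptno))   // for $d ... let $e := ...
            .entrySet().stream()
            .filter(e -> e.getValue().size() >= 10)        // where count($e) >= 10
            .collect(Collectors.toMap(
                Map.Entry::getKey,                         // return { $d, ... }
                e -> new double[]{
                    e.getValue().size(),                   // count($e)
                    e.getValue().stream()
                        .mapToDouble(Emp::salary)
                        .average().orElse(0) }));          // avg($e/salary)
    }
}
```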
Satellite: XQuery Code
unordered(
  for $i in ($minx to $maxx)
  for $j in ($miny to $maxy)
  let $p := document("sate.xml")/data/pixel[lat = $i and long = $j]
  return
    <pixel> { $i } { $j } { accumulate($p) } </pixel>
)
define function accumulate($p) as double {
  if (empty($p)) then 0
  else
    let $inp := item-at($p, 1)
    let $NVDI := (($inp/band1 - $inp/band0) div ($inp/band1 + $inp/band0) + 1) * 512
    return max($NVDI, accumulate(subsequence($p, 2)))
}
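The accumulate function above is a recursive reduction: it combines the head of the sequence with the reduction of the tail through an associative operator (max). Once such a pattern is recognized, it can be computed by an iterative, constant-space loop instead of recursion. A sketch in plain Java, with an invented Pixel type standing in for the pixel elements:

```java
public class RecursiveReduction {
    public record Pixel(double band0, double band1) {}

    // per-element computation, as in the query above
    static double nvdi(Pixel p) {
        return ((p.band1() - p.band0()) / (p.band1() + p.band0()) + 1) * 512;
    }

    // what the recursive accumulate computes (0 for an empty sequence) ...
    public static double accumulate(java.util.List<Pixel> ps, int i) {
        if (i == ps.size()) return 0;
        return Math.max(nvdi(ps.get(i)), accumulate(ps, i + 1));
    }

    // ... and the loop a compiler could emit once the reduction is recognized
    public static double accumulateLoop(java.util.List<Pixel> ps) {
        double acc = 0; // identity value, matching the recursive base case
        for (Pixel p : ps) acc = Math.max(acc, nvdi(p));
        return acc;
    }
}
```

The loop form also exposes the reduction as data-parallel: the elements can be combined in any order, which is what the compiler exploits.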
Challenges Need to translate to the low-level schema, focusing on correctness and avoiding unnecessary reads. Enhancing locality: data-centric execution of XQuery constructs, using information on the low-level data layout. Issues specific to XQuery: reductions expressed as recursive functions, and generating code in an imperative language, for either direct compilation or use as part of a runtime system, which requires type conversion.
Mapping to the Low-level Schema A number of getData functions access element(s) of the required types. The getData functions are written in XQuery, which allows analysis and transformations. We want to insert getData functions automatically, preserving correctness and avoiding unnecessary scans. Examples: getData(lat x, long y), getData(lat x), getData(long y), getData(lat x, long y, time t), …
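One way to picture the insertion decision: among the available getData variants, pick the one whose index attributes cover as many of the query's bound attributes as possible, so that the fewest elements have to be scanned. A toy sketch (the matching rule here is a deliberate simplification, not the compiler's actual algorithm; variant names mirror the list above):

```java
import java.util.*;

public class GetDataSelect {
    // each getData variant is described by the set of attributes it indexes on
    public static final List<Set<String>> VARIANTS = List.of(
        Set.of("lat", "long"),
        Set.of("lat"),
        Set.of("long"),
        Set.of("lat", "long", "time"));

    // choose the variant usable with the query's bound attributes
    // that covers the largest number of them
    public static Set<String> choose(Set<String> bound) {
        Set<String> best = Set.of();
        for (Set<String> v : VARIANTS)
            if (bound.containsAll(v) && v.size() > best.size())
                best = v;
        return best;
    }
}
```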
Summary – XML-Based Front-ends A case for the use of XML technologies in scientific data analysis. XQuery: a data-parallel language? Identified and addressed compilation challenges; a compilation system has been built. Very large performance gains come from data-centric transformations. Preliminary evidence shows that high-level abstractions and a query language do not degrade performance substantially.
Outline Compiler-supported coarse-grained pipelined parallelism: Why? How? XML-based front-ends to scientific datasets. Compiler support for application self-adaptation. A SQL front-end to a grid data management system.
Applications in a Grid Environment Their characteristics, summarized: long-running applications; adaptation to changing environments is desirable; constraint-based response times; output can be varied in a given range (resolution, accuracy, precision). How do we achieve adaptation?
Proposed Language Extensions
public interface Adapt_Spec {
  String constraints;  // "RESP_TIME <= 50ms"
  List opti_vars;      // "m", "clipwin.x"
  List thresholds;     // "m >= N", "sampling_factor >= 1"
  List opti_dir;
}
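A hedged sketch of how a runtime might act on such a specification: when the measured response time violates the constraint, shrink the optimization variable m toward its threshold N; when there is slack, grow it back to improve output quality. The cost model (response time proportional to m) and all names are illustrative assumptions, not the implemented system.

```java
public class Adaptation {
    static double msPerUnit = 2.0; // illustrative cost model: resp time = m * msPerUnit

    static double respTimeMs(int m) { // stand-in for an actual measurement
        return m * msPerUnit;
    }

    // one adaptation step: returns the adjusted value of the optimization variable m
    public static int adapt(int m, int N, double limitMs) {
        double t = respTimeMs(m);
        if (t > limitMs && m > N) return m - 1;     // constraint violated: reduce quality
        if (t + msPerUnit <= limitMs) return m + 1; // slack available: improve quality
        return m;                                   // at equilibrium
    }

    // drive adaptation until m stops changing
    public static int settle(int m, int N, double limitMs) {
        for (int prev = -1; prev != m; ) {
            prev = m;
            m = adapt(m, N, limitMs);
        }
        return m;
    }
}
```

Under this toy model, both an over-budget and an under-budget starting point converge to the largest m whose response time still meets the constraint.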
Implementation Issues & Strategies Language Aspect Compiler Implementation Performance Modeling & Resource Monitoring Experimental Design
Outline Compiler-supported coarse-grained pipelined parallelism: Why? How? XML-based front-ends to scientific datasets. Compiler support for application self-adaptation. A SQL front-end to a grid data management system.
Overview of the Project A cyber-infrastructure/grid environment comprises distributed data sources, and users would like seamless access to the data. SQL is popular for accessing data from a single database; we want SQL for grid-based accesses, where the data is distributed and is not managed by a relational database system. We need to export data layout information to the query planner.
Overview (Contd.) Use GridDB-lite, a grid data management middleware, as the backend. Define and use a data description language. Parse SQL queries and the data description language, and generate a GridDB-lite application.
Design Dataset description file: the dataset schema. Dataset list file: cluster configuration and dataset storage locations. Meta-data: the logical data space (number of dimensions), attributes for index declaration, partitioning, and physical data storage annotation.
Description file:
[IPARS]
RID = INT2
TIME = INT4
X = FLOAT
Y = FLOAT
Z = FLOAT
POIL = FLOAT
PWAT = FLOAT
……
[bh]
DatasetDescription = IPARS
io = file
Dim = 17x65x65
Npart = 8
…
Data list file:
Osumed1 = osumed01.epn.osc.edu, osumed02.epn.osc.edu, …
0 = bh-10-1 osumed1 /scratch1/bh-10-1
1 = bh-10-2 osumed1 /scratch1/bh-10-2
……
Meta-data:
{ Group "ROOT" {
    DATASET "bh" {
      DATATYPE { IPARS }
      DATASPACE { RANK 3 }
      DATAINDEX { RID, TIME }
      PARTS { 9503, 9503, 9537, 9554, 9503, 9707, 9520, 9520 }
      DATA { DATASET SPACIAL, DATASET POIL, DATASET PWAT, …… }
    }
    Group "SUBGROUP" {
      DATASET "SPACIAL" {
        DATATYPE { }
        DATASPACE { SKIP 4 LINES LOOP PARTS { X SPACE Y SPACE Z SKIP 1 LINE } }
        DATA { PART in (0,1,2,3,4,5,6,7).0.PART.5.init }
      }
      DATASET "POIL" {
        DATATYPE { }
        DATASPACE { LOOP TIME { SKIP 1 double LOOP PARTS { POIL } } }
        DATA { PART in (0,1,2,3,4,5,6,7).0.PART.5.0 }
      }
      ……
    }
  }
}
Description file:
[TITAN]
X = INT4
Y = INT4
Z = INT4
S1 = INT4
S2 = INT4
S3 = INT4
S4 = INT4
S5 = INT4
[TitanData]
DatasetDescription = TITAN
io = file
Dim = NULL
Npart = 1
Data list file:
Osumed1 = osumed01.epn.osc.edu
0 = NULL osumed1 /scratch1/weng/Titan/
Meta-data:
{ Group "ROOT" {
    DATASET "TitanData" {
      DATATYPE { TITAN }
      DATASPACE { RANK 3 }
      DATAINDEX { FID, OFFSET, BSIZE }
      DATA { DATASET TITAN, INDEXSET TITANINDEX }
    }
    Group "SUBGROUP" {
      DATASET "TITAN" {
        DATATYPE { struct TITAN_Record_t { unsigned int x, y, z; unsigned int s1, s2, s3, s4, s5; }; }
        DATASPACE { LOOP { struct TITAN_Record_t } }
        DATA { 0 }
      }
      INDEXSET "TITANINDEX" {
        DATATYPE { HOST hostid; struct Block3D { MBR rect; JMP jmp; FID fid; OFFSET offset; BSIZE bsize; }; }
        DATASPACE { LOOP { HOST SPACE struct Block3D } }
        DATA { IndexFile }
      }
    }
  }
}
Compilation Issues Interface between Index() and Extractor(): range queries (a chunk can be totally in the query range, partially in the query range, or totally outside it), and how to choose a suitable size for the indexed chunks. Interface between Extractor() and GridDB-lite: explore alternative methods to get tuples/records; a smarter extractor can signal GridDB-lite to perform some filtering operations. Query transformation and optimization? Host allocation for the stages (DP, DM, Client). Some other potential issues: the granularity of a "tuple", data partitioning methods, …
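The first issue above, distinguishing chunks that are totally inside, partially inside, or totally outside the query range, can be handled by classifying each chunk's bounding box before extraction: OUTSIDE chunks are skipped without being read, INSIDE chunks need no per-tuple filtering, and only PARTIAL chunks require the extractor to filter tuples. A minimal sketch with axis-aligned boxes; the names are invented for illustration:

```java
public class ChunkFilter {
    public enum Overlap { INSIDE, PARTIAL, OUTSIDE }

    // classify the chunk box [clo, chi] against the query box [qlo, qhi],
    // one pair of bounds per dimension
    public static Overlap classify(double[] clo, double[] chi,
                                   double[] qlo, double[] qhi) {
        boolean inside = true;
        for (int d = 0; d < clo.length; d++) {
            // disjoint in any dimension => the chunk can be skipped entirely
            if (chi[d] < qlo[d] || clo[d] > qhi[d]) return Overlap.OUTSIDE;
            // fully contained only if contained in every dimension
            inside = inside && qlo[d] <= clo[d] && chi[d] <= qhi[d];
        }
        return inside ? Overlap.INSIDE : Overlap.PARTIAL;
    }
}
```

The chunk-size question interacts with this test: smaller chunks make the INSIDE/OUTSIDE classes more common but increase index size.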
Other Research Areas Runtime support systems Ease parallelization of data mining algorithms in a cluster environment (FREERIDE) Grid-based processing of distributed data streams Algorithms for Data Mining / OLAP Parallel and scalable algorithms Algorithms for processing distributed data streams
Group Members Seven Ph.D. students: Liang Chen, Wei Du, Anjan Goswami, Ruoming Jin, Xiaogang Li, Li Weng, Xuan Zhang. Two Master's students: Leo Glimcher, Swarup Sahoo. Part-time student: Kolagatla Reddy.
Getting Involved Talk to me. Sign up for my 888.