1
LCPC02
Towards Compiler Support for Data Intensive Applications on Distributed Heterogeneous Resources
Wei Du, Renato Ferreira, Gagan Agrawal
Ohio State University
2
Motivation
· Grid environment: geographically distributed, heterogeneous resources
· Scientific and commercial data-intensive applications: generalized reduction operations are very common in their processing structure
· No compiler support for high-level languages for grid application development
7/26/02, Ohio State University
3
Software Architecture
· Data Parallel Java
  · assumes all data are available in a flat memory
  · assumes all computation is done on a single processor
  · extensions of Java: domain & rectdomain, foreach loop, reduction variables
· Data Parallel Java → Compiler Support → Filter-stream program on DataCutter
4
Example code
public class kNN {
    static buffer kbuffer;

    public static void main(String[] args) {
        double dis;
        Point<3> lowend = [0, 0, 0];
        Point<3> hiend = [Integer.parseInt(args[0]),
                          Integer.parseInt(args[1]),
                          Integer.parseInt(args[2])];
        Point<3> p;
        RectDomain<3> InputDomain = [lowend : hiend];
        kPoint[3d] Input = new kPoint[InputDomain];
        // R (query range) and W (query point) are not declared on the slide
        foreach (p in InputDomain) {
            // only points inside R contribute to the reduction
            if (Input[p].inRange(R)) {
                dis = Input[p].distance(W);
                kbuffer.insert(Input[p], dis);
            }
        }
    }
}
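The slide does not define the `buffer` type behind `kbuffer`. As a rough plain-Java sketch (not the compiler's actual runtime), `kbuffer.insert` can be read as a reduction that keeps the K candidates with the smallest distance seen so far. The class names `KBuffer` and `KnnSketch`, the axis-aligned box test standing in for `inRange(R)`, and Euclidean `distance` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical plain-Java stand-in for the slide's kbuffer reduction variable:
// it keeps the k points closest to the query point seen so far.
class KBuffer {
    private final int k;
    // Max-heap on distance (stored in slot 3), so the farthest kept candidate is at the head.
    private final PriorityQueue<double[]> heap =
        new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[3]).reversed());

    KBuffer(int k) { this.k = k; }

    // Insert point (x, y, z) with its distance; evict the farthest if over capacity.
    void insert(double[] point, double dis) {
        heap.add(new double[] {point[0], point[1], point[2], dis});
        if (heap.size() > k) heap.poll();
    }

    // Retained distances in ascending order.
    List<Double> distances() {
        List<Double> out = new ArrayList<>();
        for (double[] e : heap) out.add(e[3]);
        out.sort(null);
        return out;
    }
}

class KnnSketch {
    // Assumed box test standing in for inRange(R); lo and hi are opposite corners of R.
    static boolean inRange(double[] p, double[] lo, double[] hi) {
        for (int i = 0; i < 3; i++) {
            if (p[i] < lo[i] || p[i] > hi[i]) return false;
        }
        return true;
    }

    // Assumed Euclidean distance in 3-D.
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    static KBuffer search(double[][] input, double[] lo, double[] hi, double[] w, int k) {
        KBuffer kbuffer = new KBuffer(k);
        for (double[] p : input) {          // sequential stand-in for foreach
            if (inRange(p, lo, hi)) {       // points outside R are discarded
                kbuffer.insert(p, distance(p, w));
            }
        }
        return kbuffer;
    }

    public static void main(String[] args) {
        double[][] input = {{0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {9, 9, 9}};
        KBuffer result = search(input, new double[] {0, 0, 0},
                                new double[] {5, 5, 5}, new double[] {0, 0, 0}, 2);
        System.out.println(result.distances());  // [0.0, 1.0]
    }
}
```

A bounded max-heap keeps each insertion at O(log K), and the retained distances depend only on which points were inserted, not on their order, which is the kind of associative, commutative update a reduction variable needs.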
5
DataCutter: Grid development tool
· ongoing project at University of Maryland / OSU (Beynon, Kurc, Sussman, Saltz et al.)
· targets distributed, heterogeneous environments
· decomposes application-specific data processing operations into a set of interacting processes
· provides a set of interfaces: filter, stream, layout & placement
[Figure: filter1, filter2 and filter3 connected by stream1 and stream2]
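To make the filter-stream model concrete, here is a minimal Java sketch under loose assumptions: it is not DataCutter's interface, filters are modeled as threads, streams as `LinkedBlockingQueue`s, and an end-of-stream marker value is assumed. The filter roles mirror the figure; the even-number test in the middle filter is only a placeholder for real processing, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the filter-stream model: each filter runs in its own
// thread and talks to its neighbours only through streams (blocking queues).
// This is NOT DataCutter's API; the names and end-of-stream marker are assumptions.
class FilterStreamSketch {
    static final Integer END = Integer.MIN_VALUE;  // assumed end-of-stream marker

    // filter1: produces data items onto stream1.
    static Thread source(List<Integer> data, BlockingQueue<Integer> out) {
        return new Thread(() -> {
            try {
                for (Integer x : data) out.put(x);
                out.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
    }

    // filter2: forwards even items from stream1 to stream2 (placeholder processing).
    static Thread select(BlockingQueue<Integer> in, BlockingQueue<Integer> out) {
        return new Thread(() -> {
            try {
                for (Integer x = in.take(); !x.equals(END); x = in.take()) {
                    if (x % 2 == 0) out.put(x);
                }
                out.put(END);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
    }

    // filter3: collects items arriving on stream2.
    static Thread sink(BlockingQueue<Integer> in, List<Integer> result) {
        return new Thread(() -> {
            try {
                for (Integer x = in.take(); !x.equals(END); x = in.take()) result.add(x);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
    }

    static List<Integer> run(List<Integer> data) {
        BlockingQueue<Integer> stream1 = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> stream2 = new LinkedBlockingQueue<>();
        List<Integer> result = new ArrayList<>();
        Thread[] filters = {source(data, stream1), select(stream1, stream2), sink(stream2, result)};
        for (Thread t : filters) t.start();
        try {
            for (Thread t : filters) t.join();  // join makes the sink's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for filters", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5, 6)));  // [2, 4, 6]
    }
}
```

Because each filter touches only its own input and output queues, nothing in its code depends on where the other filters run, which is what would let a layout-and-placement step put them on different hosts.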
6
Compiler Overview
Data Parallel Java → Data-Centric Transformation → Loop Fission → Global Reduction Analysis → Filter Enumeration → Granularity Selection → Final Code Generation → Filter-Stream programming
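The slides do not spell out how loop fission is performed. A minimal sketch, assuming 1-D integer points and a minimum-distance reduction in place of the K-buffer, shows the general idea: a loop whose body does a range test and then a reduction is split into two loops, and the intermediate list between them is the kind of data that could flow over a stream between two filters. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of loop fission on a range-query-plus-reduction loop,
// using 1-D integer points and a minimum-distance reduction for brevity.
class LoopFissionSketch {
    // Original fused loop: range test and reduction in one body.
    static int fused(int[] input, int lo, int hi, int w) {
        int best = Integer.MAX_VALUE;
        for (int x : input) {
            if (x >= lo && x <= hi) best = Math.min(best, Math.abs(x - w));
        }
        return best;
    }

    // After fission, loop 1 performs only the range query and emits survivors...
    static List<Integer> rangeQuery(int[] input, int lo, int hi) {
        List<Integer> survivors = new ArrayList<>();
        for (int x : input) {
            if (x >= lo && x <= hi) survivors.add(x);
        }
        return survivors;
    }

    // ...and loop 2 performs only the reduction over what loop 1 emitted.
    static int reduce(List<Integer> survivors, int w) {
        int best = Integer.MAX_VALUE;
        for (int x : survivors) best = Math.min(best, Math.abs(x - w));
        return best;
    }

    public static void main(String[] args) {
        int[] input = {3, 8, 15, 21, 40};
        System.out.println(fused(input, 5, 30, 17));               // 2
        System.out.println(reduce(rangeQuery(input, 5, 30), 17));  // 2
    }
}
```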
7
Experience with a Mining Algorithm
K-Nearest Neighbors: given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a query point (a, b, c), find the K nearest neighbors of the query point within R.
[Figure: the loop body is split into a Range_Query stage (Input[p].inRange? otherwise discard) and a local reduction stage (Dis = … …; Kbuffer.insert(…, …)); filters: Read data → Range_query → (R-S stream) → Select → (S-C stream) → Combine]
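One way to picture the Combine filter, assuming each Select filter ships its local K smallest distances: the global K nearest neighbors are always contained in the union of the local results, since any point among the global K nearest is also among the K nearest at its own site. The sketch below (hypothetical names, distances only, no point payloads) simply concatenates and sorts; a K-way merge of the already-sorted inputs would avoid the full sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the Combine stage: each Select filter ships its local
// k smallest distances, and Combine keeps the global k smallest.
class CombineSketch {
    static List<Double> combine(List<List<Double>> localResults, int k) {
        List<Double> all = new ArrayList<>();
        for (List<Double> local : localResults) all.addAll(local);
        all.sort(null);                                           // ascending by distance
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Double> siteA = Arrays.asList(0.5, 1.0, 4.0);
        List<Double> siteB = Arrays.asList(0.7, 2.0, 3.0);
        System.out.println(combine(Arrays.asList(siteA, siteB), 3));  // [0.5, 0.7, 1.0]
    }
}
```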
8
Experimental Results
· Experimented on a LAN
· Data is available on 2 machines; results are on a third machine
· Dataset contains 3-D points, of size 1.2M and 12M
· K is 20
· Consider 0, 1, 4, and 16 other computation-intensive jobs on the machines hosting data
· Running time is in milliseconds
· Performance difference between the two versions is within 15%
[Figure: Comparing manual version and compiler-generated version]
9
Experience with Image Querying
Virtual Microscope
· Input: a digitized image, a rectangular region R, a subsampling factor
· Output: an enlarged image for the specified image region with a particular sampling factor
querybox = [lowend : highend];
foreach (p in querybox) {
    if (VScope[p].is_sampled(lowend, subsampling_factor)) {
        q = (p - lowend) / subsampling_factor;
        Output[q].Assign(VScope[p]);
    }
}
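For a plain-Java reading of the Virtual Microscope loop, a minimal 2-D sketch follows. It assumes `is_sampled(lowend, subsampling_factor)` holds when each coordinate's offset from `lowend` is a multiple of the factor, treats the query box as inclusive, and stores pixels as `int`s; `SubsampleSketch` and `query` are hypothetical names.

```java
// Hypothetical 2-D version of the Virtual Microscope loop. is_sampled is assumed
// to mean "every coordinate's offset from lowend is a multiple of the factor".
class SubsampleSketch {
    static int[][] query(int[][] image, int lowX, int lowY, int highX, int highY, int factor) {
        // Output size for the inclusive query box [low : high].
        int outW = (highX - lowX) / factor + 1;
        int outH = (highY - lowY) / factor + 1;
        int[][] output = new int[outW][outH];
        for (int x = lowX; x <= highX; x++) {
            for (int y = lowY; y <= highY; y++) {
                boolean sampled = (x - lowX) % factor == 0 && (y - lowY) % factor == 0;
                if (sampled) {
                    // q = (p - lowend) / subsampling_factor
                    output[(x - lowX) / factor][(y - lowY) / factor] = image[x][y];
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[][] image = new int[8][8];
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++) image[x][y] = 10 * x + y;
        int[][] out = query(image, 1, 1, 5, 5, 2);
        System.out.println(java.util.Arrays.deepToString(out));
        // [[11, 13, 15], [31, 33, 35], [51, 53, 55]]
    }
}
```

Like the slide, it visits every point in the query box and tests it; a loop that strides by the factor would produce the same output with fewer iterations.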
10
Experimental Results
· Experimented on a LAN
· Data is available on a machine other than the one where the results are displayed
· Image is of size 800k
· Sampling factors are 4 and 16
· Consider 0, 1, 4, and 16 other computation-intensive jobs on the machines hosting data
· Running time is in milliseconds
· Performance difference between the two versions is within 15%
[Figure: Comparing manual version and compiler-generated version]
11
Summary and Future Work
· Aims at developing compiler support for using heterogeneous and distributed resources to process geographically distributed datasets
· Experimented on simple data-intensive codes with simple filter generation heuristics: initial results are quite encouraging
· Future work:
  · more applications (visualization)
  · more compiler analysis (sophisticated heuristics, loop fission, data-centric code generation, …)
12
Thank you!
14
Data Parallel Java
· assumes all data are available in a flat memory
· assumes all computation is done on a single processor
· extensions of Java that give the compiler information about independent collections of objects, parallel loops, and reduction operations:
  · domain & rectdomain
  · foreach loop
  · reduction variables: can only be updated inside a foreach loop, by operations that are associative and commutative; the intermediate value of a reduction variable may not be used within the loop, except for self-updates
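These restrictions are what let a foreach loop with reduction variables be split across processors. A small Java sketch (hypothetical names, integer sum as the reduction) illustrates the point: because the only access to the reduction variable is the self-update `total += v` with an associative and commutative operator, each chunk can accumulate into a private copy, and the copies can be combined in any order (here deliberately reversed) without changing the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of why reduction variables are restricted to
// associative and commutative self-updates.
class ReductionSketch {
    // Sequential semantics: the reduction variable is only self-updated.
    // Reading it inside the loop (e.g. "if (total > 10) ...") would break the rule.
    static long sequential(int[] data) {
        long total = 0;
        for (int v : data) total += v;
        return total;
    }

    // Chunked execution: each chunk reduces into a private copy,
    // and the partial results are then combined in reverse order.
    static long chunked(int[] data, int chunks) {
        List<Long> partials = new ArrayList<>();
        int size = (data.length + chunks - 1) / chunks;
        for (int start = 0; start < data.length; start += size) {
            long local = 0;  // private copy of the reduction variable
            for (int i = start; i < Math.min(start + size, data.length); i++) local += data[i];
            partials.add(local);
        }
        Collections.reverse(partials);  // combination order does not matter
        long total = 0;
        for (long p : partials) total += p;
        return total;
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};
        System.out.println(sequential(data) + " " + chunked(data, 4));  // 108 108
    }
}
```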
15
Why DataCutter?
· Typical DDM algorithm:
  · local reduction on geographically dispersed data
  · global reduction to combine the results
· DataCutter features:
  · decomposition of the application into a set of filters
  · filters are location independent
  · filters interact with each other via streams
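The local/global reduction pattern can be sketched in Java as follows, using a histogram as an assumed example of a reduction object (the slide names no specific algorithm). Each site reduces its own data into a histogram, and the global step adds the histograms element-wise, which is valid because bin-wise addition is associative and commutative. Names and the bin layout are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of "local reduction, then global reduction": each site builds
// a histogram (its reduction object) over its own data, and a global step adds
// the histograms element-wise.
class LocalGlobalSketch {
    static long[] localReduction(double[] siteData, int bins, double min, double max) {
        long[] hist = new long[bins];
        double width = (max - min) / bins;
        for (double v : siteData) {
            if (v < min || v > max) continue;           // ignore values outside [min, max]
            int b = Math.min((int) ((v - min) / width), bins - 1);  // v == max goes to last bin
            hist[b]++;
        }
        return hist;
    }

    // Assumes at least one site and equal bin counts at every site.
    static long[] globalReduction(long[][] locals) {
        long[] total = new long[locals[0].length];
        for (long[] h : locals)
            for (int i = 0; i < h.length; i++) total[i] += h[i];
        return total;
    }

    public static void main(String[] args) {
        long[][] locals = {
            localReduction(new double[] {0.1, 0.2, 0.9}, 2, 0.0, 1.0),  // site A
            localReduction(new double[] {0.5, 0.6, 1.0}, 2, 0.0, 1.0)   // site B
        };
        System.out.println(Arrays.toString(globalReduction(locals)));  // [2, 4]
    }
}
```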
16
K-nearest neighbor search algorithm on DataCutter: a case study
· Problem definition: given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a query point (a, b, c), find the K nearest neighbors of the query point within R.
· Solution: [Figure: Read data → Range_query → (R-S stream) → Select → (S-C stream) → Combine]