LCPC02 Wei Du Renato Ferreira Gagan Agrawal

Presentation transcript:

Towards Compiler Support for Data Intensive Applications on Distributed Heterogeneous Resources
Wei Du, Renato Ferreira, Gagan Agrawal
Ohio-State University

Motivation
· Grid environment: geographically distributed, heterogeneous resources
· Scientific and commercial data-intensive applications: generalized reduction operations are very common in the processing structure
· No compiler support for high-level languages for grid application development
7/26/02 Ohio-State University

Software Architecture
Data Parallel Java
· assumes all data are available in a flat memory
· assumes all computation is done on a single processor
· extensions of Java: domain & rectdomain, foreach loop, reduction variables
[Figure: Data Parallel Java -> Compiler Support -> Filter-stream program on DataCutter]

Example code

    public class kNN {
        static buffer kbuffer;

        public static void main(String[] args) {
            double dis;
            Point<3> lowend = [0, 0, 0];
            Point<3> hiend = [Integer.parseInt(args[0]),
                              Integer.parseInt(args[1]),
                              Integer.parseInt(args[2])];
            Point<3> p;
            RectDomain<3> InputDomain = [lowend : hiend];
            kPoint[3d] Input = new kPoint[InputDomain];
            // R is the query range and W the query point; the slide does not declare them.
            foreach (p in InputDomain) {
                if (Input[p].inRange(R)) {
                    dis = Input[p].distance(W);
                    kbuffer.insert(Input[p], dis);
                }
            }
        }
    }

DataCutter: Grid development tool
· ongoing project at University of Maryland / OSU (Beynon, Kurc, Sussman, Saltz et al.)
· targets distributed, heterogeneous environments
· decomposes application-specific data processing operations into a set of interacting processes
· provides a set of interfaces: filter, stream, layout & placement
[Figure: filters filter1, filter2 and filter3 connected by streams stream1 and stream2]
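The filter-stream idea on this slide can be sketched in plain Java. This is only an illustration under stated assumptions, not the DataCutter API: the class name `FilterStreamSketch`, the `END` sentinel, and the work done by each filter (forwarding, squaring even values, collecting) are all hypothetical. Each filter is an independent thread, and the only way filters interact is through bounded queues that stand in for streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of a filter-stream pipeline; NOT the DataCutter API.
final class FilterStreamSketch {
    // End-of-stream marker (an assumption of this sketch; input must not contain it).
    private static final int END = Integer.MIN_VALUE;

    // Runs filter1 -> stream1 -> filter2 -> stream2 -> filter3 and returns filter3's output.
    static List<Integer> run(int[] input) {
        BlockingQueue<Integer> stream1 = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> stream2 = new ArrayBlockingQueue<>(16);
        List<Integer> results = new ArrayList<>();

        // filter1: reads the data and pushes every item onto stream1.
        Thread filter1 = new Thread(() -> {
            try {
                for (int v : input) stream1.put(v);
                stream1.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // filter2: keeps even values only and forwards their squares on stream2.
        Thread filter2 = new Thread(() -> {
            try {
                for (int v = stream1.take(); v != END; v = stream1.take()) {
                    if (v % 2 == 0) stream2.put(v * v);
                }
                stream2.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        filter1.start();
        filter2.start();
        try {
            // filter3: runs on the calling thread and collects what arrives on stream2.
            for (int v = stream2.take(); v != END; v = stream2.take()) {
                results.add(v);
            }
            filter1.join();
            filter2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 2, 3, 4})); // prints [4, 16]
    }
}
```

Because the filters share nothing except the queues, each one could in principle be placed on a different machine, which is the property the slide calls layout & placement.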

Compiler Overview
Data Parallel Java
· Data Centric Transformation
· Loop Fission
· Global Reduction Analysis
· Filter Enumeration
· Granularity Selection
· Final Code Generation
Filter-Stream programming

Experience with a Mining Algorithm: K-Nearest Neighbors
Given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point ω = (a, b, c), we want to find the nearest K neighbors of ω within R.
[Figure: filters Range_query, Select and Combine, linked by the R-S stream and the S-C stream. Labels: Read data; Range_Query: Input[p].inRange? (otherwise discard); Local reduction: Dis = … …, Kbuffer.insert(…, …)]
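A plain-Java sketch of this computation may make the reduction structure easier to see. It is an illustration, not the authors' generated code: `KnnSketch`, `KBuffer`, `localReduction`, `merge` and the sample points are hypothetical, and the sketch assumes the range R is inclusive on both ends and that distance is Euclidean. Each site runs the range test and a local reduction into its own buffer, and the buffers are then merged in a global reduction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class KnnSketch {
    /** Hypothetical stand-in for the slide's kbuffer: retains the k nearest points seen so far. */
    static final class KBuffer {
        private final int k;
        // Max-heap on distance (entry[3]), so the farthest retained point is evicted first.
        private final PriorityQueue<double[]> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b[3], a[3]));

        KBuffer(int k) { this.k = k; }

        void insert(double[] point, double dis) {
            heap.add(new double[] {point[0], point[1], point[2], dis});
            if (heap.size() > k) heap.poll();
        }

        // Global reduction: fold another site's buffer into this one.
        void merge(KBuffer other) {
            for (double[] e : other.heap) insert(e, e[3]);
        }

        // Retained distances in ascending order.
        List<Double> distances() {
            List<Double> out = new ArrayList<>();
            for (double[] e : heap) out.add(e[3]);
            out.sort(Comparator.naturalOrder());
            return out;
        }
    }

    // Range test; bounds are assumed inclusive.
    static boolean inRange(double[] p, double[] lo, double[] hi) {
        for (int d = 0; d < 3; d++) {
            if (p[d] < lo[d] || p[d] > hi[d]) return false;
        }
        return true;
    }

    // Euclidean distance (an assumption; the slides do not name the metric).
    static double distance(double[] p, double[] w) {
        double sum = 0;
        for (int d = 0; d < 3; d++) sum += (p[d] - w[d]) * (p[d] - w[d]);
        return Math.sqrt(sum);
    }

    // Local reduction over the points held at one site.
    static KBuffer localReduction(double[][] points, double[] lo, double[] hi, double[] w, int k) {
        KBuffer buf = new KBuffer(k);
        for (double[] p : points) {
            if (inRange(p, lo, hi)) buf.insert(p, distance(p, w));
        }
        return buf;
    }

    // Two hypothetical sites, K = 2, query point at the origin.
    static List<Double> demo() {
        double[] lo = {0, 0, 0}, hi = {6, 6, 6}, w = {0, 0, 0};
        double[][] siteA = {{1, 0, 0}, {5, 0, 0}};
        double[][] siteB = {{2, 0, 0}, {10, 0, 0}, {0, 0, 0}};
        KBuffer result = localReduction(siteA, lo, hi, w, 2);
        result.merge(localReduction(siteB, lo, hi, w, 2));
        return result.distances();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0.0, 1.0]
    }
}
```

Keeping only K entries per site bounds the data that has to cross the network before the final merge.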

Experimental Results: comparing manual version and compiler-generated version
· Experimented on a LAN
· Data is available on 2 machines; results are on the third machine
· Dataset contains 3-D points, of size 1.2M and 12M
· K is 20
· Consider 0, 1, 4, 16 other computation-intensive jobs on machines hosting data
· Running time is in milliseconds
· Performance difference of the two versions is within 15%

Experience with Image Querying: Virtual Microscope
Input: a digitized image, a rectangular region R, a subsampling factor
Output: an enlarged image for the specified image region with a particular sampling factor

    querybox = [lowend, highend]
    foreach (p in querybox) {
        if (VScope[p].is_sampled(lowend, subsampling_factor)) {
            q = (p - lowend) / subsampling_factor;
            Output[q].Assign(VScope[p]);
        }
    }
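The subsampling loop above can be written in plain Java over a 2-D array. This is a minimal sketch under stated assumptions: `SubsampleSketch` and `subsample` are hypothetical names, the query box is taken as inclusive, and `is_sampled` is assumed to mean that a pixel's offset from `lowend` is a multiple of the subsampling factor in both dimensions.

```java
final class SubsampleSketch {
    // Copies every factor-th pixel of the inclusive box [lowRow..highRow] x [lowCol..highCol].
    static int[][] subsample(int[][] image, int lowRow, int lowCol,
                             int highRow, int highCol, int factor) {
        int outRows = (highRow - lowRow) / factor + 1;
        int outCols = (highCol - lowCol) / factor + 1;
        int[][] output = new int[outRows][outCols];
        for (int r = lowRow; r <= highRow; r++) {
            for (int c = lowCol; c <= highCol; c++) {
                // Assumed meaning of is_sampled: offset from lowend divisible by the factor.
                if ((r - lowRow) % factor == 0 && (c - lowCol) % factor == 0) {
                    // q = (p - lowend) / subsampling_factor, applied per dimension.
                    output[(r - lowRow) / factor][(c - lowCol) / factor] = image[r][c];
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[][] image = {
            {0, 1, 2, 3},
            {10, 11, 12, 13},
            {20, 21, 22, 23},
            {30, 31, 32, 33}
        };
        int[][] out = subsample(image, 0, 0, 3, 3, 2);
        System.out.println(java.util.Arrays.deepToString(out)); // prints [[0, 2], [20, 22]]
    }
}
```

Each output pixel is written exactly once, so the iterations are independent of each other.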

Experimental Results: comparing manual version and compiler-generated version
· Experimented on a LAN
· Data is available on a machine other than the one where the results are to be displayed
· Image is of size 800k
· Sampling factors are 4 and 16
· Consider 0, 1, 4, 16 other computation-intensive jobs on machines hosting data
· Running time is in milliseconds
· Performance difference of the two versions is within 15%

Summary and Future Work
· Aims at developing compiler support for using heterogeneous and distributed resources for processing geographically distributed datasets
· Experimented on simple data-intensive codes with simple filter generation heuristics; initial results are quite encouraging
· Future work: more applications (visualization); more compiler analysis (sophisticated heuristics, loop fission, data-centric code generation, …)

Thank you !!!

Data Parallel Java
· assumes all data are available in a flat memory
· assumes all computation is done on a single processor
· extensions of Java, to give the compiler information about independent collections of objects, parallel loops and reduction operations:
    · domain & rectdomain
    · foreach loop
    · reduction variables: can only be updated inside a foreach loop, by operations that are associative & commutative; the intermediate value of a reduction variable may not be used within the loop, except for self-updates
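The restriction on reduction variables is what lets iterations be regrouped freely. The sketch below illustrates this in plain Java with a sum. The names `ReductionSketch`, `sequentialSum` and `chunkedSum` are hypothetical, and the chunking scheme is an assumption made for the example rather than anything the compiler is stated to do.

```java
final class ReductionSketch {
    // Reference version: one reduction variable, updated only by self-updates (total += ...).
    static long sequentialSum(int[] data) {
        long total = 0;
        for (int v : data) total += v;
        return total;
    }

    // Regrouped version: each chunk updates a private partial result,
    // and the partials are combined afterwards, deliberately in reverse order.
    static long chunkedSum(int[] data, int chunks) {
        long[] partial = new long[chunks];
        for (int c = 0; c < chunks; c++) {
            int from = (int) ((long) data.length * c / chunks);
            int to = (int) ((long) data.length * (c + 1) / chunks);
            for (int i = from; i < to; i++) partial[c] += data[i];
        }
        long total = 0;
        for (int c = chunks - 1; c >= 0; c--) total += partial[c];
        return total;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(sequentialSum(data)); // prints 31
        System.out.println(chunkedSum(data, 3)); // prints 31
    }
}
```

Because addition is associative and commutative, and the loop body never reads the intermediate total, the regrouped version returns the same result for any chunk count and any combining order.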

Why DataCutter?
Typical DDM algorithm:
· local reduction on geographically dispersed data
· global reduction to combine the results
DataCutter features:
· decomposition of application into a set of filters
· filters are location independent
· filters interact with each other via streams

K-nearest neighbor search algorithm on DataCutter: a case study
Problem definition: given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point ω = (a, b, c), we want to find the nearest K neighbors of ω within R.
Solution:
[Figure: Read data, then filters Range_query, Select and Combine, linked by the R-S stream and the S-C stream]