Filter Decomposition for Supporting Coarse-grained Pipelined Parallelism Wei Du, Gagan Agrawal Ohio State University.

Distributed Data-Intensive Applications
- Fast-growing datasets
- Remote data access
- Distributed data storage
- More connected world
[Diagram labels: Internet, data]

Implementation: Local processing
- Requirements: huge storage, powerful computer, fast connection
[Diagram labels: Internet, data]

Implementation: Remote processing
- Requirements: complex analysis at data centers
[Diagram labels: Internet, data]

A Practical Solution
- Our hypothesis: the coarse-grained pipelined execution model is a good match
[Diagram labels: Internet, data]

Coarse-Grained Pipelined Execution
- Definition: computations associated with an application are carried out in several stages, which are executed on a pipeline of computing units.
- Example: K-nearest Neighbor (KNN). Given a 3-D range R=, and a point p = (a, b, c), we want to find the nearest K neighbors of p within R.
- Stages: Range_query, then Find the K-nearest neighbors
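
As a rough illustration of the KNN example, the two stages can be written as two Python functions that could, in principle, run on different computing units. This is a minimal sketch, not the authors' code: the function names `range_query` and `k_nearest`, the representation of R as an axis-aligned box given by two corner points, and the use of Euclidean distance are all assumptions made for the example.

```python
import heapq
import math

def range_query(points, lo, hi):
    """Stage 1 (hypothetical): keep the points inside the box [lo, hi]."""
    return [q for q in points
            if all(l <= c <= h for c, l, h in zip(q, lo, hi))]

def k_nearest(points, p, k):
    """Stage 2 (hypothetical): the k points closest to p, nearest first."""
    return heapq.nsmallest(k, points, key=lambda q: math.dist(p, q))

# Made-up data: the first stage shrinks the data volume before the
# second stage runs, which is what makes a pipelined split attractive.
points = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (9, 9, 9)]
in_range = range_query(points, lo=(0, 0, 0), hi=(5, 5, 5))
neighbors = k_nearest(in_range, p=(1, 1, 0), k=2)
```

Only the output of `range_query` has to cross the link between the two stages, so the amount of data shipped depends on how selective the range is.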

Challenges
- Computation associated with an application needs to be decomposed into stages
- Decomposition decisions are dependent on the execution environment
- Generating code for each stage (SC03)
- Other performance issues for the pipelined execution (ICPP04)
- Adapting to the dynamic execution environment (SC04)

RoadMap
- Filter Decomposition Problem
- MIN_ONETRIP Algorithm
- MIN_BOTTLENECK Algorithm
- MIN_TOTAL Algorithm
- Experimental Results
- Related Work
- Conclusion

Filter Decomposition
[Diagram: a computation pipeline of computing units $C_1, C_2, \ldots, C_{m-1}, C_m$ connected by links $L_1, \ldots, L_{m-1}$; a sequence of atomic filters $f_1, f_2, \ldots, f_{n-1}, f_n$; example placements that assign groups of consecutive filters to the computing units]

Filter Decomposition
[Diagram: computation pipeline $C_1, C_2, \ldots, C_{m-1}, C_m$ with links $L_1, \ldots, L_{m-1}$; atomic filters $f_1, f_2, \ldots, f_{n-1}, f_n$]
Goal: find a placement $p(f_1, f_2, \ldots, f_n) = (F_1, F_2, \ldots, F_m)$, where $F_i = f_{i_1}, f_{i_1+1}, \ldots, f_{i_k}$ ($1 \le i_1, i_k \le n$; $1 \le i \le m$), such that the predicted execution time is minimal.
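
To make the search space concrete, the following Python sketch enumerates every placement of n filters onto m computing units as consecutive groups. It is an illustration only: the function name `placements` is hypothetical, and allowing a computing unit to receive an empty group is an assumption that the slide does not state.

```python
from itertools import combinations_with_replacement

def placements(n, m):
    """Yield each way to split filters 1..n into m consecutive groups.

    Assumption: a group may be empty, meaning that computing unit only
    forwards data. Filters are identified by their 1-based index.
    """
    for cuts in combinations_with_replacement(range(n + 1), m - 1):
        bounds = (0,) + cuts + (n,)
        yield [list(range(bounds[s] + 1, bounds[s + 1] + 1))
               for s in range(m)]

all_for_3_on_2 = list(placements(3, 2))
```

Under this assumption the number of placements grows combinatorially with n and m, which is one reason to look for algorithms cheaper than exhaustive search.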

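Cost Model
[Diagram: three computing units $C_1, C_2, C_3$ connected by links $L_1, L_2$, with filters $f_1, \ldots, f_4$ placed on them]
- Bottleneck stage: the $b$-th stage, the slowest stage in the pipeline
- Execution time: $T = T(C_1) + T(L_1) + N \cdot T(C_2) + T(L_2) + T(C_3) = \sum_{i \ne b} T_i + (N-1) \cdot T_b$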
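
The cost model can be tried out on numbers with a few lines of Python. This sketch follows the expanded three-stage expression on the slide, $T(C_1) + T(L_1) + N \cdot T(C_2) + T(L_2) + T(C_3)$: every stage is paid once, and the bottleneck stage is paid $N-1$ further times. The function name `predicted_time` is hypothetical, and reading $N$ as the number of packets is an assumption; the stage times below are made up.

```python
def predicted_time(stage_times, num_packets):
    """Predicted pipeline time: one full trip plus (N-1) bottleneck steps.

    stage_times lists the per-packet time of every computing unit and
    link in pipeline order; num_packets is assumed to be N.
    """
    bottleneck = max(stage_times)
    return sum(stage_times) + (num_packets - 1) * bottleneck

# Stages in order C1, L1, C2, L2, C3, with C2 as the bottleneck.
example_total = predicted_time([2, 1, 5, 1, 3], num_packets=10)
```

With a single packet the estimate reduces to the one-trip time, and with many packets the bottleneck term dominates, which matches the two quantities that MIN_ONETRIP and MIN_BOTTLENECK each target.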

Three Algorithms
- MIN_ONETRIP Algorithm: dynamic programming algorithm to minimize $\sum T_i$
- MIN_BOTTLENECK Algorithm: dynamic programming algorithm to minimize $T_b$
- MIN_TOTAL Algorithm: greedy algorithm that tries to minimize $T$, where $T = \sum_{i \ne b} T_i + (N-1) \cdot T_b$

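Filter Decomposition: MIN_ONETRIP
[Diagram: the tail of the pipeline, $C_{m-2}, C_{m-1}, C_m$ with links $L_{m-2}, L_{m-1}$, showing alternative positions for $f_{n-1}$ and $f_n$]
Goal: minimize time spent by one packet on the pipeline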

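Filter Decomposition: MIN_ONETRIP
[Diagram: the tail of the pipeline, $C_{m-2}, C_{m-1}, C_m$ with links $L_{m-2}, L_{m-1}$]
$T[i,j]$: minimum cost of doing computations $f_1, \ldots, f_i$ on computing units $C_1, \ldots, C_j$, where the results of $f_i$ are on $C_j$.
$$T[i,j] = \min \begin{cases} T[i-1,j] + \mathrm{Cost\_comp}(P(C_j), \mathrm{Task}(f_i)) \\ T[i,j-1] + \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_i)) \end{cases}$$
Goal: $T[n,m]$
Cost: $O(mn)$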
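
The recurrence translates almost directly into a table-filling loop. The Python sketch below is an illustration under stated assumptions rather than the authors' implementation: the function name `min_onetrip` is hypothetical, Cost_comp is modeled as Task(f_i) / P(C_j), Cost_comm as Vol(f_i) / B(L_{j-1}), and the base case pins the start of the computation to C_1 (so f_1 always runs on C_1). None of these choices is given on the slide.

```python
def min_onetrip(task, vol, power, bandwidth):
    """Fill T[i][j] as in the MIN_ONETRIP recurrence and return T[n][m].

    task[i-1]      : work of filter f_i            (Task(f_i))
    vol[i-1]       : output volume of filter f_i   (Vol(f_i))
    power[j-1]     : speed of computing unit C_j   (P(C_j))
    bandwidth[j-1] : bandwidth of link L_j         (B(L_j))
    Assumed cost functions: comp = task / power, comm = vol / bandwidth.
    """
    n, m = len(task), len(power)
    inf = float("inf")
    T = [[inf] * (m + 1) for _ in range(n + 1)]
    T[0][1] = 0.0  # assumed base case: nothing computed yet, data on C_1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # f_i runs on C_j, after f_1..f_{i-1} finished on C_1..C_j
            run_here = T[i - 1][j] + task[i - 1] / power[j - 1]
            # results of f_i are shipped from C_{j-1} over L_{j-1}
            ship = inf
            if j >= 2:
                ship = T[i][j - 1] + vol[i - 1] / bandwidth[j - 2]
            T[i][j] = min(run_here, ship)
    return T[n][m]

# Made-up instance: two filters, two computing units, one link.
one_trip = min_onetrip(task=[4, 2], vol=[2, 1], power=[1, 2], bandwidth=[2])
```

Each of the (n+1) x (m+1) table entries is computed in constant time, which is consistent with the O(mn) cost stated on the slide.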

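Filter Decomposition: MIN_BOTTLENECK
[Diagram: the tail of the pipeline, $C_{m-2}, C_{m-1}, C_m$ with links $L_{m-2}, L_{m-1}$, showing the alternative ways to split $f_1, \ldots, f_n$ between the last computing unit and the units before it]
Goal: minimize time spent at the bottleneck stage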

Filter Decomposition: MIN_BOTTLENECK
$N[i,j]$: minimum cost of the bottleneck stage for computing $f_1, \ldots, f_i$ on computing units $C_1, \ldots, C_j$, where the results of $f_i$ are on $C_j$.
$$N[i,j] = \min \begin{cases} \max\{ N[i,j-1],\ \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_i)) \} \\ \max\{ N[i-1,j-1],\ \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_{i-1})),\ \mathrm{Cost\_comp}(P(C_j), \mathrm{Task}(f_i)) \} \\ \vdots \\ \max\{ N[1,j-1],\ \mathrm{Cost\_comm}(B(L_{j-1}), \mathrm{Vol}(f_1)),\ \mathrm{Cost\_comp}(P(C_j), \mathrm{Task}(f_2) + \ldots + \mathrm{Task}(f_i)) \} \end{cases}$$
Cost: $O(mn^2)$
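
A table-filling version of this recurrence is sketched below in Python. As before, it is a hedged illustration rather than the authors' code: the function name `min_bottleneck` is hypothetical, Cost_comp is modeled as total task / P(C_j) and Cost_comm as Vol / B(L_{j-1}), and the first column N[i,1] is assumed to be the cost of running f_1..f_i entirely on C_1. The index k below is the last filter executed before C_j, ranging from i down to 1 as in the listed cases.

```python
def min_bottleneck(task, vol, power, bandwidth):
    """Fill N[i][j] as in the MIN_BOTTLENECK recurrence; return N[n][m].

    Arguments mirror Task(f_i), Vol(f_i), P(C_j) and B(L_j), 0-indexed.
    Assumed cost functions: comp = task / power, comm = vol / bandwidth.
    """
    n, m = len(task), len(power)
    prefix = [0.0]  # prefix[i] = Task(f_1) + ... + Task(f_i)
    for t in task:
        prefix.append(prefix[-1] + t)
    inf = float("inf")
    N = [[inf] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        N[i][1] = prefix[i] / power[0]  # assumed base case: all on C_1
    for j in range(2, m + 1):
        for i in range(1, n + 1):
            best = inf
            for k in range(1, i + 1):
                comm = vol[k - 1] / bandwidth[j - 2]  # ship f_k's output
                comp = (prefix[i] - prefix[k]) / power[j - 1]  # f_{k+1}..f_i
                best = min(best, max(N[k][j - 1], comm, comp))
            N[i][j] = best
    return N[n][m]

# Same made-up instance as a two-filter, two-unit pipeline.
slowest = min_bottleneck(task=[4, 2], vol=[2, 1], power=[1, 2], bandwidth=[2])
```

The inner loop over k adds a factor of n to the table size, in line with the O(mn^2) cost on the slide.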

Filter Decomposition: MIN_TOTAL
[Diagram: pipeline of computing units $C_1, C_2, C_3, C_4$ connected by links $L_1, L_2, L_3$; atomic filters $f_1, \ldots, f_5$]
Goal: to minimize the predicted execution time $T$
Estimated cost of each candidate group:
- $f_1$: $T_1$
- $f_1, f_2$: $T_2$
- $f_1$ - $f_3$: $T_3$
- $f_1$ - $f_4$: $T_4$
$\min\{T_1, \ldots, T_4\} = T_2$, so the group $f_1, f_2$ is chosen.

Experimental Results
- 4 Configurations
- 3 Applications
  - Virtual Microscope
  - Iso-Surface Rendering

Used Applications
Virtual Microscope (Vmscope): an emulation of a microscope
- input: a rectangular region, a resolution value
- output: portion of the original image at the given resolution

Experimental Results: Virtual Microscope
- 3 queries
  - Q1: 1 packet
  - Q2: 4 packets
  - Q3: 4500 packets
- 4 algorithms
  - MIN_ONETRIP
  - MIN_BOTTLENECK
  - MIN_TOTAL
  - Exhaustive_Search

Experimental Results: Virtual Microscope
[Two charts: Execution Time (in ms) by Application]

Experimental Results: Virtual Microscope
Two observations:
- The performance variance between different algorithms is small
- Exha_Search does not always give the best placement
  - the application characteristics are based on information from one packet
  - combining two filters into one saves copying cost

Used Applications
Iso-surface rendering (Iso)
- input: a 3-D grid, a scalar value, a view screen with a specified angle
- output: a surface seen from a certain angle, which captures the points in the grid whose scalar value matches the given iso-surface value

Experimental Results: Iso
- 2 implementations
  - ZBUF
  - ACTP
- 2 datasets
  - small: 3 packets
  - large: 47 packets
- 4 algorithms
  - MIN_ONETRIP
  - MIN_BOTTLENECK
  - MIN_TOTAL
  - Exhaustive_Search

Experimental Results: Iso
[Two charts: Execution Time (in ms) by Application; panels: Small dataset, Large dataset]

Experimental Results: Iso
- The MIN_TOTAL algorithm gives the best placement for the small dataset
- The MIN_ONETRIP algorithm finds the best placement for the large dataset
- This application is very data-dependent!

Experimental Results: Iso
[Two charts: Execution Time (in ms) by Number of Runs; panels: ZBUF, ACTP]

Conclusion & Future Work
- Our algorithms perform quite well
- Future work: to find more accurate characteristics of applications
  - estimate of the performance change resulting from combining multiple atomic filters
  - estimate of the impact of data dependence

Thank you !!!