Data-Intensive Computing: From Multi-Cores and GPGPUs to Cloud Computing and Deep Web Gagan Agrawal


Data-Intensive Computing Simply put: scalable analysis of large datasets. How is it different from, yet related to:
– Databases: emphasis on processing of static datasets
– Data mining: community focused more on algorithms, not scalable implementations
– High performance / parallel computing: more focus on compute-intensive tasks than on I/O or large datasets
– Datacenters: use of large resources for hosting data, less on their use for processing

Why Now?
– Amount of data is increasing rapidly
– Cheap storage
– Better connectivity; easy to move large datasets over the web/grids
– Science shifting from compute-X to X-informatics
– Business intelligence and analysis
– Google’s Map-Reduce has created excitement

Architectural Context Processor architecture has gone through a major change:
– No more scaling with clock speeds
– Parallelism (multi-core / many-core) is the trend
– Accelerators like GPGPUs have become effective
More challenges for scaling any class of applications.

Grid/Cloud/Utility Computing Cloud computing is a major new trend in industry:
– Data and computation in a Cloud of resources
– Pay-for-use model (like a utility)
Has roots in many developments over the last decade:
– Service-oriented computing, Software as a Service (SaaS)
– Grid computing: use of wide-area resources

My Research Group
– Data-intensive computing on emerging architectures
– Data-intensive computing in the Cloud model
– Data integration and query processing over deep web data
– Querying low-level datasets through automatic workflow composition
– Adaptive computation: time as a constraint

Personnel Current students:
– 6 PhD students
– 2 MS thesis students
– Talking to several first-year students
Past students:
– 7 PhDs completed between 2005 and 2008

Outline
– FREERIDE: data-intensive computing on clusters of multi-cores
– A system for exploiting GPGPUs for data-intensive computing
– FREERIDE-G: data-intensive computing on Cloud environments
– Quick overview of three other projects

FREERIDE - Motivation
– Availability of very large datasets and the need to analyze them (data-intensive applications)
– Adoption of multi-cores and the inevitability of parallel programming
– Need to abstract away the difficulties of parallel programming

FREERIDE
– A middleware for parallelizing data-intensive applications
– Motivated by difficulties in implementing and performance-tuning data mining applications
– Based on the observation of a similar generalized reduction structure among data mining, OLAP, and other scientific applications

Generalized Reduction structure

SMP Techniques
– Full replication (f-r): the obvious technique
– Locking-based techniques:
  – Full locking (f-l)
  – Optimized full locking (o-f-l)
  – Fixed locking (fi-l)
  – Cache-sensitive locking (hybrid of o-f-l and fi-l)
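The two ends of this spectrum can be sketched in a few lines. The following is an illustrative Python sketch (not FREERIDE's C++ implementation): full replication gives each thread a private copy of the reduction object and merges at the end, while full locking shares one copy and guards each element with its own lock.

```python
from threading import Thread, Lock

data = list(range(1000))
NUM_THREADS = 4

# Full replication (f-r): each thread updates a private copy of the
# reduction object; copies are merged in a final global combination.
def full_replication():
    copies = [dict() for _ in range(NUM_THREADS)]
    def worker(tid):
        for d in data[tid::NUM_THREADS]:
            k = d % 10                      # reduction-object element
            copies[tid][k] = copies[tid].get(k, 0) + d
    threads = [Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    merged = {}
    for c in copies:                        # global combination phase
        for k, v in c.items():
            merged[k] = merged.get(k, 0) + v
    return merged

# Full locking (f-l): one shared reduction object, one lock per element,
# so updates to independent elements can still proceed concurrently.
def full_locking():
    shared = {k: 0 for k in range(10)}
    locks = {k: Lock() for k in range(10)}
    def worker(tid):
        for d in data[tid::NUM_THREADS]:
            k = d % 10
            with locks[k]:
                shared[k] += d
    threads = [Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    return shared

assert full_replication() == full_locking()
```

Replication avoids synchronization but multiplies memory by the thread count; locking keeps one copy but pays per-update overhead. The optimized and cache-sensitive variants reduce that overhead by co-locating locks with data.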

Memory Layout of SMP Techniques

Experimental setup
– Machines with 2 quad-core Intel Xeon E5345 CPUs
– Each core: 2.33 GHz
– 6 GB main memory
– Nodes in cluster connected by InfiniBand

Experimental Results – K-means (CMP)

K-means (cluster)

Apriori (CMP)

Apriori (cluster)

E-M (CMP)

E-M (cluster)

Summary of Results
– Full replication and cache-sensitive locking can each outperform the other, depending on the nature of the application
– Cache-sensitive locking has high overhead when there is little computation between updates to the reduction object
– MPI processes compete well with the best of the other two on smaller numbers of cores, but experience communication overheads on larger numbers of cores

Background: GPU Computing
– Multi-core architectures are becoming more popular in high performance computing
– GPUs are inexpensive and fast
– CUDA is a high-level language that supports programming on GPUs

Architecture of GeForce 8800 GPU (1 multiprocessor)

Challenges of Data-intensive Computing on GPUs
– SIMD shared-memory programming
– 3 steps involved in the main loop:
  – Data read
  – Computing update
  – Writing update

Complications of CUDA Programming
– User must have thorough knowledge of the GPU architecture and the CUDA programming model
– Must specify the grid configuration
– Has to deal with memory allocation and copying
– Needs to know which data to copy into shared memory and how much shared memory to use
– …
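Even the simplest of these chores, picking a grid configuration, is bookkeeping the middleware can take over. A hypothetical helper (illustrative Python, not the middleware's actual code generator) shows the kind of ceil-division the generated host code must get right:

```python
# Compute a CUDA-style launch configuration: enough thread blocks so that
# every one of n_elements is covered by exactly one thread.
def grid_config(n_elements, threads_per_block=256, max_threads=512):
    if threads_per_block > max_threads:
        raise ValueError("exceeds per-block thread limit")
    # ceil-divide: the last block may be partially full, so generated
    # kernels also need a bounds check on the global thread index
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

blocks, tpb = grid_config(10_000)
print(blocks, tpb)   # → 40 256
```

Automating this, together with the device-memory allocation, host-device copies, and shared-memory staging listed above, is what lets the user supply only sequential reduction code.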

Architecture of the Middleware
– User input
– Code analyzer:
  – Analysis of variables (variable type and size)
  – Analysis of reduction functions (sequential code from the user)
– Code generator (generating CUDA code and C++ code invoking the kernel function)

Architecture of the middleware (diagram: variable information, reduction functions, and optional functions feed the code analyzer, built in LLVM, comprising a variable analyzer and code generator; from variable access patterns and combination operations it produces the host program with grid configuration and kernel invocation, plus kernel functions, compiled into the executable)

User Input
– A sequential reduction function
– Optional functions (initialization function, combination function, …)
– Values of each variable (typically specified as lengths of arrays)
– Variables to be used in the reduction function

Analysis of Sequential Code
– Get the access features of each variable
– Figure out the data to be replicated
– Get the operator for global combination
– Calculate the size of shared memory to use and which data to copy into shared memory

Experimental Results Speedup of k-means

Speedup of EM

Emergence of Cloud and Utility Computing
– Groups generating data use remote resources for storing it (already popular with SDSC/SRB)
– Scientists interested in deriving results from data use distinct, but also remote, resources for processing
Remote Data Analysis paradigm: data, computation, and user at different locations, each unaware of the location of the others.

Remote Data Analysis Advantages:
– Flexible use of resources
– Do not overload the data repository
– No unnecessary data movement
– Avoid repeated caching: process data once
Challenge, tedious details:
– Data retrieval and caching
– Use of parallel configurations
– Use of heterogeneous resources
– Performance issues
Can a grid middleware ease application development for remote data analysis and yet provide high performance?

Computer Science and Engineering Our Work FREERIDE-G (Framework for Rapid Implementation of Datamining Engines in Grid): enable development of flexible and scalable remote data processing applications (diagram: repository cluster, compute cluster, middleware, user)

Challenges
– Support use of parallel configurations, for both hosting data and processing data
– Transparent data movement
– Integration with Grid/Web standards
– Resource selection: computing resources, data replicas
– Scheduling and load balancing
– Data wrapping issues

FREERIDE (G) Processing Structure KEY observation: most data mining algorithms follow a canonical loop. Middleware API:
– Subset of data to be processed
– Reduction object
– Local and global reduction operations
– Iterator
Derived from the precursor system FREERIDE.

While( ) {
  forall (data instances d) {
    (i, d') = process(d)
    R(i) = R(i) op d'
  }
  …
}
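The canonical loop above can be made concrete with a small sketch (illustrative Python, not the actual FREERIDE API, whose names and types differ): each data instance is mapped to a slot of the reduction object and folded in with an associative, commutative operator, here instantiated for one k-means assignment step.

```python
# Generalized reduction: every instance d yields (i, d') = process(d),
# and R(i) = R(i) op d' accumulates into the reduction object R.
def generalized_reduction(data, process, op, reduction_object):
    for d in data:
        i, contribution = process(d)   # local processing of one instance
        reduction_object[i] = op(reduction_object[i], contribution)
    return reduction_object

# k-means assignment step expressed as a generalized reduction
# (1-D points and two centers, purely for illustration):
centers = [0.0, 10.0]

def process(point):
    # nearest-center index, plus the partial sums needed to recompute it
    i = min(range(len(centers)), key=lambda c: abs(point - centers[c]))
    return i, (point, 1)               # (coordinate sum, count)

def op(acc, contrib):
    return (acc[0] + contrib[0], acc[1] + contrib[1])

R = generalized_reduction([1.0, 2.0, 9.0, 11.0], process, op,
                          {0: (0.0, 0), 1: (0.0, 0)})
new_centers = [s / n for (s, n) in (R[0], R[1])]
print(new_centers)   # → [1.5, 10.0]
```

Because `op` is associative and commutative, the same structure supports all the parallelization strategies discussed earlier: replicate R per thread and combine globally, or share R under locks.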

FREERIDE-G Evolution
– FREERIDE: data stored locally
– FREERIDE-G-ADR: ADR responsible for remote data retrieval
– FREERIDE-G-SRB: SRB responsible for remote data retrieval
– FREERIDE-G grid service: featuring load balancing and data integration

Evolution (diagram: FREERIDE, FREERIDE-G-ADR, FREERIDE-G-SRB, FREERIDE-G-GT, built over Application, Data, ADR, SRB, and Globus)

FREERIDE-G System Architecture

Compute Node More compute nodes than data hosts. Each node:
1. Registers I/O (from index)
2. Connects to data host
While (chunks to process):
1. Dispatch I/O request(s)
2. Poll pending I/O
3. Process retrieved chunks
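The compute-node loop can be sketched as follows (a hypothetical Python sketch with a mock data host; the class and method names are illustrative, not the FREERIDE-G interface):

```python
class MockDataHost:
    """Stands in for a remote data host; serves chunks by id."""
    def __init__(self, chunks):
        self.chunks = chunks     # chunk id -> chunk contents
        self.queue = []          # pending (asynchronously dispatched) requests

    def request(self, cid):      # dispatch an I/O request, non-blocking
        self.queue.append(cid)

    def poll(self):              # return one completed chunk, or None
        if self.queue:
            cid = self.queue.pop(0)
            return cid, self.chunks[cid]
        return None

def compute_node(index, host, process):
    # Register chunk ids of interest from the index and dispatch requests,
    # then loop: poll pending I/O and process each retrieved chunk.
    for cid in index:
        host.request(cid)
    result = 0
    remaining = len(index)
    while remaining:
        done = host.poll()
        if done is None:
            continue             # I/O still pending; keep polling
        _, chunk = done
        result += process(chunk)  # reduce over the retrieved chunk
        remaining -= 1
    return result

host = MockDataHost({0: [1, 2], 1: [3, 4], 2: [5]})
total = compute_node([0, 1, 2], host, sum)
print(total)   # → 15
```

Keeping requests asynchronous and polling for completions is what lets a real compute node overlap data movement with processing when there are more compute nodes than data hosts.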

FREERIDE-G in Action (diagram: compute node interacts with the data host via the SRB Agent, SRB Master, and MCAT; after I/O registration and connection establishment, while more chunks remain to process, I/O requests are dispatched, pending I/O is polled, and retrieved data chunks are analyzed)

Implementation Challenges
– Interaction with code repository:
  – Simplified Wrapper and Interface Generator
  – XML descriptors of API functions
  – Each API function wrapped in its own class
– Integration with MPICH-G2:
  – Supports MPI
  – Deployed through Globus components (GRAM)
  – Hides potential heterogeneity in service startup and management

Experimental setup Organizational grid:
– Data hosted on an Opteron 250 cluster
– Processed on an Opteron 254 cluster
– Connected by two 10 GB optical fibers
Goals:
– Demonstrate parallel scalability of applications
– Evaluate overhead of using MPICH-G2 and Globus Toolkit deployment mechanisms

Deployment Overhead Evaluation Clearly a small overhead is associated with using Globus and MPICH-G2 for middleware deployment:
– K-means clustering with a 6.4 GB dataset: 18-20%
– Vortex detection with a 14.8 GB dataset: 17-20%

Deep Web Data Integration The emergence of the deep web:
– The deep web is huge
– Different from the surface web
– Challenges for integration:
  – Not accessible through search engines
  – Inter-dependences among deep web sources

Motivating Example (diagram: ERCC6, dbSNP, Entrez Gene, Sequence Database, Alignment Database; nonsynonymous SNP, AA positions for nonsynonymous SNP, encoded protein, encoded orthologous protein, protein sequence) Given the gene ERCC6, we want to know the amino acid occurring at the corresponding position in the orthologous gene of non-human mammals.

Observations
– Inter-dependences between sources
– Time-consuming if done manually
– Intelligent order of querying
– Implicit sub-goals in user query
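At its simplest, an intelligent order of querying must respect the dependences: a source can only be queried once the sources whose outputs it needs have answered. A minimal sketch of that core idea, using a topological sort over hypothetical source names (the actual planner also models costs and approximation, which this omits):

```python
from collections import deque

def query_plan(needs):
    """Order deep-web sources so each source is queried only after
    the sources it depends on.  needs[s] = sources whose output s requires."""
    indegree = {s: len(reqs) for s, reqs in needs.items()}
    dependents = {s: [] for s in needs}
    for s, reqs in needs.items():
        for r in reqs:
            dependents[r].append(s)
    ready = deque(s for s, d in indegree.items() if d == 0)
    plan = []
    while ready:
        s = ready.popleft()
        plan.append(s)
        for t in dependents[s]:        # s answered; its dependents unblock
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(plan) != len(needs):
        raise ValueError("cyclic dependences: no valid plan")
    return plan

# Hypothetical dependences in the spirit of the ERCC6 example: the
# alignment source needs a sequence, which needs a SNP record, and so on.
needs = {"gene_db": [], "snp_db": ["gene_db"],
         "seq_db": ["snp_db"], "align_db": ["seq_db"]}
print(query_plan(needs))   # → ['gene_db', 'snp_db', 'seq_db', 'align_db']
```

The planning problem becomes interesting precisely where this sketch stops: multiple valid orders exist, each with different query costs, which motivates the cost models and approximate algorithm below.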

Contributions
– Formulate the query planning problem for deep web databases with dependences
– Propose a dynamic query planner
– Develop cost models and an approximate planning algorithm
– Integrate the algorithm with a deep web mining tool

HASTE Middleware Design Goals (ICAC 2008)
– Enable time-critical event handling to achieve the maximum benefit, while satisfying the time constraint
– Be compatible with Grid and Web services
– Enable easy deployment and management with minimum human intervention
– Be usable in a heterogeneous distributed environment

HASTE Middleware Design

Workflow Composition System

Summary
– Several projects cross-cutting parallel computing, distributed computing, and database/data mining
– Number of opportunities for MS thesis, MS project, and PhD students
– Relevant courses: CSE 621/721, CSE 762, CSE 671/674