
High-level Interfaces for Scalable Data Mining
Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences, Ohio State University

Motivation
Languages, compilers, and runtime systems for high-end computing
  Typically focus on scientific applications
Can commercial applications benefit?
  A majority of top 500 parallel configurations are used as database servers
Is there a role for parallel systems research?
  Parallel relational databases – probably not
  Data mining, OLAP, decision support – quite likely

Data Mining
Extracting useful models or patterns from large datasets
Includes a variety of tasks
  Mining associations, sequences, clustering data, building decision trees, predictive models
  Several algorithms proposed for each
Both compute and data intensive
Algorithms are well suited for parallel execution
High-level interfaces can be useful for application development

Project Overview

Project Components
A middleware system called FREERIDE (Framework for Rapid Implementation of Datamining Engines) (SDM 01, SDM 02)
Performance modeling and prediction (for parallelization strategy selection) (SIGMETRICS 2002)
Data parallel compilation (under submission)
Translation from mining operators (not yet)
Focus on design and evaluation of the interface for shared memory parallelization in this paper

Outline
Key observation from mining algorithms
Parallelization challenge, techniques and trade-offs
Programming interface
Experimental results
  K-means
  Apriori
Summary and future work

Common Processing Structure
Structure of Common Data Mining Algorithms:

    {* Outer Sequential Loop *}
    While () {
        {* Reduction Loop *}
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }

Applies to major association mining, clustering and decision tree construction algorithms
How to parallelize it on a shared memory machine?
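As a concrete sequential instance of this structure, the minimal C++ sketch below uses the first pass of Apriori (counting single-item occurrences) as the example; the names Transaction and count_items and the vector-based reduction object are illustrative assumptions, not part of the FREERIDE interface.

    #include <vector>

    // One data element: a market-basket transaction listing item ids.
    struct Transaction { std::vector<int> items; };

    // Sequential form of the reduction loop: process(e) yields (i, val)
    // pairs, and Reduc(i) is updated with the associative operator (here, +).
    void count_items(const std::vector<Transaction>& data,
                     std::vector<long>& reduc) {
        for (const Transaction& e : data) {      // Foreach (element e)
            for (int item : e.items) {
                int i = item;                    // (i, val) = process(e)
                long val = 1;
                reduc[i] = reduc[i] + val;       // Reduc(i) = Reduc(i) op val
            }
        }
    }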

Challenges in Parallelization
Statically partitioning the reduction object to avoid race conditions is generally impossible
Runtime preprocessing or scheduling also cannot be applied
  Can't tell what you need to update without processing the element
The size of the reduction object means significant memory overheads for replication
Locking and synchronization costs could be significant because of the fine-grained updates to the reduction object

Parallelization Techniques
Full Replication: create a copy of the reduction object for each thread
Full Locking: associate a lock with each element
Optimized Full Locking: put the element and corresponding lock on the same cache block
Fixed Locking: use a fixed number of locks
Cache-Sensitive Locking: one lock for all elements in a cache block
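A minimal sketch of the full replication strategy, assuming OpenMP threads and a simple array-valued reduction object; the function name, the counting workload, and the merge-under-critical step are assumptions for illustration, not the middleware's actual code.

    #include <cstddef>
    #include <omp.h>
    #include <vector>

    // Full replication: each thread updates its own private copy of the
    // reduction object, so no locks are needed during the reduction loop;
    // the copies are merged once at the end.
    std::vector<long> reduce_with_replication(const std::vector<int>& indices,
                                              std::size_t num_bins) {
        std::vector<long> global(num_bins, 0);
        #pragma omp parallel
        {
            std::vector<long> local(num_bins, 0);     // per-thread copy
            #pragma omp for nowait
            for (long n = 0; n < (long)indices.size(); n++)
                local[indices[n]] += 1;               // race-free private update
            #pragma omp critical
            for (std::size_t i = 0; i < num_bins; i++)
                global[i] += local[i];                // merge copies one thread at a time
        }
        return global;
    }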

Memory Layout for Various Locking Schemes
[Figure: memory layouts for full locking, fixed locking, optimized full locking, and cache-sensitive locking, showing how locks and reduction elements are interleaved.]
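As a rough C++ sketch of the two cache-conscious layouts, assuming 64-byte cache lines, 8-byte reduction elements, and std::atomic_flag standing in for the middleware's lock type; the struct names and sizes are assumptions, not the actual data structures.

    #include <atomic>

    using Lock = std::atomic_flag;   // stand-in for the middleware's lock type

    // Optimized full locking: each element sits next to its own lock, so a
    // lock/update pair touches a single cache block.
    struct alignas(16) LockedElement {
        Lock   lock;
        double value;
    };

    // Cache-sensitive locking: one lock guards every element in its cache
    // block, reducing both lock memory and the number of blocks touched.
    struct alignas(64) CacheBlock {
        Lock   lock;         // guards all values below
        double values[7];    // remaining space in the 64-byte line
    };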

Programming Interface: k-means example
Initialization Function

    void Kmeans::initialize()
    {
        for (int i = 0; i < k; i++) {
            clusterID[i] = reductionobject->alloc(ndim + 2);
        }
        {* Initialize Centers *}
    }

k-means example (contd.)
Local Reduction Function

    void Kmeans::reduction(void *point)
    {
        for (int i = 0; i < k; i++) {
            dis = distance(point, i);
            if (dis < min) {
                min = dis;
                min_index = i;
            }
        }
        objectID = clusterID[min_index];
        for (int j = 0; j < ndim; j++)
            reductionobject->Add(objectID, j, point[j]);
        reductionobject->Add(objectID, ndim, 1);
        reductionobject->Add(objectID, ndim + 1, min);
    }
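The slides do not show the step that turns the accumulated sums and counts back into cluster centers. The sketch below is purely illustrative: update_centers, the Get accessor, and the center array are hypothetical names, not part of the presented interface.

    // Hypothetical post-processing step (not in the slides): slot j (j < ndim)
    // of a cluster's reduction entry holds the coordinate sum, and slot ndim
    // holds the number of points assigned to that cluster.
    void Kmeans::update_centers()
    {
        for (int i = 0; i < k; i++) {
            int objectID = clusterID[i];
            double count = reductionobject->Get(objectID, ndim);   // assumed accessor
            if (count == 0) continue;                              // empty cluster
            for (int j = 0; j < ndim; j++)
                center[i][j] = reductionobject->Get(objectID, j) / count;
        }
    }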

Implementation from the Common Specification

    template <class T>
    inline void Reducible<T>::Reduc(int ObjectID, int Offset,
                                    void (*func)(void *, void *), int *param)
    {
        T *group_address = reducgroup[ObjectID];
        switch (TECHNIQUE) {
            case FULL_REPLICATION:
                func(group_address[Offset], param);
                break;
            case FULL_LOCKING:
                offset = abs_offset(ObjectID, Offset);
                S_LOCK(&locks[offset]);
                func(group_address[Offset], param);
                S_UNLOCK(&locks[offset]);
                break;
            case OPTIMIZED_FULL_LOCKS:
                S_LOCK(&group_address[Offset*2]);
                func(group_address[Offset*2+1], param);
                S_UNLOCK(&group_address[Offset*2]);
                break;
        }
    }
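The slide omits the remaining techniques in the switch. As an assumption-only sketch of the index arithmetic cache-sensitive locking would need (the layout constants and helper functions below are invented for illustration, under the layout of one lock slot followed by the elements of its cache block):

    #include <cstddef>

    // Assumed layout: each cache block holds one lock slot followed by
    // ELEMENTS_PER_BLOCK element slots (e.g. a 64-byte line of 8-byte slots).
    constexpr std::size_t ELEMENTS_PER_BLOCK = 7;
    constexpr std::size_t SLOTS_PER_BLOCK    = ELEMENTS_PER_BLOCK + 1;

    // Slot holding the lock that guards the element at logical offset 'offset'.
    std::size_t lock_slot(std::size_t offset) {
        return (offset / ELEMENTS_PER_BLOCK) * SLOTS_PER_BLOCK;
    }

    // Slot holding the element itself, just past its block's lock slot.
    std::size_t element_slot(std::size_t offset) {
        return lock_slot(offset) + 1 + (offset % ELEMENTS_PER_BLOCK);
    }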

Experimental Platform
Small SMP machine
  Sun Ultra Enterprise
  X 250 MHz Ultra-II processors
  1 GB of 4-way interleaved main memory
Large SMP machine
  Sun Fire
  X 900 MHz Sun UltraSparc III
  A 96 KB L1 cache and a 64 MB L2 cache per processor
  24 GB main memory

Results Scalability and Middleware Overhead for Apriori: 4 Processor SMP Machine

Results Scalability and Middleware Overhead for Apriori: Large SMP Machine

Results Scalability and Middleware Overhead for K-means: 4 Processor SMP Machine (200 MB dataset, k=1000)

Results Scalability and Middleware Overhead for K-means: Large SMP Machine

Compiler Support
Use a data parallel dialect of Java
  Well suited for expressing common mining algorithms
  Main computational loops are data parallel
Use the notion of reduction interface to implement reduction objects
Our compiler generates middleware code

Experimental Evaluation
Currently limited to distributed memory parallelization