Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05 Scaling and Parallelizing.

Presentation transcript:

Computer Science and Engineering Parallelizing Defect Detection and Categorization Using FREERIDE Leonid Glimcher P. 1 ipdps’05
Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware
L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju
The Ohio State University

Presentation Road Map
- Motivation for scalable datamining.
- Description of middleware and functionality.
- Description of defect detection and categorization algorithm.
- Parallelization challenges and solutions.
- Experimental results.
- Conclusions.

Motivation for FREERIDE
Problem:
- Simulation data from engineering and scientific applications is growing larger.
- Analysis models are becoming more complex.
- Extracting knowledge from the data is becoming increasingly complicated.
Solution:
- Parallel datamining, but ...
Catch: application development effort.

FREERIDE
Key observation: most algorithms follow a canonical loop.
Middleware API:
- Subset of data to be processed
- Reduction object
- Local and global reduction operations
- Iterator
Supports:
- Disk-resident datasets
- Shared and distributed memory
Canonical loop:

    While( ) {
        forall( data instances d) {
            I = process(d)
            R(I) = R(I) op d
        }
        …….
    }
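The canonical loop and the reduction operations listed on this slide can be illustrated with a minimal Python sketch. It is not the FREERIDE API: the function names `process`, `local_reduction`, and `global_reduction`, the bucket rule inside `process`, and the choice of addition as `op` are all assumptions made for illustration.

```python
def process(d):
    """Hypothetical mapping from a data instance to the reduction-object element it updates."""
    return d % 3

def local_reduction(chunk):
    """Canonical loop over one node's data subset: R(I) = R(I) op d, with op = addition."""
    R = {}
    for d in chunk:
        I = process(d)
        R[I] = R.get(I, 0) + d
    return R

def global_reduction(local_objects):
    """Combine the per-node reduction objects element by element."""
    combined = {}
    for R in local_objects:
        for I, value in R.items():
            combined[I] = combined.get(I, 0) + value
    return combined

# Each inner list stands for the subset of data assigned to one node.
chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = global_reduction(local_reduction(c) for c in chunks)

# Processing all data on a single node gives the same reduction object.
sequential = local_reduction([d for c in chunks for d in c])
```

Because the update operator in this sketch is associative and commutative, the order in which data instances are processed does not change the result, which is what lets each node reduce its own subset independently before the global combination.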

Previously on FREERIDE
FREERIDE has been used for:
- Apriori and FP-tree frequent item set mining,
- KNN classification and decision tree construction,
- K-means and EM clustering,
- Vortex detection (IPDPS 2004).
Will it work for a scientific mining task with a more complex processing structure?

Overview of Sequential Algorithm
Goal: understand the properties of the materials
- How do defects affect the materials?
Data generated by molecular dynamics simulation
- Simulator by the Physics Department (OSU)
Main tasks
- Phase 1: defect detection
- Phase 2: defect categorization

Example (figure): different shades represent different detected defects.

Mapping detection/categorization to FREERIDE

Key Parallelization Issues
Challenges in the detection phase stem from partitioning data into chunks:
- detection on chunk boundaries,
- joining multi-chunk defects.
Categorization phase:
1. Load balancing is necessary for scalability.
2. Updating the catalog with new classes needs to be efficient.
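The two detection-phase issues can be illustrated on a deliberately simplified one-dimensional model. This is a sketch under stated assumptions, not the authors' algorithm: here a "defect" is a run of consecutive flagged cells, and the helper names `detect_local` and `join_across_boundaries` are hypothetical.

```python
def detect_local(chunk, offset):
    """Find runs of flagged cells in one chunk; return (start, end) ranges in global indices."""
    defects, start = [], None
    for i, flag in enumerate(chunk):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            defects.append((offset + start, offset + i - 1))
            start = None
    if start is not None:
        # The run reaches the chunk's last cell, so it may continue in the next chunk.
        defects.append((offset + start, offset + len(chunk) - 1))
    return defects

def join_across_boundaries(per_chunk):
    """Join a defect ending at a chunk boundary with one starting right after it."""
    joined = []
    for defects in per_chunk:
        for start, end in defects:
            if joined and joined[-1][1] + 1 == start:
                joined[-1] = (joined[-1][0], end)
            else:
                joined.append((start, end))
    return joined

data = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
chunk_size = 5
per_chunk = [
    detect_local(data[i:i + chunk_size], i)
    for i in range(0, len(data), chunk_size)
]
defects = join_across_boundaries(per_chunk)
```

In this toy input the run covering cells 4 to 6 is split by the chunk boundary, so each chunk reports only a fragment of it, and the joining step restores the single defect that a sequential pass over the whole dataset would find.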

Detection Challenges

Intuitive (unbalanced) Categorization
Increasing the number of nodes will increase the “sequential” fraction.
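Why a growing sequential fraction hurts scalability can be seen from Amdahl's law. The slide does not name this law, so the connection is an assumption, and the fractions used below are illustrative values rather than measurements from the presentation.

```python
def amdahl_speedup(nodes, sequential_fraction):
    """Ideal speedup on `nodes` nodes when `sequential_fraction` of the work runs on one node."""
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / nodes)

# Illustrative sequential fractions only; not taken from the experiments.
for fraction in (0.0, 0.05, 0.2):
    print(fraction, round(amdahl_speedup(8, fraction), 2))
```

With no sequential work the speedup equals the node count, and it drops as the sequential fraction grows, which is the motivation for balancing the categorization work across nodes.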

Load Balanced Categorization
The approach has been tested with a variable number of multi-node defects.

Intuitive (sequential) Catalog Updates
“Catalog completeness” has a direct effect on scalability.

Parallel Catalog Updates
Tested with different levels of “catalog completeness”.

Experimental Results: Demonstrating Scalability
Experimental results for up to 8 processing nodes.
Experimental platform:
- Cluster of 1-8 Pentium machines (700 MHz)
- Connected through Myrinet LANai 7.0
- 1 GB memory on each node
- Datasets ranging in size from 133 MB to 1.8 GB
Figure: breakdown of total execution time (1.8 GB dataset)

More Scalability Experiments
- 480 MB dataset, 1-8 nodes
- Catalog completeness varies, but speedups remain near linear.
- More scalability experiments in paper.

Experimental Results: Evaluating Load Balancing
Figure panels: 480 MB, 2/3 in db; 480 MB, 0/3 in db
Optimized scales better!

Experimental Results: Parallel Matching Approach
Default implementation performs sequential categorization of the non-matching defects.
Optimized implementation:
1. parallel local catalog update,
2. merging of local catalogs on Master node,
3. finalizing local catalogs in parallel.
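The three steps of the optimized implementation can be sketched as follows. This is only an illustration of the control flow, not the authors' code: defects are reduced to hashable shape signatures, and the function names and the example shape names are hypothetical.

```python
def local_catalog_update(defects, catalog):
    """Step 1 (on each node): collect shapes that match no class in the shared catalog."""
    local = []
    for shape in defects:
        if shape not in catalog and shape not in local:
            local.append(shape)
    return local

def merge_local_catalogs(catalog, local_catalogs):
    """Step 2 (on the master): merge local catalogs, giving each new shape one global class id."""
    merged = dict(catalog)
    for local in local_catalogs:
        for shape in local:
            if shape not in merged:
                merged[shape] = len(merged)
    return merged

def finalize(defects, merged):
    """Step 3 (on each node): label every defect with its global class id."""
    return [merged[shape] for shape in defects]

# Hypothetical starting catalog and per-node defect lists.
catalog = {"vacancy": 0}
node_defects = [["vacancy", "cluster-A"], ["cluster-B", "cluster-A"]]

local_catalogs = [local_catalog_update(d, catalog) for d in node_defects]
merged = merge_local_catalogs(catalog, local_catalogs)
labels = [finalize(d, merged) for d in node_defects]
```

Only step 2 runs on a single node in this sketch, and it touches just the new classes rather than every non-matching defect, which is the part the default implementation handles sequentially.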

Conclusions
- FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.
- Scalability can be achieved with less programming effort than if a parallel application were “hand-coded”.
- Parallel applications created using FREERIDE work efficiently with disk-resident datasets.
- Our approaches to load balancing and to parallel categorization of non-matching defects perform better than the naïve approaches.

Questions?