1
Computer Science and Engineering
Parallelizing Defect Detection and Categorization Using FREERIDE
Leonid Glimcher, ipdps’05
Scaling and Parallelizing a Scientific Feature Detection and Categorization Application Using a Cluster Middleware
L. Glimcher, G. Agrawal, S. Mehta, R. Jin, R. Machiraju
The Ohio State University
2
Presentation Road Map
Motivation for scalable data mining.
Description of the middleware and its functionality.
Description of the defect detection and categorization algorithm.
Parallelization challenges and solutions.
Experimental results.
Conclusions.
3
Motivation for FREERIDE
Problem:
– Simulation data from engineering and scientific applications keeps growing,
– Analysis models are more complex,
– Extracting knowledge becomes increasingly complicated.
Solution:
– Parallel data mining, but …
Catch: the effort of developing parallel applications.
4
FREERIDE
Key observation: most data mining algorithms follow a canonical loop.
Middleware API:
– Subset of data to be processed,
– Reduction object,
– Local and global reduction operations,
– Iterator.
Supports:
– Disk-resident datasets,
– Shared and distributed memory.
    While( ) {
        forall (data instances d) {
            I = process(d)        // index of the reduction-object element that d updates
            R(I) = R(I) op d      // update the reduction object with d
        }
        …
    }
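The canonical loop can be sketched in plain Python. This is a minimal, single-process illustration under our own assumptions, not FREERIDE's API: `local_reduction`, `global_reduction` and the dictionary used as the reduction object are hypothetical names, each "node" is simulated by one list of data instances, and `op` is assumed associative and commutative so partial results can be combined in any order. It shows one pass of the outer loop: each node reduces its own chunk, then the per-node reduction objects are merged.

```python
from operator import add
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-ins for the concepts on the slide (reduction object,
# local and global reduction); this is not FREERIDE's actual API.
ReductionObject = Dict[int, float]
Op = Callable[[float, float], float]


def local_reduction(chunk: Iterable[float], process: Callable[[float], int],
                    op: Op, identity: float) -> ReductionObject:
    """One pass of the canonical loop over the data instances held by one node."""
    r: ReductionObject = {}
    for d in chunk:
        i = process(d)                    # I = process(d)
        r[i] = op(r.get(i, identity), d)  # R(I) = R(I) op d
    return r


def global_reduction(parts: List[ReductionObject], op: Op,
                     identity: float) -> ReductionObject:
    """Combine the per-node reduction objects into one result.

    Reusing `op` here is only valid because it is associative and commutative
    (e.g. addition, min, max); other reductions need a separate combine step.
    """
    result: ReductionObject = {}
    for part in parts:
        for i, v in part.items():
            result[i] = op(result.get(i, identity), v)
    return result


if __name__ == "__main__":
    chunks = [[0.5, 1.5, 1.25], [0.25, 2.0]]   # data split across two "nodes"
    parts = [local_reduction(c, int, add, 0.0) for c in chunks]
    print(global_reduction(parts, add, 0.0))   # {0: 0.75, 1: 2.75, 2: 2.0}
```

Here values are bucketed by their integer part and summed: the partial reduction objects {0: 0.5, 1: 2.75} and {0: 0.25, 2: 2.0} combine into the global result. Iterative algorithms such as K-means would wrap both steps in the outer loop.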
5
Previously on FREERIDE
FREERIDE has been used for:
– Apriori and FP-tree frequent item set mining,
– KNN classification and decision tree construction,
– K-means and EM clustering,
– Vortex detection (IPDPS 2004).
Will it work for a scientific mining task with a more complex processing structure?
7
Overview of Sequential Algorithm
Goal: understand the properties of materials.
– How do defects affect the materials?
Data generated by molecular dynamics simulation:
– Simulator from the Physics Department (OSU).
Main tasks:
– Phase 1: defect detection,
– Phase 2: defect categorization.
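The two phases can be illustrated with a small sequential sketch. Everything concrete in it is an assumption made for illustration, not the method from the paper: atoms sit on integer lattice coordinates, some per-atom test has already flagged the defective atoms, Phase 1 groups flagged atoms into defects by 26-neighbour connectivity, and Phase 2 looks up a hypothetical translation-invariant shape key (`shape_key`) in a catalog, adding a new class for every shape it has not seen. All function names are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[int, int, int]
Shape = FrozenSet[Point]

# 26-connected neighbourhood: every offset in {-1, 0, 1}^3 except (0, 0, 0).
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def detect_defects(flagged: Set[Point]) -> List[Set[Point]]:
    """Phase 1 (sketch): group already-flagged atoms into connected defects (BFS)."""
    unvisited = set(flagged)
    defects: List[Set[Point]] = []
    while unvisited:
        seed = unvisited.pop()
        defect, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in unvisited:
                    unvisited.remove(n)
                    defect.add(n)
                    queue.append(n)
        defects.append(defect)
    return defects


def shape_key(defect: Set[Point]) -> Shape:
    """Hypothetical class key: atom positions relative to the defect's min corner."""
    mx, my, mz = (min(p[k] for p in defect) for k in range(3))
    return frozenset((x - mx, y - my, z - mz) for x, y, z in defect)


def categorize(defects: List[Set[Point]], catalog: Dict[Shape, int]) -> List[int]:
    """Phase 2 (sketch): look each defect up in the catalog; unknown shapes become new classes."""
    labels = []
    for d in defects:
        key = shape_key(d)
        if key not in catalog:
            catalog[key] = len(catalog)   # non-matching defect -> new class
        labels.append(catalog[key])
    return labels


if __name__ == "__main__":
    flagged = {(0, 0, 0), (1, 0, 0), (5, 5, 5), (6, 5, 5), (9, 0, 0)}
    defects = detect_defects(flagged)
    catalog: Dict[Shape, int] = {}
    categorize(defects, catalog)
    print(len(defects), len(catalog))     # 3 2
```

Exact-shape lookup is only a placeholder for whatever matching criterion the application actually uses; the point is the structure: detection produces defects, and categorization either matches them to known classes or extends the catalog.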
8
Example
Different shades represent different detected defects.
9
Mapping detection/categorization to FREERIDE
10
Presentation Road Map
Motivation for scalable data mining.
Description of the middleware and its functionality.
Description of the sequential defect detection and categorization algorithm.
Parallelization challenges and solutions.
Experimental results.
Conclusions.
11
Key Parallelization Issues
Challenges in the detection phase stem from partitioning the data into chunks:
– detecting defects on chunk boundaries,
– joining defects that span multiple chunks.
Challenges in the categorization phase:
1. Load balancing is necessary for scalability.
2. Updating the catalog with new classes must be efficient.
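For the second detection challenge, joining multi-chunk defects, one standard technique is union-find over defect fragments. The sketch below is our illustration, not the paper's implementation: each chunk is assumed to have already grouped its flagged atoms into fragments that are maximal within the chunk, fragments are identified by the hypothetical pair `(chunk index, fragment index)`, and any two fragments with 26-neighbouring atoms are merged into one defect. It does not address the first challenge, since a per-atom test near a chunk boundary typically needs atoms from the neighbouring chunk.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int, int]
FragId = Tuple[int, int]   # hypothetical: (chunk index, fragment index within chunk)

OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[FragId, FragId] = {}

    def find(self, x: FragId) -> FragId:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: FragId, b: FragId) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def join_fragments(chunks: List[List[Set[Point]]]) -> List[Set[Point]]:
    """Merge fragments whose atoms are neighbours.

    With per-chunk fragments already maximal, such pairs only occur across
    chunk boundaries, i.e. they are pieces of one multi-chunk defect.
    """
    owner: Dict[Point, FragId] = {}
    uf = UnionFind()
    for c, frags in enumerate(chunks):
        for f, atoms in enumerate(frags):
            uf.find((c, f))                 # register every fragment, even isolated ones
            for p in atoms:
                owner[p] = (c, f)
    for (x, y, z), frag in owner.items():
        for dx, dy, dz in OFFSETS:
            other = owner.get((x + dx, y + dy, z + dz))
            if other is not None and other != frag:
                uf.union(frag, other)       # touching fragments belong to one defect
    merged: Dict[FragId, Set[Point]] = {}
    for c, frags in enumerate(chunks):
        for f, atoms in enumerate(frags):
            merged.setdefault(uf.find((c, f)), set()).update(atoms)
    return list(merged.values())


if __name__ == "__main__":
    chunks = [[{(2, 0, 0), (3, 0, 0)}, {(0, 5, 0)}],    # chunk 0: x < 4
              [{(4, 0, 0), (5, 0, 0)}, {(7, 7, 7)}]]    # chunk 1: x >= 4
    print(sorted(len(d) for d in join_fragments(chunks)))  # [1, 1, 4]
```

For simplicity the sketch checks the neighbours of every atom; a distributed version would only need to exchange and compare the atoms lying next to chunk boundaries.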
12
Detection Challenges
13
Intuitive (Un-balanced) Categorization
Increasing the number of nodes increases the “sequential” fraction.
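One way to read this slide is through an Amdahl-style cost model. It rests on our own assumption that defects spanning several nodes (multi-node defects) are categorized on a single node while all other defects are categorized in parallel. Let W be the total categorization work and w(P) the part of it that belongs to multi-node defects when P nodes are used (both symbols are ours):

```latex
T(P) \approx \frac{W - w(P)}{P} + w(P), \qquad w(1) = 0,
\qquad
S(P) = \frac{T(1)}{T(P)} = \frac{W}{T(P)} \le \frac{W}{w(P)}.
```

Because smaller chunks tend to cut through more defects, w(P) tends to grow with P, so the bound W / w(P) shrinks as nodes are added; in this model that growing w(P) is the “sequential” fraction.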
14
Load-Balanced Categorization
The approach has been tested with a varying number of multi-node defects.
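The slide does not say how the categorization work is balanced. As one possible scheme, here is a sketch under our own assumptions: each node already has some work from defects inside its own chunk (`base_loads`), each multi-node defect has an estimated cost (`costs`, e.g. its atom count), and the multi-node defects are handed out with the greedy longest-processing-time (LPT) heuristic, largest first, each to the currently least-loaded node. The function name and the cost model are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_defects(costs: Dict[str, float], base_loads: List[float]) -> List[List[str]]:
    """Greedy longest-processing-time (LPT) assignment of multi-node defects.

    costs: estimated categorization cost per multi-node defect (hypothetical,
           e.g. its atom count).
    base_loads: work each node already has from defects inside its own chunk.
    Returns, for every node, the list of multi-node defects it categorizes.
    """
    heap = [(load, node) for node, load in enumerate(base_loads)]
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in base_loads]
    # Largest defects first; ties broken by name so the result is deterministic.
    for defect in sorted(costs, key=lambda d: (-costs[d], d)):
        load, node = heapq.heappop(heap)          # currently least-loaded node
        plan[node].append(defect)
        heapq.heappush(heap, (load + costs[defect], node))
    return plan


if __name__ == "__main__":
    print(assign_defects({"d1": 4, "d2": 3, "d3": 1}, [5, 0, 2]))  # [[], ['d1', 'd3'], ['d2']]
```

In this example every node ends with a total load of 5. Placing the largest defects first keeps a single large multi-node defect from landing late on an already busy node.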
15
Intuitive (Sequential) Catalog Updates
Non-matching defects are categorized sequentially, so “catalog completeness” (how many of the defect classes are already in the catalog) has a direct effect on scalability.
16
Parallel Catalog Updates
Tested with different levels of “catalog completeness”.
18
Experimental Results: Demonstrating Scalability
Experimental results for up to 8 processing nodes.
Experimental platform:
– Cluster of up to 8 nodes (700 MHz Pentium machines),
– Connected through Myrinet (LANai 7.0),
– 1 GB of memory on each node,
– Datasets ranging in size from 133 MB to 1.8 GB.
[Figure: Breakdown of total execution time (1.8 GB dataset)]
19
More Scalability Experiments
[Figure: 480 MB dataset, 1 to 8 nodes]
Catalog completeness varies, but speedups remain near-linear.
More scalability experiments are reported in the paper.
20
Experimental Results: Evaluating Load Balancing
[Figure panels: 480 MB, 2/3 in db; 480 MB, 0/3 in db]
The optimized (load-balanced) version scales better.
21
Experimental Results: Parallel Matching Approach
The default implementation categorizes the non-matching defects sequentially.
Optimized implementation:
1. Parallel local catalog updates,
2. Merging of the local catalogs on the master node,
3. Finalizing the local catalogs in parallel.
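The three optimized steps can be simulated in one process. This is a minimal sketch under our own assumptions, not the authors' code: each defect is reduced to a hashable shape key (the strings "A", "B", "C" are placeholders for real keys), the shared catalog maps keys to global class ids, and `local_update`, `merge_on_master` and `finalize` are hypothetical names for steps 1 to 3.

```python
from typing import Dict, Hashable, List, Tuple

Catalog = Dict[Hashable, int]      # shape key -> global class id
Label = Tuple[str, int]            # ("global", class id) or ("local", temporary id)


def local_update(keys: List[Hashable], catalog: Catalog) -> Tuple[List[Label], List[Hashable]]:
    """Step 1, on every node: match against the shared catalog, collect misses locally."""
    local_new: Dict[Hashable, int] = {}
    labels: List[Label] = []
    for k in keys:
        if k in catalog:
            labels.append(("global", catalog[k]))
        else:
            # Repeated misses on the same node share one temporary id.
            labels.append(("local", local_new.setdefault(k, len(local_new))))
    return labels, list(local_new)     # list index == temporary id


def merge_on_master(local_catalogs: List[List[Hashable]], catalog: Catalog) -> List[List[int]]:
    """Step 2, on the master: give each new key one global id, deduplicating across nodes."""
    maps = []
    for new_keys in local_catalogs:
        for k in new_keys:
            catalog.setdefault(k, len(catalog))
        maps.append([catalog[k] for k in new_keys])   # temporary id -> global id
    return maps


def finalize(labels: List[Label], tmp_to_global: List[int]) -> List[int]:
    """Step 3, on every node: rewrite temporary ids as global class ids."""
    return [cid if kind == "global" else tmp_to_global[cid] for kind, cid in labels]


if __name__ == "__main__":
    catalog: Catalog = {"A": 0}
    node_keys = [["A", "B", "B"], ["C", "B", "A"]]       # placeholder shape keys per node
    step1 = [local_update(keys, catalog) for keys in node_keys]
    maps = merge_on_master([new for _, new in step1], catalog)
    print([finalize(labels, m) for (labels, _), m in zip(step1, maps)])  # [[0, 1, 1], [2, 1, 0]]
```

Key "B" is new on both nodes but receives a single class id. In this sketch only the distinct new keys of each node reach the master, so the sequential merge grows with the number of new classes rather than with the number of non-matching defects.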
23
Conclusions
FREERIDE can be used to parallelize scientific mining algorithms with a more complex processing structure.
Scalability can be achieved with less programming effort than hand-coding a parallel application.
Parallel applications built with FREERIDE work efficiently with disk-resident datasets.
Our approaches to load balancing and to parallel categorization of non-matching defects outperform the naïve approaches.
24
Questions?