A Programming Model of Hybrid Distributed/Shared Memory System in Data Mining Field
Chen Jin (HT016952H), Zhao Xu Ying (HT016907B)

Outline
- Introduction
- Mixed-mode Parallel Programming
- Fuzzy c-Medoids Algorithm (FCMdd)
- Serial Version of FCMdd
- MPI Version of FCMdd
- Mixed-mode Version of FCMdd
- Comparisons between Pure MPI and Mixed-mode
- Result Analysis
- Conclusion

Introduction This report discusses the benefits of developing mixed-mode MPI/OpenMP applications on clustered SMPs in the data mining field, focusing on web log clustering. We start with a realistic serial C++ FCMdd program. Next we show the modifications that were made to the program to enable it to run using MPI.

Introduction (Contd) Then we show the simple modifications that were made to the source to take advantage of OpenMP. We show that using a combination of MPI and OpenMP can be an effective method of programming hybrid systems.

Mixed-mode Parallel Programming
- Shared-memory architectures are gradually becoming more prominent, as advances in technology allow larger numbers of CPUs to access a single memory space.
- It is not immediately clear that message passing is the most efficient parallelization technique within an SMP box.
- Message-passing codes written in MPI are portable and should transfer easily to clustered SMP systems.
- In theory, a shared-memory model such as OpenMP should be preferable within a node.

Mixed-mode Cluster Hierarchical Model

Mixed-mode Parallel Programming (Contd) Hypothesis: A combination of shared-memory and message-passing parallelization paradigms within the same application may provide a more efficient parallelization strategy than pure MPI.

FCMdd Algorithm Description Web mining can be viewed as the extraction of structure from unlabeled, semistructured data. Three operations are of particular interest:
- Clustering: finding natural groupings of users or pages.
- Associations: finding URLs that tend to be requested together.
- Sequential analysis: finding URLs that tend to be accessed in a particular order.

Fuzzy c-Medoids Algorithm (FCMdd) Let X = \{x_1, \dots, x_n\} be the set of user sessions, V = \{v_1, \dots, v_c\} \subseteq X the set of medoids, and r(x_j, v_i) the dissimilarity between session x_j and medoid v_i. The objective function of FCMdd is

J_m(V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, r(x_j, v_i)    (1)

where the fuzzy memberships are given by

u_{ij} = \frac{(1 / r(x_j, v_i))^{1/(m-1)}}{\sum_{k=1}^{c} (1 / r(x_j, v_k))^{1/(m-1)}}, \qquad m > 1.    (2)
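As a concrete illustration, here is a minimal C++ sketch of the membership update (2); the matrix layout, the epsilon guard, and the function name are our own illustrative choices, not taken from the original program.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fuzzy membership update following equation (2).
// r[i][j] is the dissimilarity between medoid v_i and session x_j; m > 1.
// FCMdd gives full membership to a medoid at zero dissimilarity; a small
// epsilon stands in for that special case here to avoid division by zero.
void update_memberships(const std::vector<std::vector<double>>& r,
                        double m, std::vector<std::vector<double>>& u) {
    const std::size_t c = r.size(), n = r[0].size();
    const double e = 1.0 / (m - 1.0);
    for (std::size_t j = 0; j < n; ++j) {
        double denom = 0.0;
        for (std::size_t k = 0; k < c; ++k)
            denom += std::pow(1.0 / (r[k][j] + 1e-12), e);
        for (std::size_t i = 0; i < c; ++i)
            u[i][j] = std::pow(1.0 / (r[i][j] + 1e-12), e) / denom;
    }
}
```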

The Role of FUZZY Granularity in Web Mining The categories and associations in Web mining do not have crisp boundaries; they overlap considerably and are best described by fuzzy sets. Bad exemplars (outliers) and incomplete data can easily occur in the data set.

Serial Version of FCMdd
Fix the number of clusters c; set iter = 0;
Pick the initial medoids V = \{v_1, \dots, v_c\} from X;
Repeat:
  (A) Compute memberships u_{ij} for 1 \le i \le c, 1 \le j \le n by using (2);
  Store the current medoids: V_{old} = V;
  (B) Compute the new medoid v_i for each cluster:
      v_i = x_q, where q = \arg\min_{1 \le k \le n} \sum_{j=1}^{n} u_{ij}^{m} \, r(x_j, x_k);
  iter = iter + 1;
Until (V_{old} = V or iter = MAX_ITER)
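In C++ the loop can be sketched as follows, reusing update_memberships from the previous slide; the precomputed dissimilarity matrix d, the helper pick_initial_medoids, and the constants m, c, MAX_ITER are assumed names for this sketch, not taken from the original code.

```cpp
// Serial FCMdd driver (a sketch; requires <cmath>, <cstddef>, <limits>,
// <vector>). d[j][k] is the session-to-session dissimilarity matrix.
const std::size_t n = d.size();
std::vector<std::size_t> medoids = pick_initial_medoids(d, c);
std::vector<std::vector<double>> r(c, std::vector<double>(n));
std::vector<std::vector<double>> u(c, std::vector<double>(n));
for (int iter = 0; iter < MAX_ITER; ++iter) {
    for (std::size_t i = 0; i < c; ++i)      // dissimilarities to medoids
        for (std::size_t j = 0; j < n; ++j)
            r[i][j] = d[medoids[i]][j];
    update_memberships(r, m, u);             // step (A), equation (2)
    auto old_medoids = medoids;              // store the current medoids
    for (std::size_t i = 0; i < c; ++i) {    // step (B): pick candidate x_k
        double best = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < n; ++k) {
            double cost = 0.0;               // sum_j u[i][j]^m * r(x_j, x_k)
            for (std::size_t j = 0; j < n; ++j)
                cost += std::pow(u[i][j], m) * d[k][j];
            if (cost < best) { best = cost; medoids[i] = k; }
        }
    }
    if (medoids == old_medoids) break;       // converged: V unchanged
}
```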

Properties of FCMdd
- Designed to deal with data sets whose size is extremely large.
- Time complexity is O(n * c * p), where n is the data size, c is the number of clusters, and p is the number of candidate medoids examined per cluster.

MPI Version of FCMdd
Reading data from disk, three alternatives:
1. Each node reads a different part of the file, then broadcasts it to the others.
2. Each node reads the same file separately.
3. A designated node reads all the data, then broadcasts it to the other nodes.
A sketch of the third alternative follows.
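A hedged sketch of alternative 3 using standard MPI collectives; num_records, record_len, and load_sessions_from_disk are hypothetical placeholders, not names from the original program.

```cpp
// Inside main(), after MPI_Init; assumes <mpi.h> and <vector> are included.
// Rank 0 reads the session data from disk, then broadcasts it so that
// every rank holds a full copy before clustering begins.
int rank = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
std::vector<double> sessions(num_records * record_len);
if (rank == 0)
    load_sessions_from_disk(sessions);       // hypothetical helper
MPI_Bcast(sessions.data(), static_cast<int>(sessions.size()),
          MPI_DOUBLE, 0, MPI_COMM_WORLD);
```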

MPI Version of FCMdd (Contd)
Choosing medoids from the sessions:
- The kernel of FCMdd is the replacement of the original medoids by appropriate candidates.
- The medoids can be considered separately in each iteration, so each process can update its own share of them.
- Combine the medoid parts together to get the complete medoid array (see the sketch below).
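One way to realize the combine step with a single collective, assuming c divides evenly among the ranks; compute_local_medoids and local_count are hypothetical names for this sketch.

```cpp
// Each rank updates a contiguous block of local_count = c / nprocs medoid
// indices; MPI_Allgather then assembles the complete array on every rank.
std::vector<int> local_medoids(local_count);
std::vector<int> all_medoids(c);
compute_local_medoids(local_medoids);        // hypothetical helper
MPI_Allgather(local_medoids.data(), local_count, MPI_INT,
              all_medoids.data(), local_count, MPI_INT, MPI_COMM_WORLD);
```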

MPI Version of FCMdd (Contd)

Performance Analysis of MPI Mode
Record number is 138,384:
1 node: 15.1244 s
2 nodes: 7.7100 s
4 nodes: 6.3486 s
8 nodes: 5.1482 s
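These timings correspond to a speedup of roughly 2.9 on 8 nodes (15.1244 s / 5.1482 s), i.e. a parallel efficiency of about 37%.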

Performance Analysis of MPI Mode (Contd)
Record number is 1,882,384:
1 node: 68.3798 s
2 nodes: 33.6380 s
4 nodes: 26.1140 s
8 nodes: 21.5922 s
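Here the 8-node speedup is roughly 3.2 (68.3798 s / 21.5922 s), about 40% parallel efficiency; in both experiments the scaling flattens noticeably beyond 2 nodes.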

Mixed-mode Version of FCMdd
Two kinds of mixed-mode programming models (a skeleton of the first is sketched below):
1. MPI parallelization occurs across hosts at the top level, with OpenMP parallelization below it, within each host. The environment variable OMP_NUM_THREADS can be set on each host, and its value may differ from host to host.
2. MPI and OpenMP parallelization both occur within a host. OMP_NUM_THREADS then sets the same number of OpenMP threads for all MPI processes; if each MPI process needs a different number of threads, omp_set_num_threads() must be called within each process.
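A minimal hybrid skeleton of the first model, assuming an MPI library that supports MPI_Init_thread (a sketch, not the project's actual code):

```cpp
#include <cstdio>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    int provided = 0, rank = 0;
    // Funneled threading: only the main thread of each process makes
    // MPI calls; OpenMP threads do pure computation in between.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // Thread count normally comes from OMP_NUM_THREADS, set per host;
    // omp_set_num_threads() would override it per MPI process.
    #pragma omp parallel
    {
        std::printf("rank %d: thread %d of %d\n",
                    rank, omp_get_thread_num(), omp_get_num_threads());
    }
    MPI_Finalize();
    return 0;
}
```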

Mixed-mode Version of FCMdd (Contd) The left figure corresponds to the parallelization for a cluster of uniprocessor nodes, where an MPI process is allocated on each node.

Mixed-mode Version of FCMdd (Contd) The right figure shows how the computational part of each process is split into threads using OpenMP directives. P is the part of the code that cannot be parallelized with OpenMP, and Pn is the OpenMP-parallel part; an example directive is sketched below.
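For instance, the membership update (part of Pn) parallelizes with a single directive, since the loop over sessions is independent across iterations; r, u, c, n, and e are as in the serial sketch above.

```cpp
// OpenMP version of the membership update from update_memberships():
// each thread handles a contiguous chunk of the sessions.
#pragma omp parallel for schedule(static)
for (long j = 0; j < static_cast<long>(n); ++j) {
    double denom = 0.0;
    for (std::size_t k = 0; k < c; ++k)
        denom += std::pow(1.0 / (r[k][j] + 1e-12), e);
    for (std::size_t i = 0; i < c; ++i)
        u[i][j] = std::pow(1.0 / (r[i][j] + 1e-12), e) / denom;
}
```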

Mixed-mode Version of FCMdd (Contd)

Mixed-mode Version of FCMdd (Contd)

Comparisons Between the MPI and Mixed-mode Programming We tested on hydra within one node, varying the database size.

Comparisons Between the MPI and Mixed-mode Programming

Comparisons Between the MPI and Mixed-mode Programming Data size is 610K on Beowulf:

Comparisons Between the MPI and Mixed-mode Programming

Comparisons Between the MPI and Mixed-mode Programming Data size is 8M on Beowulf:

Comparisons Between the MPI and Mixed-mode Programming

Comparisons Between the MPI and Mixed-mode Programming (Contd) The two modes have similar overall performance; however, there are still differences.

Comparisons Between the MPI and Mixed-mode Programming (Contd) Within one host, mixed mode shows more potential for improving efficiency than pure MPI programming. Our test results on hydra show that inside one host, mixed mode outperforms pure MPI. In theory, OpenMP makes better use of the shared-memory architecture than message passing does on a purely shared-memory machine.

Comparisons Between the MPI and Mixed-mode Programming (Contd) In a hybrid shared/distributed-memory system, a mixed-mode implementation could be more efficient for large problems whose pure-MPI versions scale poorly to large numbers of processors.

Comparisons Between the MPI and Mixed-mode Programming (Contd) In theory, a mixed-mode code, with MPI parallelization occurring across the SMP hosts and OpenMP parallelization within each host, should be more efficient on an SMP cluster, since this model matches the architecture more closely than a pure MPI model does. The test results on Beowulf also support this conclusion.

Conclusion Hypothesis: a combination of shared-memory and message-passing parallelization paradigms within the same application may provide a more efficient parallelization strategy than pure MPI. Our project results support this hypothesis.

Contribution (Chen Jin, Zhao XuYing)
- Data Preprocessing
- Serial version
- MPI version
- MIX version
- Comparison

END