Decision Trees and MPI Collective Algorithm Selection Problem
Jelena Pješivac-Grbović, Graham E. Fagg, Thara Angskun, George Bosilca, and Jack J. Dongarra, IPDPS (IEEE International Parallel & Distributed Processing Symposium)
Reporter: Yu Tang Liu
Outline
◦ Abstract
◦ Introduction
◦ C4.5 Decision Tree Algorithm
◦ Experimental Results and Analysis
◦ Conclusion
Abstract
Selecting the close-to-optimal collective algorithm based on the parameters of the collective call at run time is an important step in achieving good performance of MPI applications. This work explores the applicability of C4.5 decision trees to the MPI collective algorithm selection problem.
Introduction
The performance of MPI collective operations depends on
◦ Total number of nodes involved in communication
◦ System and network characteristics
◦ Size of the data being transferred
◦ Current load
◦ The operation that is being performed
◦ The segment size used for operation pipelining
The goal is to select the best possible algorithm and segment size combination for every instance of a collective operation.
Introduction
Process of tuning a system:
1. Detailed profiling of the system, possibly combined with communication modeling.
2. Analyzing the collected data and generating a decision function.
3. At run time, the decision function selects the close-to-optimal method (combination of algorithm and segment size) for a particular collective instance.
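In practice, such a decision function can be emitted as plain C source and compiled into the MPI library. The sketch below is a hypothetical illustration of step 3; the algorithm IDs, thresholds, segment sizes, and function signature are assumptions for illustration, not the functions generated in the paper.

    #include <stddef.h>

    /* Hypothetical run-time decision function for Broadcast.
       Algorithm IDs, thresholds, and segment sizes are illustrative only. */
    typedef struct { int algorithm; size_t segment_size; } bcast_method_t;

    bcast_method_t bcast_decision(int comm_size, size_t msg_size) {
        bcast_method_t m;
        if (msg_size < 2048) {          /* small messages: binomial tree, no segmentation */
            m.algorithm = 2;  m.segment_size = 0;
        } else if (comm_size <= 8) {    /* few nodes: binary tree, 8 KB segments */
            m.algorithm = 3;  m.segment_size = 8192;
        } else {                        /* otherwise: pipeline, 16 KB segments */
            m.algorithm = 1;  m.segment_size = 16384;
        }
        return m;
    }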
C4.5 Decision Tree Algorithm
Decision tree example (figure)
C4.5 Decision Tree Algorithm
In the decision tree, each internal node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. Each node should branch on the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root.
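"Most informative" is quantified with the standard entropy-based criteria (these are the textbook ID3/C4.5 formulas, added here for completeness). ID3 uses information gain; C4.5 uses the gain ratio, which normalizes gain by the entropy of the split itself:

    Entropy(S) = -\sum_c p_c \log_2 p_c
    Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
    GainRatio(S, A) = Gain(S, A) \Big/ \left( -\sum_v \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \right)

where p_c is the fraction of records in S belonging to class c, and S_v is the subset of S with value v for attribute A.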
C4.5 Decision Tree Algorithm
Requirements for applying the C4.5 algorithm:
◦ Attribute-value description
◦ Predefined classes
◦ Discrete classes
◦ Sufficient data
◦ “Logical” classification models
C4.5 Decision Tree Algorithm
Additional parameters that affect the resulting decision tree:
◦ Weight
◦ Confidence level
◦ Attribute grouping
◦ Windowing
C4.5 Decision Tree Algorithm
◦ ID3 algorithm
◦ C4.5 algorithm = ID3 algorithm + extensions such as handling of continuous attributes, missing attribute values, and tree pruning
Experimental Results and Analysis
C4.5 decision tree for Alltoall on the Nano cluster (figure)
Experimental Results and Analysis
Barrier is a collective operation used to synchronize a group of nodes. It guarantees that by the end of the operation, all processes involved in the barrier have at least entered the barrier.
◦ In the flat-tree/linear algorithm, all nodes report to a preselected root; once every node has reported, the root sends a releasing message to all participants.
◦ In the double-ring algorithm, a zero-byte message is sent from a preselected root circularly to the right. A node can leave the barrier only after it receives the message for the second time.
◦ The Bruck algorithm requires ⌈log₂ P⌉ communication steps, where P is the number of nodes. At step k, node r receives a zero-byte message from node (r − 2^k) and sends a zero-byte message to node (r + 2^k), with wrap-around.
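A minimal MPI sketch of this distance-doubling pattern (a sketch of the dissemination-style structure described above, not the library's actual implementation; tags and error handling are simplified):

    #include <mpi.h>

    /* Each step doubles the exchange distance; zero-byte messages only. */
    void dissemination_barrier(MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        for (int dist = 1; dist < size; dist <<= 1) {
            int to   = (rank + dist) % size;         /* node r + 2^k */
            int from = (rank - dist + size) % size;  /* node r - 2^k */
            MPI_Sendrecv(NULL, 0, MPI_BYTE, to, 0,
                         NULL, 0, MPI_BYTE, from, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }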
Experimental Results and Analysis
Alltoall is used to exchange data among all processes in a group. The operation is equivalent to all processes executing the scatter operation on their local buffer.
◦ In the linear algorithm, at step i the i-th node sends a message to all other nodes. The (i+1)-th node is able to proceed and start sending as soon as it receives the complete message from the i-th node. We allow for segmentation of the messages being sent.
◦ In the pairwise exchange algorithm, at step i, node r sends a message to node (r + i) and receives a message from node (r − i), with wrap-around. We do not segment messages in this algorithm.
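A minimal sketch of the pairwise exchange pattern, assuming each peer's data is a contiguous block of `block` bytes (a sketch under these assumptions, not the library's implementation):

    #include <mpi.h>
    #include <string.h>

    void pairwise_alltoall(const char *sendbuf, char *recvbuf,
                           int block, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        /* step 0: local copy of this rank's own block */
        memcpy(recvbuf + (size_t)rank * block,
               sendbuf + (size_t)rank * block, block);
        for (int i = 1; i < size; i++) {
            int to   = (rank + i) % size;         /* node r + i */
            int from = (rank - i + size) % size;  /* node r - i */
            MPI_Sendrecv(sendbuf + (size_t)to * block,   block, MPI_BYTE, to,   0,
                         recvbuf + (size_t)from * block, block, MPI_BYTE, from, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }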
Experimental Results and Analysis
The Broadcast operation transmits an identical message from the root process to all processes of the group. At the end of the call, the contents of the root’s communication buffer are copied to all other processes.
◦ In the flat-tree/linear algorithm, the root node sends an individual message to every participating node.
◦ In the pipeline algorithm, messages are propagated from the root left to right in a linear fashion.
◦ In the binomial and binary tree algorithms, messages traverse the tree starting at the root and going toward the leaf nodes through intermediate nodes.
◦ In the split-binary tree algorithm, the original message is split into two parts; the “left” half of the message is sent down the left half of the binary tree, and the “right” half is sent down the right half of the tree. In the final phase of the algorithm, every node exchanges messages with its “pair” from the opposite side of the binary tree.
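A sketch of the binomial tree traversal listed above, assuming root rank 0 and no message segmentation (segmentation would pipeline this loop over message chunks):

    #include <mpi.h>

    /* Binomial-tree broadcast: receive from the parent, then forward
       to children at decreasing power-of-two distances. */
    void binomial_bcast(void *buf, int count, MPI_Datatype dtype, MPI_Comm comm) {
        int rank, size, mask = 1;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        /* receive phase: wait for the message from the parent */
        while (mask < size) {
            if (rank & mask) {
                MPI_Recv(buf, count, dtype, rank - mask, 0, comm, MPI_STATUS_IGNORE);
                break;
            }
            mask <<= 1;
        }
        /* send phase: forward down the binomial tree */
        mask >>= 1;
        while (mask > 0) {
            if (rank + mask < size)
                MPI_Send(buf, count, dtype, rank + mask, 0, comm);
            mask >>= 1;
        }
    }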
Experimental Results and Analysis
The Reduce operation combines elements provided in the input buffer of each process within a group using the specified operation, and returns the combined value in the output buffer of the root process. Available algorithms:
◦ flat-tree/linear
◦ pipeline
◦ binomial tree
◦ binary tree
◦ k-chain tree
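A sketch of the flat-tree/linear variant, assuming a contiguous datatype (the accumulation order is simplified here, which matters for non-commutative operations):

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* Every non-root process sends to the root; the root accumulates. */
    void flat_tree_reduce(const void *sendbuf, void *recvbuf, int count,
                          MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        if (rank != root) {
            MPI_Send((void *)sendbuf, count, dtype, root, 0, comm);
            return;
        }
        MPI_Aint lb, extent;
        MPI_Type_get_extent(dtype, &lb, &extent);
        memcpy(recvbuf, sendbuf, (size_t)count * extent);  /* root's own contribution */
        void *tmp = malloc((size_t)count * extent);
        for (int r = 0; r < size; r++) {
            if (r == root) continue;
            MPI_Recv(tmp, count, dtype, r, 0, comm, MPI_STATUS_IGNORE);
            MPI_Reduce_local(tmp, recvbuf, count, dtype, op);  /* recvbuf = tmp op recvbuf */
        }
        free(tmp);
    }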
Experimental Results and Analysis
Broadcast decision tree statistics corresponding to the data presented in the previous figure.
Experimental Results and Analysis
Performance penalty of Broadcast decision trees corresponding to the data presented in the previous figure and table.
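Performance penalty here refers to the relative slowdown of the method selected by the decision tree compared to the experimentally optimal method. A plausible formalization (the paper's exact definition may differ in detail):

    penalty = \frac{T_{tree} - T_{optimal}}{T_{optimal}} \times 100\%

where T_{tree} is the duration of the collective using the tree-selected method and T_{optimal} is the duration using the best method measured for that collective instance.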
Experimental Results and Analysis
Statistics for combined Broadcast and Reduce decision trees corresponding to the data presented in the previous figure.
Experimental Results and Analysis Mean performance penalty of the combined decision tree for each of the collectives.
Experimental Results and Analysis
Segment of the combined Broadcast and Reduce decision tree generated with ‘-m 40 -c 25’ (C4.5 options: a minimum weight of 40 cases per branch and a 25% pruning confidence level).
Conclusion
A C4.5 decision tree can be used to generate a reasonably small and very accurate decision function: the mean performance penalty on existing performance data was within the measurement error for all trees we considered. The combined trees were also able to produce decision functions with less than a 2.5% relative performance penalty for both collectives. This indicates that it is possible to use information about one MPI collective operation to generate a reasonably good decision function for another collective.