Excursions into Parallel Programming

Abstract: As computationally intense problems multiply across the fields of math, science, and technology, the demand for processing power in computers keeps growing. MPI (Message Passing Interface) is one of the most crucial and effective ways of programming in parallel, and despite being difficult to use at times, it has remained the de facto standard. Technology has kept advancing, and new MPI libraries have been created to keep up with the capabilities of supercomputers. Efficiency has become the key concern, both in choosing the number of computers to use and in managing the latency of message passing. The purpose of this project is to explore some of these methods of optimization in a specific case, the game of life problem.

Results and Conclusions: Though I was hoping to find a clear relationship between latency, run time, and number of processors, the data I produced suggest otherwise. The largest problem was that run time actually increased, almost linearly, as I added processors. This can be explained by the massive overhead of passing arrays as well as the general inefficiency of my code. Not all the results went against my original hypothesis, however. First, with fewer processors, the amount passed plays a smaller role. With 4 processors, the running time when 1 row/column was passed differed from the running time when 5 were passed by around 25 milliseconds; with 9 processors, the difference was closer to 100 milliseconds. Second, run time generally fell as the amount passed neared 4, then gradually rose again above 4. Although my results may be insignificant compared to the grand-scale projects that major companies are pursuing, they rest on many of the same basics, and they give me a foundation for future work as well as an idea of how research is formally done.
Michael Chen
TJHSST Computer Systems Lab 2007-2008

Introduction: Because of the increasing demand for powerful computation, MPI has become a standard even for companies like IBM, which recently began using its BG/L supercomputer along with an MPI library to tackle problems like protein folding. As a result, efficiency in parallel programming has become a high priority: finding what yields the best latency, and what type of processor best suits each problem. Many hopes for the future lie in this field. Molecular biology, strong artificial intelligence, and ecosystem simulations are just a few of the many applications that will surely require parallel computing. Though my plans only include optimization of the game of life problem, these basic skills carry over into real-world applications, only on a much larger scale. Automated testing is becoming increasingly popular, and the results my project has yielded shed some light on the relationship between latency, processor usage, and efficiency.

Figure: Running simulation of game of life with 9 processors

Procedures and Methodology: During the first and second quarters, I followed along with the supercomputing class, which has been diving into parallel programming with MPI. Nearing the end of the 2nd quarter, however, I decided to break off and work more on the game of life program. The process began with writing the game of life without MPI, which was done quickly. Making it run in parallel was the hard part, hindered by two problems: Java has been my main language for the past three years, so I run into syntax errors in C quite often, and I encountered problems at first with sending and receiving in MPI (array sizes caused problems in the game of life).
However, in the 3rd and 4th quarters, the problems revolved much more around file I/O and the difficulty of using strings effectively in C, which caused trouble when I was attempting to automate the whole process.

The theory behind the game of life with MPI is that the board is split among the processors used in the calculation. Each section needs only a limited amount of information from the surrounding areas, and this is where message passing comes into play. Each processor sends one row or one column to another computer and receives one in return. This is also where the latency problem comes into play. If only one row/column is sent per turn, the computers must re-sync every step. But if two rows/columns are sent, the amount of re-syncing time is halved. The efficacy of this also depends on individual computer and network latency. A limit has to be drawn eventually: there is little use in, say, passing the entire board to each processor. (See right for diagram.)

Figure: Graphs and charts of the averages of multiple runs in different scenarios
Figure: Diagram of latency vs. processing