Memory Efficient Pairwise Genome Alignment Algorithm – A Small-Scale Application with Grid Potential
Chao “Bill” Xie, Victor Bolet, Art Vandenberg
Georgia State University, Atlanta, GA 30303, USA
February 22/23, 2006, SURA, Washington DC

Introduction
– A small-scale application is studied in the grid environment
– Performance is compared across shared memory, cluster, and grid environments
– A pairwise sequence alignment program is chosen as the small-scale application
– The basic algorithm is modified into a memory-efficient algorithm
– The parallel implementation of pairwise sequence alignment is studied in each environment
– Based on work done by Nova Ahmed, NMI Integration Testbed

Specification of the Distributed Environments
– Shared memory environment: an SGI Origin 2000 machine with 24 CPUs
– Cluster environment: a Beowulf cluster at UAB with 8 homogeneous nodes, each node with four 550 MHz Pentium III processors and 512 MB of RAM
– Grid environment: the same Beowulf cluster with the Globus Toolkit software layer over it
– USC HPC resources were also used in Summer 2005

The Basic Pairwise Sequence Alignment Algorithm
– A two-dimensional array, the similarity matrix, stores the alignment scores for the two sequences
– A match or a mismatch is scored for each position in the pair of sequences being aligned
– Dynamic programming is used
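For illustration, here is a minimal C sketch of a dynamic-programming matrix fill of this kind (the scoring values, the local-alignment zero floor, and all names are assumptions; the presentation does not show its actual code):

/* Minimal sketch of the basic similarity-matrix fill
 * (local-alignment scoring; the score values are illustrative). */
#include <stdlib.h>
#include <string.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

static int max4(int a, int b, int c, int d) {
    int m = a > b ? a : b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Fill an (n+1) x (m+1) similarity matrix for sequences s1 and s2;
 * row 0 and column 0 stay zero.  The caller frees the result. */
int *fill_matrix(const char *s1, const char *s2) {
    size_t n = strlen(s1), m = strlen(s2);
    int *H = calloc((n + 1) * (m + 1), sizeof *H);
    if (!H) return NULL;
    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= m; j++) {
            int s = (s1[i-1] == s2[j-1]) ? MATCH : MISMATCH; /* match or mismatch */
            H[i*(m+1) + j] = max4(0,
                                  H[(i-1)*(m+1) + (j-1)] + s, /* diagonal */
                                  H[(i-1)*(m+1) + j] + GAP,   /* gap in s2 */
                                  H[i*(m+1) + (j-1)] + GAP);  /* gap in s1 */
        }
    return H;
}

Note that the zero floor leaves many cells at zero, which is exactly what the reduced-memory algorithm on the next slide exploits.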

The Reduced Memory Algorithm
– Keep only the nonzero elements of the matrix
– Memory is dynamically allocated as required
– A new data structure is used for efficiency

The Parallel Method
– The genome sequences are divided among the processors
– The similarity matrix is divided among the processors
[Diagram: processors P1–P5 compute the matrix over time; as each part is completed, Pi sends its edge value to Pi+1]
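A hedged C/MPI sketch of this pipeline follows (all names are illustrative, and a dense per-rank column block is used for clarity instead of the reduced-memory data structure):

/* Sketch of the parallel method: the similarity matrix is split into
 * column blocks, one per MPI rank.  Rank r computes its block row by
 * row and, after each row, sends the block's right-edge cell to rank
 * r+1, forming the pipeline P1 -> P2 -> ... shown in the diagram. */
#include <mpi.h>
#include <string.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

static int max4(int a, int b, int c, int d) {
    int m = a > b ? a : b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* s1: the full first sequence (rows); s2loc: this rank's slice of the
 * second sequence (columns).  block holds nrows x (ncols+1) ints;
 * column 0 is a halo for edge values received from the left rank. */
void compute_block(const char *s1, const char *s2loc,
                   int nrows, int ncols, int *block,
                   int rank, int nprocs) {
    int w = ncols + 1;
    memset(block, 0, (size_t)nrows * w * sizeof *block);
    for (int i = 1; i < nrows; i++) {
        if (rank > 0)  /* left boundary of row i comes from rank-1 */
            MPI_Recv(&block[i*w], 1, MPI_INT, rank - 1, i,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int j = 1; j <= ncols; j++) {
            int s = (s1[i-1] == s2loc[j-1]) ? MATCH : MISMATCH;
            block[i*w + j] = max4(0,
                                  block[(i-1)*w + (j-1)] + s,
                                  block[(i-1)*w + j] + GAP,
                                  block[i*w + (j-1)] + GAP);
        }
        if (rank < nprocs - 1)  /* pipeline: pass the right edge onward */
            MPI_Send(&block[i*w + ncols], 1, MPI_INT, rank + 1, i,
                     MPI_COMM_WORLD);
    }
}

Because each rank forwards a row's edge value as soon as that row is done, rank Pi+1 can start a row while Pi moves on to the next one, giving the overlap pictured in the diagram.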

Results
[Chart: Computation time – shared memory, cluster, and grid-enabled cluster environments]
[Chart: Computation time – cluster and grid-enabled cluster environments]

Results
[Chart: Comparison of speedup – shared memory, cluster, and grid-enabled cluster environments]
[Chart: Comparison of speedup – cluster and grid-enabled cluster environments]

Comparison of Multi-Cluster Grid Environments
[Charts: UAB multi-cluster – (a) computation time, (b) speedup]

Running Example (per Nova Ahmed, UAB Beowulf cluster: Medusa)
These are the steps for running the genome alignment program on the grid. First, the sample program, which aligns a very small genome sequence, is tested.
– The genome sequences were t1.txt and t2.txt
– The executable file was ar7

grid-proxy-init, RSL script, globusrun
1. First, grid-proxy-init is run to obtain the grid certificate:
   Your identity: /O=Grid/OU=UAB Grid/CN=Nova Ahmed
   Enter GRID pass phrase for this identity:
   Creating proxy ... Done
   Your proxy is valid until: Fri Apr 9 00:54:
2. Then create the RSL script genome.rsl to run the job:
   & (count=4) (executable=/home/nova/ar7) (jobtype=mpi)
3. The actual program is run on the grid using the globusrun command:
   globusrun -s -r medusa.lab.ac.uab.edu -f ./genome.rsl
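For reference, a GRAM RSL job description can carry more attributes than the three shown above. A hedged example follows (directory, arguments, stdout, and stderr are standard RSL attributes, but the paths and the assumption that ar7 takes the sequence files as arguments are illustrative):

& (count=4)
  (jobtype=mpi)
  (executable=/home/nova/ar7)
  (directory=/home/nova)
  (arguments="t1.txt" "t2.txt")
  (stdout=genome.out)
  (stderr=genome.err)

Redirecting stdout and stderr this way keeps the per-rank output (shown on the next slide) in files rather than interleaved on the terminal.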

Output
NOVA1 MyId = 1 NumProc = 4 [1 : 1 ->2 2] [1 : 2 ->13 3] [1 : 3 ->1 1] [1 : 3 ->11 1] myid = 1 finished
NOVA1 MyId = 2 NumProc = 4 [2 : 0 ->1 1] [2 : 0 ->11 1] [2 : 2 ->1 1] [2 : 3 ->2 2] [2 : 4 ->2 2] [2 : 4 ->13 3] [2 : 5 ->1 1] [2 : 5 ->13 3] myid = 2 finished
NOVA1 MyId = 3 NumProc = 4 [3 : 0 ->11 1] [3 : 0 ->21 1] [3 : 1 ->2 2] [3 : 2 ->11 1] [3 : 2 ->31 1] [3 : 3 ->1 1] [3 : 4 ->1 1] [3 : 4 ->12 2] [3 : 4 ->21 1] [3 : 5 ->2 2] [3 : 5 ->12 2] [3 : 5 ->23 3] [3 : 5 ->31 1] myid = 3 finished
NOVA1 MyId = 0 NumProc = 4 tgatggaggt gatagg [0 : 0 ->11 1] [0 : 2 ->1 1] [0 : 4 ->11 1] [0 : 5 ->11 1] Elapsed time is = myid = 0 finished
The program was then run with longer genome sequences: a1-1000, a1-2000, and a1-3000 compared with a2-1000, a2-2000, and a2-3000.

USC HPC – Summer 2005
[Charts: Computation time in the cluster and grid environments, varying the number of processors – (a) small set sequences, (b) long set sequences]

USC HPC – Summer 2005
[Charts: Speedup in the cluster and grid environments – (a) small set sequences, (b) long set sequences]

Conclusion
– The grid environment shows performance similar to the cluster environment; the grid adds little overhead
– The shared memory environment has better speedup than the cluster and grid environments
– The shared memory environment shows the limitation of memory when computing large genome sequences
– Small-scale applications (as well as large-scale ones) can run efficiently on a grid
– Distributed applications with minimal communication among processors will benefit in a grid environment, perhaps even across multiple clusters

Future Work
– Additional work in a SURAgrid environment that includes multiple clusters
– Test data that provides a more computation-intensive challenge for grid environments
– Adapt the application to the grid environment so that it uses less inter-process communication

Acknowledgements
This material is based in part upon work supported by:
– National Science Foundation under Grant No. ANI, NMI Integration Testbed Program. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)
– SURA Grant SURA, SURAgrid Application Development & Documentation
Thanks to:
– Nova Ahmed, currently in the Georgia Tech Computer Science PhD program, for the original work carried out as part of the NMI Integration Testbed Program
– John-Paul Robinson and the University of Alabama at Birmingham for access to the medusa cluster
– Jim Cotillier and Shelley Henderson, University of Southern California, for access to HPC resources
– Chao “Bill” Xie, Georgia State Computer Science PhD program, for continuing Nova Ahmed’s work
– Victor Bolet, Georgia State Information Systems & Technology Advanced Campus Services unit, for support of Georgia State’s SURAgrid nodes
– John McGee, RENCI.org, for discussions of the approach using Globus