Parallel Algorithm for Multiple Genome Alignment Using Multiple Clusters. Nova Ahmed, Yi Pan, Art Vandenberg, Georgia State University. SURA Cyberinfrastructure Workshop.

Presentation transcript:

Parallel Algorithm for Multiple Genome Alignment Using Multiple Clusters
Nova Ahmed, Yi Pan, Art Vandenberg, Georgia State University
SURA Cyberinfrastructure Workshop: Grid Application Planning & Implementation, January 5-7, 2005
Southeastern Universities Research Association

Slide 2: Discussion Topics
- Sequence alignment problem
- Memory efficient algorithm
- Convergence toward collaboration
- System configurations
- Results (part 1, part 2)
- Conclusions
- Future work

Slide 3: Sequence alignment problem
- Sequences are used to find biologically meaningful relationships among organisms: evolutionary information, determining diseases, causes, and cures, and learning about proteins
- The problem is especially compute intensive for long sequences
- Needleman and Wunsch (1970): optimal global alignment
- Smith and Waterman (1981): optimal local alignment
- Taylor (1987): multiple sequence alignment by pairwise alignment
- BLAST trades off optimal results for faster computation
- Challenge: achieve optimal results without sacrificing speed

Slide 4: Memory efficient algorithm
- Based on the pairwise alignment algorithm
- A similarity matrix is generated to compare all sequence positions
- Observation: many "alignment scores" are zero
- The similarity matrix is reduced by storing only non-zero elements, with row-column information stored along with each value (see the sketch after this list)
- A block of memory is dynamically allocated as each non-zero element is found
- A data structure is used to access the allocated blocks
- Parallelism is introduced to reduce computation time
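The non-zero storage idea can be pictured with a minimal C sketch. The type names, field layout, and growth policy below are illustrative assumptions, not the authors' actual code:

```c
/* Sketch of the non-zero storage: only non-zero scores are kept, each with its
 * row/column position, in a dynamically grown allocation. Names and the growth
 * policy are assumptions for illustration only. */
#include <stdlib.h>

typedef struct {
    int row;    /* position in sequence X */
    int col;    /* position in sequence Y */
    int score;  /* non-zero alignment score */
} NonZeroCell;

typedef struct {
    NonZeroCell *cells;   /* dynamically allocated entries */
    size_t count;         /* entries in use */
    size_t capacity;      /* entries allocated */
} SparseMatrix;

/* Append a non-zero score, growing the allocation only when needed. */
static int sparse_add(SparseMatrix *m, int row, int col, int score)
{
    if (score == 0)
        return 0;                          /* zero scores are simply not stored */
    if (m->count == m->capacity) {
        size_t new_cap = m->capacity ? m->capacity * 2 : 256;
        NonZeroCell *p = realloc(m->cells, new_cap * sizeof *p);
        if (!p)
            return -1;
        m->cells = p;
        m->capacity = new_cap;
    }
    m->cells[m->count++] = (NonZeroCell){ row, col, score };
    return 0;
}
```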

Slide 5: Similarity Matrix Generation
Alignment of DNA sequences:
- Sequence X: TGATGGAGGT
- Sequence Y: GATAGG
- 1 = matching; 0 = non-matching
- ss = substitution score; gp = gap score
Each cell of the similarity matrix is generated as the maximum score with respect to its neighbors (the formula and matrix appear as figures on the slide; a reconstruction follows below).
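The slide's formula image is not reproduced in the transcript. Assuming the standard local-alignment recurrence, H(i,j) = max(0, H(i-1,j-1) + s(i,j), H(i-1,j) + gp, H(i,j-1) + gp), where s(i,j) is 1 for a match and ss otherwise, a minimal C sketch of the matrix fill looks like this (a dense array is used here only for clarity; the memory-efficient version would store the non-zero cells as sketched above):

```c
/* Sketch of similarity-matrix generation with a Smith-Waterman-style local
 * recurrence. The exact scoring on the slide is assumed: match = 1,
 * mismatch = ss, gap = gp. */
#include <string.h>

static int max4(int a, int b, int c, int d)
{
    int m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Fills H as an (n+1) x (m+1) row-major matrix with H[0][*] = H[*][0] = 0. */
void fill_similarity_matrix(const char *x, int n, const char *y, int m,
                            int ss, int gp, int *H)
{
    memset(H, 0, (size_t)(n + 1) * (m + 1) * sizeof *H);
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int sub = (x[i - 1] == y[j - 1]) ? 1 : ss;    /* 1 = matching */
            H[i * (m + 1) + j] = max4(
                0,                                        /* local-alignment floor */
                H[(i - 1) * (m + 1) + (j - 1)] + sub,     /* diagonal neighbor */
                H[(i - 1) * (m + 1) + j] + gp,            /* gap in sequence Y */
                H[i * (m + 1) + (j - 1)] + gp);           /* gap in sequence X */
        }
    }
}
```

For the slide's example, x would be "TGATGGAGGT" (n = 10) and y "GATAGG" (m = 6), with H sized (n+1) x (m+1).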

Slide 6: Trace sequences
The matrix is traced back to find the sequence matches (shown as a figure on the slide; a sketch follows).
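A hedged sketch of the back-trace step, assuming the usual local-alignment traceback: start at the highest-scoring cell and walk toward the largest-scoring neighbor until a zero cell is reached. The actual implementation may differ in how it chooses among neighbors and in what it records along the path:

```c
/* Sketch of back-tracing the (n+1) x (m+1) matrix H filled above to recover
 * where the matched region begins. Index bookkeeping is illustrative. */
void trace_back(const int *H, int n, int m, int *out_i, int *out_j)
{
    /* 1. Find the maximum-scoring cell. */
    int bi = 0, bj = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            if (H[i * (m + 1) + j] > H[bi * (m + 1) + bj]) { bi = i; bj = j; }

    /* 2. Follow the largest neighbor (diagonal, up, or left) until score 0. */
    int i = bi, j = bj;
    while (i > 0 && j > 0 && H[i * (m + 1) + j] > 0) {
        int diag = H[(i - 1) * (m + 1) + (j - 1)];
        int up   = H[(i - 1) * (m + 1) + j];
        int left = H[i * (m + 1) + (j - 1)];
        if (diag >= up && diag >= left) { i--; j--; }   /* match/substitution */
        else if (up >= left)            { i--; }        /* gap in sequence Y */
        else                            { j--; }        /* gap in sequence X */
    }
    *out_i = i;   /* start of the aligned region in X */
    *out_j = j;   /* start of the aligned region in Y */
}
```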

Slide 7: Data structure
- The algorithm calculates and stores only non-zero values
- Memory is dynamically allocated as needed (see the lookup sketch below)
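A small sketch of how the allocated entries might be accessed, using the SparseMatrix type from the earlier sketch; any (row, col) pair that was never stored is a zero score. A real implementation would keep a per-row index into the blocks, the linear scan here is only to keep the illustration short:

```c
/* Look up a score in the sparse store; absent entries are the zeros that were
 * never allocated. Linear scan is for illustration only. */
static int sparse_get(const SparseMatrix *m, int row, int col)
{
    for (size_t k = 0; k < m->count; k++)
        if (m->cells[k].row == row && m->cells[k].col == col)
            return m->cells[k].score;
    return 0;
}
```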

Slide 8: Parallel distribution of multiple sequences
Figure: the sequence set (e.g., sequences 1-6 and 7-12) is divided among processing nodes, with pairs (seq 1-2, seq 3-4, seq 5-6) aligned in parallel (an MPI sketch follows).
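A minimal MPI sketch of this distribution pattern, assuming a simple round-robin assignment of sequence pairs to processes; the talk's actual assignment across clusters is not specified in the transcript, though MPICH-G2 was the MPI layer used:

```c
/* Sketch: each MPI rank takes every size-th sequence pair and aligns it
 * independently. The pair count and assignment scheme are assumptions. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int num_sequences = 12;             /* e.g. sequences 1-12 as in the figure */
    int num_pairs = num_sequences / 2;

    /* Round-robin assignment: rank r aligns pairs r, r+size, r+2*size, ... */
    for (int p = rank; p < num_pairs; p += size) {
        int seq_a = 2 * p + 1, seq_b = 2 * p + 2;
        printf("rank %d aligns sequence %d with sequence %d\n", rank, seq_a, seq_b);
        /* fill_similarity_matrix(...) and trace_back(...) would run here */
    }

    MPI_Finalize();
    return 0;
}
```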

Slide 9: Convergence toward collaboration
- Algorithm implementation: Nova Ahmed (Masters CS student), Dr. Yi Pan (CS, graduate advisor)
- Shared memory system at Georgia State: algorithm implementation and initial validation results
- NMI Integration Testbed program:
  - Georgia State: Art Vandenberg, Victor Bolet, et al.
  - University of Alabama at Birmingham: Jill Gemmill, John-Paul Robinson, Pravin Joshi
- SURA NMI Testbed Grid: looking for applications to demonstrate value

Slide 10: System configurations
- Shared memory (Georgia State): SGI Origin 2000, 24 x 250 MHz MIPS R10000, 4 GB total RAM
- Clusters (University of Alabama at Birmingham):
  - Single cluster: 8-node Beowulf cluster (each node 4 x 550 MHz Pentium III, 512 MB RAM)
  - Single cluster grid: same 8-node Beowulf cluster with Globus Toolkit 3.0
  - Multi-cluster: 2 additional grid-enabled clusters (small SMP systems)
- Multi-cluster interconnect speed essentially 100 Mb/s

Slide 11: Results, part 1
- Initial validation of the algorithm on shared memory
- UAB cluster: as a "relative comparison" to shared memory performance
- UAB grid-enabled cluster: to evaluate the impact of the grid middleware layer

Slide 12: Initial Validation: Shared Memory Machine
- Performance validates the algorithm: computation time decreases with an increased number of processors
- Limitations:
  - Memory: maximum sequence comparison is 2000 x 2000
  - Processors: policy limits a student to 12 processors
  - Not scalable

Slide 13: Results: UAB Clusters; Shared Memory*
- Genome lengths increased to 3000 (student limit on the shared memory system removed)
* NB: results comparing clusters with shared memory are relative; the systems are distinctly different.

Slide 14: Results: Grid-enabled cluster (Globus, MPICH)
Advantages of a grid-enabled cluster:
- Longer sequences: lengths up to 10,000 tested
- Scalable: new cluster nodes can be added to the grid
- Easier job submission: no account needed on every node
- Easier scheduling: multiple jobs can be submitted at one time

Slide 15: Results, part 2
- Focus on clusters: UAB cluster, UAB grid-enabled cluster, multi-clusters at UAB
- Multiple genome alignment, not just pairwise
- Sequence set from a sequence library: approximately 150 sequences ranging from 80,000 to 1,000,000 in length
- Globus Toolkit 3.0, MPICH-G2

Slide 16: Computation Time
Figure: computation time versus the number of elements per processor, using 9 processors in each configuration (cluster, grid cluster, multi-grid cluster).

Slide 17: Computation Time
Figure: computation time with 9 processors available in the multi-cluster and 32 processors for the other configurations.

Slide 18: Speedup (time on 1 CPU / time on n CPUs)
Figure: speedup with 9 processors available in the multi-cluster and 32 processors for the other configurations.

Slide 19: Some Conclusions
- Having cluster nodes available via the Testbed is beneficial: it enables access where the resource is not available locally and empowers student investigation
- Grid capability demonstrated; provides an awareness and outreach vector
- Nova Ahmed's thesis defense engages other graduate students; a concrete "take away" that engages faculty/IT/student discussion
- Some interesting results. Hypothesis: a multi-cluster may provide better results than one cluster. The research leads to understanding and learning, whatever the hypothesis result
- Ahmed et al., "Memory Efficient Pair-Wise Genome Alignment Algorithm - A Small-Scale Application with Grid Potential," Proceedings of Grid and Cooperative Computing (GCC 2004), Lecture Notes in Computer Science

Slide 20: Future Work
- Running across clusters at different sites
- Intelligent agent: submit to a mixed environment (shared memory and/or clusters and/or …)
- Using BridgeCA for transparent access
- Optically connected clusters?
- Analysis of network factors (cf. Warren Matthews, GaTech, et al., end-to-end performance)

Slide 21: Questions / Contacts
Georgia State University: Nova Ahmed, Yi Pan, Art Vandenberg

Slide 22: Acknowledgement
This work is supported in part by the NSF Middleware Initiative Cooperative Agreement No. ANI. Any opinions, findings, conclusions, or recommendations expressed herein are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.