
1 Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications Mark K. Gardner (Virginia Tech) Wu-chun Feng (Virginia Tech) Jeremy Archuleta (U. Utah) Heshan Lin (NCSU) Xiaosong Ma (NCSU & ORNL) Nominated for Best Paper Award, SC 2006, Tampa, FL

2 Overview
- StorCloud demo at SC|05: I/O throughput competition for real-world scientific applications
- When: Sun., Nov. 13 to Thu., Nov. 17, 2005
- Part of the slides adapted from the StorCloud presentation "mpiBLAST on the GreenGene Distributed Supercomputer" (Wu Feng et al.)
- Story: built an ad-hoc grid (GreenGene) with 3048 processors for an intensive genomic sequence search (searching NT against NT with mpiBLAST)
- Team
  - Institutions: LANL, NCSU, U. Utah, and Virginia Tech
  - Vendors: Intel, Panta Systems, and Foundry Networks

3 GreenGene Grid: How?
(Diagram of the grid sites: Intel (Dupont), the SC2005 showroom floor, U. Utah, and Virginia Tech)

4 Outline
- About BLAST and mpiBLAST
- Motivation
- Planning: estimate resource requirements; what kind of grid do we need
- System design: hardware architecture; software architecture
- Results
- Conclusion

5 What is BLAST?
- Basic Local Alignment Search Tool
- Ubiquitous sequence-database search tool used in molecular biology
- Given a query DNA or amino-acid (AA) sequence, BLAST:
  - Finds similar sequences in the database
  - Reports the statistical significance of the similarities between the query and the database
- Newly sequenced genomes are typically BLAST-searched against a database of known genes; similar sequences may have similar functions in the new organism

6 BLAST at the Core of Sequence DB Search
- Widely used: approximately 75%-90% of all compute cycles in the life sciences are devoted to BLAST searches
- But it is computationally demanding, O(n^2) (a variant of string matching), and requires the sequence database to be stored in memory to perform efficiently
- Challenge: sequence databases are growing exponentially

7 mpiBLAST Algorithm: Querying the Database
- Open-source BLAST parallelization (developed at LANL)
- Parallel approach: segment and distribute the database across the cluster (sketched below)
- Advantage: delivers super-linear speedup by avoiding repeated I/O
- Limitation: poor performance on searches with large output volumes because of a result-merging bottleneck
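
mpiBLAST's core idea is to partition the sequence database into fragments small enough to stay resident in each worker's memory. A minimal Python sketch of that partitioning follows, assuming a plain FASTA file and a fixed fragment count; the real tool (mpiformatdb) produces formatted BLAST database fragments rather than raw FASTA slices, so file names and functions here are illustrative only.

```python
# Minimal sketch of database segmentation in the style of mpiBLAST.
# Assumptions: a plain FASTA file and a fixed fragment count; the real
# mpiformatdb tool builds formatted BLAST databases, not FASTA slices.

def read_fasta(path):
    """Yield (header, sequence) records from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def segment_database(path, num_fragments):
    """Greedily assign each record to the currently smallest fragment."""
    fragments = [[] for _ in range(num_fragments)]
    sizes = [0] * num_fragments
    for header, seq in read_fasta(path):
        i = sizes.index(min(sizes))          # least-loaded fragment
        fragments[i].append((header, seq))
        sizes[i] += len(seq)
    return fragments

if __name__ == "__main__":
    frags = segment_database("nt.fasta", 8)   # hypothetical input file
    for i, frag in enumerate(frags):
        with open(f"nt.frag{i}.fasta", "w") as out:
            for header, seq in frag:
                out.write(f"{header}\n{seq}\n")
```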

8 mpiBLAST-PIO: Enhancing Efficiency
- Optimizations transferred from pioBLAST, a research prototype developed at NCSU and ORNL [Lin et al., IPDPS 2005]
- Dramatically improves search throughput and scalability:
  - Uses parallel I/O techniques to remove the result-merging bottleneck; results are buffered and written out concurrently by the workers (see the sketch below)
  - Enhances output processing to reduce communication volume
- Used heavily in the SC StorCloud demo
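
The key idea behind the parallel-I/O optimization is that each worker computes where its formatted results belong in the final output file and writes there directly, instead of shipping everything to the master. A rough mpi4py sketch of that pattern, assuming offsets are exchanged with an allgather, is shown below; the file name and result content are placeholders, not mpiBLAST-PIO's actual code.

```python
# Rough sketch of workers writing result blocks concurrently at
# non-overlapping offsets (the idea behind mpiBLAST-PIO), using mpi4py.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each worker formats its own share of the BLAST results locally.
local_result = (f"results from worker {rank}\n" * (rank + 1)).encode()

# Exchange block sizes so every rank can compute its file offset.
sizes = comm.allgather(len(local_result))
offset = sum(sizes[:rank])

fh = MPI.File.Open(comm, "blast_output.txt",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
fh.Write_at_all(offset, local_result)   # collective write, no merge step
fh.Close()
```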

9 Why Sequence-Search the NT Database Against Itself?
- From a biological perspective:
  - Aids understanding of which genetic codes are unique and which are redundant
  - Enables a number of useful studies, from organism "barcoding" to gene function and evolution
- From a computer science perspective:
  - Provides a pertinent demonstration of mpiBLAST-PIO's scalability to larger problems (NT is one of the largest sequence databases) and can potentially generate huge output data
  - Enables realization of advanced indexing structures that track relationships among sequences in the database; such structures can provide up to 100x speedup in search times with little loss of sensitivity, and up to 20x compression of the database using phylogenetic methods

10 Resource Estimation
- Why do we care? To evaluate the feasibility of the project and to make better scheduling decisions
- What is the complexity of the problem? Intuitively, estimate by sequence length and NT composition

11 Sequence-Length-Based Estimation
- Simple linear extrapolation appears to be "mission impossible" because of "hard queries": intensive computation and large quantities of intermediate results
- Fortunately:
  - The correlation between sequence length and resource requirements is weak because BLAST employs heuristics
  - G1 sequences are well behaved, and a large portion of the sequences belong to G1
  - Searches of hard queries can be sped up with more memory
- (Based on sampled searches of NT sequences)

12 Better Predictor?
- Hit-based rather than length-based?
- BLAST search has two phases:
  - First phase: find hits at the word level
  - Second phase: extend matched words in both directions to find the maximal segment pair (the longest local matching substring)
- Computation in the first phase is much less expensive than in the second (illustrated below)
- Modified the BLAST algorithm to collect the number of hits in the first phase
- Attractive: utilizes internal knowledge of the BLAST algorithm
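
As a concrete illustration of the first (word-matching) phase, the sketch below counts exact k-mer hits between a query and a subject sequence; counting hits this way is cheap compared with the extension phase, which is why it was tried as a cost predictor. The word handling is simplified relative to real BLAST, which also scores neighborhood words.

```python
# Illustrative first-phase hit counting: count exact k-mer ("word")
# matches between a query and a subject sequence. Real BLAST adds
# neighborhood words and score thresholds; this is only a sketch.
from collections import defaultdict

def count_word_hits(query, subject, k=11):
    """Return the number of k-mer positions in subject that match
    some k-mer of the query (BLASTN-style default word size 11)."""
    words = defaultdict(int)
    for i in range(len(query) - k + 1):
        words[query[i:i + k]] += 1
    hits = 0
    for j in range(len(subject) - k + 1):
        hits += words.get(subject[j:j + k], 0)
    return hits

if __name__ == "__main__":
    q = "ACGTACGTACGTTTGACCA"
    s = "TTACGTACGTACGTTTGACCAGG"
    print(count_word_hits(q, s, k=11))
```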

13 Number of Hits Is Not a Better Predictor
- Linear regression on data collected from 500 sequences
  - Y: output size, execution time; X: length, number of hits
- Number of hits is not necessarily better: the difference in mean squared errors is < 5%
- High correlation (0.9942) between the number of hits and sequence length
- Sequence length is much easier to collect (see the sketch below)
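
The comparison above amounts to fitting two simple linear models and comparing their mean squared errors. A minimal numpy sketch of that fit follows, with small made-up arrays standing in for the 500 measured sequences.

```python
# Minimal sketch of the predictor comparison: fit execution time (or
# output size) against sequence length and against hit count, then
# compare mean squared errors. The arrays below are placeholders for
# the 500 sampled sequences.
import numpy as np

def fit_mse(x, y):
    """Least-squares fit y ~ a*x + b; return the mean squared error."""
    A = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coeffs
    return float(np.mean(residuals ** 2))

lengths = np.array([500.0, 1200.0, 3000.0, 800.0, 15000.0])
hits    = np.array([2.1e4, 5.5e4, 1.4e5, 3.0e4, 7.2e5])
time_s  = np.array([3.0, 8.0, 21.0, 5.0, 110.0])

mse_len  = fit_mse(lengths, time_s)
mse_hits = fit_mse(hits, time_s)
print(f"MSE(length): {mse_len:.2f}  MSE(hits): {mse_hits:.2f}")
print(f"corr(length, hits) = {np.corrcoef(lengths, hits)[0, 1]:.4f}")
```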

14 What Kind of Grid Do We Need?
- Existing grid frameworks (such as Globus) are not what we want:
  - Not available or well tested on Mac OS X and 64-bit Linux
  - mpiBLAST-PIO not ported to Globus
  - High learning curve for installation and configuration
- Homemade grid software written from scratch:
  - Just fits our needs
  - Easy to deploy, allows full control

15 Hardware Architecture
- Heterogeneous environment; interoperability is a big concern

Cluster      | Organization  | Architecture               | Memory | #Procs  | File System
System X     | Virginia Tech | Dual 2.3GHz PowerPC 970FX  | 4GB    | 2200    | NFS
TunnelArch   | Univ. of Utah | Dual AMD Opteron 240       | 4GB    | 126     | PVFS
TunnelArch   | Univ. of Utah | Dual AMD Opteron 244       | 2GB    | 128     | PVFS
Dupont       | Intel         | Quad core                  | N/A    | 512/256 | NFS
Jarrel       | Intel         | Dual 3.4GHz Intel P4       | 2GB    | 20      | NFS
Blade Center | Intel         | Dual 2.66GHz Intel Xeon    | 2GB    | 28      | NFS
Panta        | Panta Systems | Four AMD Opteron 246HE     | 2GB    | 32      | NFS

16

17 Software Architecture
- Hierarchical design:
  - SuperMaster: assigns queries, fetches results, balances load
  - GroupMaster: fetches queries, performs the search
- How to choose the group size?
- Challenges: heterogeneity, scalability, fault tolerance
(Diagram: a SuperMaster coordinating GroupMasters, each holding a replica of the NT database; a toy sketch follows)
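
To make the hierarchy concrete, here is a toy sketch of the SuperMaster/GroupMaster interaction: group masters pull query batches when they need work and push results back, so the SuperMaster never has to track groups actively. All class and method names are illustrative; the actual system was a small set of Perl and shell scripts.

```python
# Toy sketch of the hierarchical design: a passive SuperMaster hands
# out query batches on request; GroupMasters pull work for their local
# cluster and return results. Names are illustrative only.
from queue import Queue

class SuperMaster:
    def __init__(self, query_batches):
        self.pending = Queue()
        for batch in query_batches:
            self.pending.put(batch)
        self.results = []

    def request_batch(self):
        """Called by a GroupMaster when its local queue runs low."""
        return None if self.pending.empty() else self.pending.get()

    def submit_result(self, batch_id, output):
        self.results.append((batch_id, output))

class GroupMaster:
    def __init__(self, name, supermaster):
        self.name, self.sm = name, supermaster

    def run(self):
        while True:
            batch = self.sm.request_batch()
            if batch is None:
                break
            batch_id, queries = batch
            # In the real system this step runs mpiBLAST-PIO on the
            # local cluster against its replica of the NT database.
            output = f"{self.name} searched {len(queries)} queries"
            self.sm.submit_result(batch_id, output)

if __name__ == "__main__":
    sm = SuperMaster([(i, [f"q{i}_{j}" for j in range(10)]) for i in range(4)])
    for name in ("SystemX", "TunnelArch"):
        GroupMaster(name, sm).run()
    print(sm.results)
```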

18 Heterogeneity and Accessibility
- Only four existing, cross-platform tools are used: Perl, ssh, rsync, bash
  - 5 scripts, totaling only 458 lines
  - Fast deployment on Unix-like systems
- Customized mpiBLAST-PIO
- System X needed special care:
  - Porting issues because of Mac OS X and PowerPC
  - Implemented a pseudo-parallel write to improve output performance on NFS

19 Design for Scalability
- Manage thousands of processors efficiently with a loosely coupled, hierarchical design
- Reduce the load on the SuperMaster; a passive SuperMaster makes it easy to add group masters, regroup processors, and avoid security holes, and allows incremental system start
- Hide WAN latency by queuing queries locally, preventing "bubbles in the pipeline"
- Ensure data integrity with MD5 checksums: roughly one silent error every 500GB [Paxson 1999]
- Alleviate the network bandwidth constraint with compression (compression ratio 1:5 to 1:7); see the sketch below
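
A small sketch of the transfer-hardening steps described above, assuming gzip compression and an MD5 digest computed before sending and verified on receipt; the real scripts used standard Unix tools (rsync, ssh, md5sum) rather than Python.

```python
# Sketch of the transfer-hardening steps: compress a result file for
# the WAN link and verify an MD5 checksum on arrival. Illustrative
# only; the real deployment used rsync/ssh and md5sum.
import gzip
import hashlib
import shutil

def md5sum(path, chunk=1 << 20):
    """Compute the MD5 digest of a file in streaming fashion."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def compress_and_digest(path):
    """Gzip the file; return (compressed_path, md5_of_compressed)."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path, md5sum(gz_path)

def verify(path, expected_md5):
    """Receiver side: recompute the digest and compare."""
    return md5sum(path) == expected_md5
```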

20 Fault Tolerance
- Serious issue: mean time to failure is < 10 hours on machines with thousands of processors [Reed 2004]
- Re-execution rather than checkpoint-restart
- Primary issue: managing query states; all query states are maintained in the file system (sketched below)
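
A minimal sketch of file-system-based query state tracking, assuming one marker file per batch in per-state directories; a batch whose worker dies simply stays in "assigned" and is pushed back for re-execution. The directory layout and names are hypothetical.

```python
# Minimal sketch of keeping query-batch states in the file system so a
# crashed search is simply re-executed. Directory names are made up.
import os
import shutil

STATES = ("pending", "assigned", "done")

def init_states(root, batch_ids):
    for s in STATES:
        os.makedirs(os.path.join(root, s), exist_ok=True)
    for b in batch_ids:
        open(os.path.join(root, "pending", b), "w").close()

def claim(root):
    """Move one batch from pending to assigned; return its id or None."""
    pending = os.listdir(os.path.join(root, "pending"))
    if not pending:
        return None
    b = pending[0]
    shutil.move(os.path.join(root, "pending", b),
                os.path.join(root, "assigned", b))
    return b

def complete(root, b):
    shutil.move(os.path.join(root, "assigned", b),
                os.path.join(root, "done", b))

def recover(root):
    """After a failure, push assigned-but-unfinished batches back."""
    for b in os.listdir(os.path.join(root, "assigned")):
        shutil.move(os.path.join(root, "assigned", b),
                    os.path.join(root, "pending", b))
```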

21 Results
- Finished 1/7 of NT in one day
- Sequences were coalesced into batches targeting 30 minutes of search time (see the sketch below)
- Execution statistics:
  - Output size: 600KB to 7GB per batch, 284.2KB per sequence
  - Execution time: 6 seconds to 1.6 hours, averaging 9 minutes per batch
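
Coalescing sequences into batches with a target search time can be done greedily from a per-sequence time estimate. The sketch below assumes a simple length-proportional cost model (seconds per base) purely for illustration, standing in for the sampling-based estimate described earlier.

```python
# Greedy sketch of coalescing query sequences into batches that target
# roughly 30 minutes of search time. The cost model (seconds per base)
# is a stand-in for the sampling-based estimate described earlier.
TARGET_SECONDS = 30 * 60

def make_batches(seqs, secs_per_base=0.02, target=TARGET_SECONDS):
    """seqs: list of (seq_id, length) pairs. Returns a list of batches,
    each a list of sequence ids whose estimated total time ~= target."""
    batches, current, current_cost = [], [], 0.0
    for seq_id, length in seqs:
        cost = length * secs_per_base
        if current and current_cost + cost > target:
            batches.append(current)
            current, current_cost = [], 0.0
        current.append(seq_id)
        current_cost += cost
    if current:
        batches.append(current)
    return batches
```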

22 Conclusion
- Was not able to take advantage of existing grid software
- Homemade grid software did work: it enabled rapid development and deployment and is portable to Unix-like platforms
- Identified hard queries for biology research
- Future work: extend the framework to support more general applications; better resource estimation