Cluster Computing Applications Project: Parallelizing BLAST


The field of Bioinformatics needs faster string matching algorithms.

What Exactly is BLAST?
BLAST (Basic Local Alignment Search Tool) is a heuristic algorithm that efficiently finds matches between query strings and a target database of strings.

Abstract
Parallelizing the BLAST Algorithm: Feasible or Not?
Bioinformatics research, especially the coding and classifying of genes, needs fast string matching algorithms. At Oak Ridge National Laboratory (ORNL), in the Computer Science and Mathematics Division, High Performance Cluster (HPC) computing has been applied to many different areas, from Computational Biology to Computational Material Science. The purpose of this project is to study the Basic Local Alignment Search Tool (BLAST) algorithm: to define its structure, state why it is valuable as a Bioinformatics database tool, and explore ways of increasing its effectiveness and speed. BLAST is used in Bioinformatics to find alignments between strings. It is a heuristic algorithm that looks for matches between fragments of a query string and a target database. This eliminates much of the data in the database without running a full comparison for each letter in the search string. Once fragment alignments between the query and a database string are found (that is, the fragments match within a certain threshold), the full strings are matched. Several methods of parallelizing BLAST have been explored, and they are summarized in this paper. The paper concludes with a number of potential methods for increasing the speed and effectiveness of BLAST.

This research was performed under the Research Alliance for Minorities Program administered through the Computer Science and Mathematics Division, Oak Ridge National Laboratory. This Program is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR. This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science, U.S. Department of Energy. This work has been authored by a contractor of the U.S. Government under contract DE-AC05-00OR. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

I would like to extend my thanks to Stephen L. Scott, Ph.D., John Mugler, Thomas Naughton, and Brian Luethke for their invaluable mentoring; Michaelangelo Salcedo, Ph.D., for his guidance; and Debbie McCoy and Cheryl Hamby for their support in the RAM program.

Introduction
This project began with learning cluster-computing infrastructure. My training included tools developed at Oak Ridge National Laboratory: the Open Source Cluster Application Resources (OSCAR) tool and the Cluster Command and Control (C3) tool. OSCAR, a robust and user-friendly application, is used for installing clusters. C3, a suite of cluster tools, is used for administering clusters. Once you have a cluster, what do you apply it to? The second half of the project answers this question: it investigates a Bioinformatics application called BLAST and explores known parallelization schemas for cluster computing.

Infrastructure Overview
Red Hat Linux 7.2
OSCAR 1.3
– C3
– LAM/MPI
– Maui Scheduler
– MPICH
– OpenSSH
– OpenSSL
– PBS
– PVM
– System Installation Suite (SIS)

Applications Overview
BLAST: a Bioinformatics tool.
Parallelize BLAST's algorithm.

William Burke, York College, City University of New York
Stephen L. Scott & John Mugler, Oak Ridge National Laboratory
Research Alliance of Minorities (RAM), Computer Science and Mathematics Division: Poster Session 2002

Cluster Command & Control; Open Source Cluster Application Resources; eXtreme TORC; TORC; HighTORC
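
The abstract describes BLAST as matching fragments of the query against the database first and only then comparing the full strings. A minimal Python sketch of that fragment-then-extend idea follows. It is a toy illustration, not the BLAST implementation: the function names (`index_words`, `extend`, `search`), the +1/-1 scoring, and the parameters `w`, `threshold`, and `drop` are all assumptions made for this example, and the sketch only handles exact fragment hits with ungapped extension.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # assumed toy scoring, not BLAST's real scoring matrices


def index_words(query, w):
    """Map every length-w fragment of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def extend(query, subject, qi, si, w, drop):
    """Grow an exact fragment hit left and right without gaps.

    Extension in a direction stops once the running score falls more than
    `drop` below the best score seen in that direction.
    Returns (score, query_start, subject_start, length).
    """
    def walk(q, s, step):
        best = run = 0
        best_len = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            run += MATCH if query[q] == subject[s] else MISMATCH
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run > drop:
                break
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(qi + w, si + w, 1)
    left_score, left_len = walk(qi - 1, si - 1, -1)
    score = w * MATCH + left_score + right_score
    return score, qi - left_len, si - left_len, left_len + w + right_len


def search(query, database, w=3, threshold=5, drop=2):
    """Best ungapped alignment per database entry that scores >= threshold."""
    index = index_words(query, w)
    hits = {}
    for name, subject in database.items():
        # Only positions whose fragment also occurs in the query are extended;
        # everything else in the database is skipped without full comparison.
        for si in range(len(subject) - w + 1):
            for qi in index.get(subject[si:si + w], ()):
                cand = extend(query, subject, qi, si, w, drop)
                if cand[0] >= threshold and cand[0] > hits.get(name, (0,))[0]:
                    hits[name] = cand
    return hits


if __name__ == "__main__":
    db = {"s1": "TTACGTACGTTT", "s2": "GGGGGGGG"}  # hypothetical sequences
    print(search("ACGTACGT", db))
```

In this sketch each database entry is searched independently of the others, which suggests one natural way to spread the work over cluster nodes: give each node its own slice of `database` and merge the resulting `hits` dictionaries. That is an observation about the sketch only, not a statement of which parallelization scheme the poster evaluated.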