A Parallel Solution to Global Sequence Comparisons CSC 583 – Parallel Programming By: Nnamdi Ihuegbu 12/19/03.



Brief Introduction
- The Human Genome Project (and others) have produced a vast amount of biological data.
- Joint venture of computer science and biology (BCB): genetic databases (map, genomic, proteomic).
- Expected completion date for the map of the human genome: end of 2003.
- Next stage: sequence comparison and sequence-to-protein-function analysis.
- Useful to pharmaceutical companies (CADD, e.g. SKB’s Relenza).

Results - Sequence
Current sequence generation technologies:
- Maxam-Gilbert: uses chemicals to cleave DNA at a specific base, giving fragments of a specific length.
- Sanger: uses enzymatic procedures to produce DNA chains that end at a specific base, i.e. chains of a specific length.

Derivation of nucleotide sequence from human chromosome

Sequence Comparison Methods
Types of sequence comparisons/alignments:
- Global (“How similar are these two sequences?”)
  - Finds the best overall alignment between two sequences.
  - 1970: Needleman and Wunsch (global, dynamic programming).
  - Shortcoming: poor at detecting small regions of similarity within two subsequences.
- Local (“What sequences in a database are most similar to this sequence?”)
  - Finds the best subsequence match between two sequences.
  - 1981: Smith and Waterman (local, dynamic programming).
  - Shortcoming: not computationally efficient; slow.

Results - Sequence

Heuristic Search (Quick, Approximate)
- Quickly search for “words” that match the sequence, then recursively perform a local search on each matched word until no other matches are found.
- FASTA (1988), BLAST (1990).
- Shortcomings: approximate, not exact; hits are judged by E-value (significant if < 0.05).
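The word-matching idea on this slide can be sketched roughly as follows. This is a simplified illustration, not the actual FASTA or BLAST procedure: the function names `find_word_hits` and `extend_hit`, the word length `k=3`, and the exact-match extension rule are all assumptions made for the example (the real tools score the extension and stop on a drop-off threshold).

```python
from collections import defaultdict


def find_word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    # Index every length-k word of the subject (database) sequence by position.
    index = defaultdict(list)
    for p in range(len(subject) - k + 1):
        index[subject[p:p + k]].append(p)
    # Look up each length-k word of the query in that index.
    hits = []
    for q in range(len(query) - k + 1):
        for p in index.get(query[q:q + k], []):
            hits.append((q, p))
    return hits


def extend_hit(query, subject, q, p, k=3):
    """Grow one word hit left and right while the characters keep matching."""
    lo_q, lo_p = q, p
    while lo_q > 0 and lo_p > 0 and query[lo_q - 1] == subject[lo_p - 1]:
        lo_q -= 1
        lo_p -= 1
    hi_q, hi_p = q + k, p + k
    while hi_q < len(query) and hi_p < len(subject) and query[hi_q] == subject[hi_p]:
        hi_q += 1
        hi_p += 1
    return query[lo_q:hi_q]


hits = find_word_hits("GATT", "AGATTA")
segment = extend_hit("GATT", "AGATTA", *hits[0])
```

Only the short word lookups touch the whole database sequence, which is where the speed comes from; the price is that similarities containing no exact word match are never seen.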

Results – Sequence (CSC Implementation)
- Sequence alignment can be represented with matrices and graphs (using rules and costs).
- When the problem is converted into a directed acyclic graph, the solution of the sequence alignment is the path with the maximum total value (max. path problem).

Sequencing (CSC Implementation)
Needleman-Wunsch Dynamic Program
- Diagonal edge = character matches; down edge = gap in string 2; across edge = gap in string 1.
- Can be solved dynamically as a ‘running max score’ (RMS).
- For each D(i,j), best RMS = max(west + gap1, north + gap2, NW + current_score).
- Replace D(i,j) with that max.
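A minimal sketch of the running-max rule above, assuming string 1 runs down the rows and string 2 across the columns (so a down edge puts a gap in string 2). The scores `match=3`, `mismatch=-3` and a single `gap=-2` used for both gap1 and gap2 are illustrative assumptions, as is the function name `needleman_wunsch`; the slides do not state the values the CSC implementation used.

```python
def needleman_wunsch(s1, s2, match=3, mismatch=-3, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for a global alignment."""
    n, m = len(s1), len(s2)
    # D[i][j] = best running score for aligning s1[:i] with s2[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j * gap          # across edges only: gaps in string 1
    for i in range(1, n + 1):
        D[i][0] = i * gap          # down edges only: gaps in string 2

    def pair(i, j):
        return match if s1[i - 1] == s2[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i][j - 1] + gap,             # west
                          D[i - 1][j] + gap,             # north
                          D[i - 1][j - 1] + pair(i, j))  # NW

    # Trace back from the bottom-right corner to recover one best path.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + pair(i, j):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return D[n][m], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("ATT", "AT")
```

With these assumed scores, aligning "ATT" against "AT" gives two matches and one gap. When several paths tie, the traceback returns only one of them.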

Parallel Solution
Work is allocated to the slaves in strips.

Parallel Solution (Cont’d)
[Figure: allocating strips in the submatrix, shown on a small score matrix with columns A, T, T and rows T, G, [Gap].]

Parallel Results
Each cell in each strip computes the maximum over its neighbors (running max).
[Figure: the example score matrix (columns A, T, T; rows T, G, [Gap]) with the resulting alignment path.]
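One way the strip allocation could be scheduled is sketched below. It is a sequential simulation under stated assumptions, not the presented implementation: columns are split into vertical strips, one per hypothetical worker, and strip k handles row i at time step i + k - 1, so every cell's west, north and NW neighbors are already final when it is computed. The name `nw_score_striped`, the `workers` parameter and the score values are illustrative.

```python
def nw_score_striped(s1, s2, workers=2, match=3, mismatch=-3, gap=-2):
    """Return (global alignment score, number of wavefront time steps)."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        D[i][0] = i * gap

    # Split columns 1..m into contiguous strips, one per worker.
    width = max(1, -(-m // workers))          # ceil(m / workers), at least 1
    strips = [(lo, min(lo + width, m + 1)) for lo in range(1, m + 1, width)]

    steps = 0
    for t in range(n + len(strips) - 1):
        # Every (strip k, row i) pair handled in this time step is independent
        # of the others, so real workers could run them concurrently.
        for k, (lo, hi) in enumerate(strips):
            i = t - k + 1
            if not 1 <= i <= n:
                continue
            for j in range(lo, hi):
                score = match if s1[i - 1] == s2[j - 1] else mismatch
                D[i][j] = max(D[i][j - 1] + gap,        # west
                              D[i - 1][j] + gap,        # north
                              D[i - 1][j - 1] + score)  # NW
        steps += 1
    return D[n][m], steps


result = nw_score_striped("ATT", "AT", workers=2)
```

In this schedule the only value a strip needs from its left neighbor is that neighbor's rightmost cell in the current and previous rows, which is what a slave would have to send along in a message-passing version.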

Improvements
- Parallel Smith-Waterman (localized: start, continue while the score is > 0, then end); cf. BLAZE (Stanford).
- Pipeline implementation on an actual mesh topology.
- Other possible data structures for traversing the data in search of the best-scoring path (e.g. specialized trees).
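The "continue while > 0" rule for Smith-Waterman can be sketched as a small change to the global recurrence: each cell is clamped at zero, and the best local score is the largest cell anywhere in the matrix. This is a sequential sketch only; the function name `smith_waterman_score` and the score values are assumptions, and it does not reflect how BLAZE is implemented.

```python
def smith_waterman_score(s1, s2, match=3, mismatch=-3, gap=-2):
    """Return (best local score, (i, j) cell where the best local alignment ends)."""
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,                        # restart: drop a negative prefix
                          H[i][j - 1] + gap,        # west
                          H[i - 1][j] + gap,        # north
                          H[i - 1][j - 1] + score)  # NW
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


local = smith_waterman_score("TTTGATTACATTT", "CCGATTACACC")
```

Because the cell dependencies are the same west, north and NW neighbors as in the global case, the same strip allocation should carry over, with the extra step of combining each worker's local maximum at the end.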

Improvements (Cont’d)
- Faster means of comparing and aligning multiple sequences simultaneously (e.g. comparing a novel protein sequence to a family).

Any Questions?