Sequence Alignment Technology. Chengwei Lei, Fang Yuan, Saleh Tamim.

Goal. Save time: “PASS: a Program to Align Short Sequences”, Davide Campagna et al., Bioinformatics (2009). Save money: “Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies”, Iman Hajirasouliha et al., Bioinformatics (2008).

Keywords in both papers. Reference sequence: a long genomic sequence. Short reads: short input strings, e.g. ATGCGTAC.

Save time: the PASS program. PASS is a new algorithm for aligning short DNA sequences that allows gaps and mismatches. The program performs strikingly well in both sensitivity and speed. For instance, alignment is hundreds of times faster than BLAST and several times faster than SOAP, especially when gaps are allowed.

PASS: Program to Align Short Sequences. Performs gapped and ungapped alignment onto a reference sequence. Indexes seed words (11 and 12 bases). Uses short words (7 and 8 bases) for the PST (precomputed score table), which is calculated with the Needleman-Wunsch algorithm by a program supplied with PASS. Handles data generated by Solexa, SOLiD or 454 technologies.
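The two ingredients named on this slide, a seed-word index over the reference and Needleman-Wunsch scores for short words, can be illustrated with a minimal sketch. This is not the PASS implementation: the function names `build_seed_index`, `seed_hits` and `nw_score`, and the scoring values (match 1, mismatch -1, gap -1), are assumptions chosen for illustration only.

```python
from collections import defaultdict


def build_seed_index(reference, k=11):
    """Map every k-base word of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_hits(read, index, k=11):
    """Return sorted candidate start positions of the read on the reference,
    implied by exact matches of any k-base seed word taken from the read."""
    hits = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            hits.add(pos - offset)  # shift back to where the read would start
    return sorted(hits)


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.
    Scoring values are illustrative assumptions, not those used by PASS."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

In a scheme of this kind, `seed_hits` narrows the search to a few candidate positions, and scores such as those from `nw_score` could be tabulated once for all pairs of short words so that flanking regions are checked by lookup rather than by repeated alignment.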

Approach/Algorithm

Analysis and Results. Comparison of PASS with SOAP: PASS has better sensitivity with seed words of 11 bases and runs at least 10 times faster.

Save money: the optimal pooling method. The paper considers a set of experiments using the Solexa technology, based on bacterial artificial chromosome (BAC) clones, and addresses an experimental design problem. Basic idea: sequence more than one BAC per lane in order to maximize the throughput of the Solexa technology and hence minimize its cost.

Inputs: input strings (short reads) and reference sequences.

Optimal Pooling method. Normal pooling method. One other hurdle in designing a globally optimal experiment is the rapid proliferation of the number of possible configurations. For instance, to pool m = 150 BACs into 15 groups of size 10, we would need to consider every way of partitioning the BACs into such groups. It is infeasible to search all these configurations.
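To give a sense of scale, the number of configurations can be counted directly. The sketch below assumes that BACs are distinguishable and that the pools themselves are unordered, so the count is m! / ((s!)^g · g!) for g groups of size s; the slide's own formula is not available in the transcript, so this counting convention and the helper name `count_poolings` are assumptions.

```python
from math import factorial


def count_poolings(num_bacs, group_size):
    """Number of ways to split num_bacs distinguishable BACs into
    unordered groups of equal size (assumed counting convention)."""
    if group_size <= 0 or num_bacs % group_size != 0:
        raise ValueError("num_bacs must be a positive multiple of group_size")
    groups = num_bacs // group_size
    return factorial(num_bacs) // (factorial(group_size) ** groups * factorial(groups))


if __name__ == "__main__":
    total = count_poolings(150, 10)
    print(f"{len(str(total))} decimal digits")
```

Under this convention the count for 150 BACs in 15 groups of 10 has more than 150 decimal digits, which is why exhaustive search is ruled out.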

Optimal Pooling method (figure sequence): input strings (short reads), reference sequences, pool.

Problem: how to separate the groups of short reads?
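One simple way to picture the separation problem is to assign each read from a pooled lane to the BAC whose reference region contains it. The sketch below is an illustration under strong assumptions, not the method of the paper: it uses exact substring matching, and the names `assign_reads` and `bac_references` are hypothetical.

```python
def assign_reads(reads, bac_references):
    """Group pooled reads by the single BAC reference that contains them.

    bac_references maps a BAC name to its reference sequence (assumed known).
    Reads found in more than one BAC are reported as ambiguous, and reads
    found in none are reported as unmatched.
    """
    assigned = {name: [] for name in bac_references}
    ambiguous, unmatched = [], []
    for read in reads:
        owners = [name for name, seq in bac_references.items() if read in seq]
        if len(owners) == 1:
            assigned[owners[0]].append(read)
        elif owners:
            ambiguous.append(read)
        else:
            unmatched.append(read)
    return assigned, ambiguous, unmatched
```

Reads that land in the ambiguous list are the ones that cannot be separated, which suggests why the choice of BACs placed in the same pool matters.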

Optimal Pooling method (figure): input strings (short reads), reference sequence, pool.

Two cases

Result

Conclusion. A program for saving time: “PASS: a Program to Align Short Sequences”, Davide Campagna et al., Bioinformatics (2009). An algorithm for saving money: “Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies”, Iman Hajirasouliha et al., Bioinformatics (2008).

Q & A