Performance Profiling of NGS Genome Assembly Algorithms Alex Ropelewski Pittsburgh Supercomputing Center 412-268-4960.



NGS: Assembly Algorithm

Genome: ATGGCGTGCAAT

Aligned 3-mers:
1. ATG
2. TGG
3. GGC
4. GCG
5. CGT
6. GTG
7. TGC
8. GCA
9. CAA
10. AAT

[Figure: de Bruijn graph. Nodes: AT, TG, GG, GC, CG, GT, CA, AA. Edges: the ten 3-mers listed above. Caption: Assembled genome via Eulerian cycle (reads represented as edges).]
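The slide's idea can be sketched in a few lines of Python. This is a minimal illustration, not the approach used by Allpaths-LG or Velvet: it builds the de Bruijn graph from the ten 3-mers and walks an Eulerian cycle with Hierholzer's algorithm. The function names (`de_bruijn_edges`, `eulerian_cycle`, `spell`) are hypothetical, and the sketch assumes error-free 3-mers and that an Eulerian cycle exists from the start node.

```python
from collections import defaultdict


def de_bruijn_edges(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_cycle(graph, start):
    """Hierholzer's algorithm; assumes an Eulerian cycle through `start` exists."""
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused outgoing edge.
            stack.append(remaining[node].pop())
        else:
            # Dead end: this node is finished, add it to the cycle.
            cycle.append(stack.pop())
    return cycle[::-1]


def spell(cycle):
    """Spell the sequence: the first node, then one new base per edge."""
    return cycle[0] + "".join(node[-1] for node in cycle[1:])


kmers = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA", "CAA", "AAT"]
cycle = eulerian_cycle(de_bruijn_edges(kmers), "AT")
print(spell(cycle))
```

Nodes TG and GC each have two outgoing edges, so this graph admits more than one Eulerian cycle. The printed sequence uses every 3-mer exactly once but is not guaranteed to match the genome on the slide.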

Program characteristics

2 codes of interest:
– Allpaths-LG: designed for assembling large genomes (mostly C++; pipeline uses make)
– Velvet: used frequently for small genomes (written in C; uses some OpenMP)

Both codes:
– are memory intensive
– are time intensive
– have some parallelization

Desired Profile Information

For each program/step in the assembly pipeline:
– Time and Memory consumption
– Identification of serial and parallel steps
– Quantify I/O characteristics
– Quantify how many times each step is run

For the most time consuming and most called programs/steps:
– Time consumed by each function
– How many times is each function called
– Quantify I/O characteristics
– Identify parallel steps and examine scaling
– Describe the main memory consumers
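A per-step measurement of time and memory could look like the sketch below. It is an assumption about how such data might be gathered, not the tooling used in this work. The helper `profile_step` is hypothetical, the command is a placeholder rather than a real assembler invocation, and the code is Unix-only because it relies on `os.wait4` to get the child's resource usage (Python 3.9+ for `os.waitstatus_to_exitcode`).

```python
import os
import subprocess
import sys
import time


def profile_step(name, argv):
    """Run one pipeline step; report wall time, CPU time and peak resident memory."""
    start = time.perf_counter()
    proc = subprocess.Popen(argv)
    # os.wait4 returns the resource usage of this specific child.
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    # Record the exit code ourselves, since we reaped the child directly.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return {
        "step": name,
        "exit_code": proc.returncode,
        "wall_s": wall,
        "cpu_s": usage.ru_utime + usage.ru_stime,
        # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
        "max_rss": usage.ru_maxrss,
    }


if __name__ == "__main__":
    # Placeholder command; substitute the real pipeline step here.
    steps = [("placeholder", [sys.executable, "-c", "sum(range(10**6))"])]
    for name, argv in steps:
        print(profile_step(name, argv))
```

As a rough heuristic, a step whose CPU time is well above its wall time probably ran on several cores, while CPU time close to wall time suggests a serial step. Function-level counts and I/O behavior need a dedicated profiler and are outside this sketch.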

General Outcome

Where should the optimization effort be focused?
– Are there serial optimizations?
– Additional candidates for parallelization?
– Can the existing parallelization be improved?
– Can the I/O be improved?
– Memory performance issues to address?
– Something else?
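One way to weigh serial optimization against further parallelization is Amdahl's law, which is not named on the slides and is offered here only as a possible aid. The sketch below estimates the parallel fraction of a step from two wall-time measurements and the speedup limit that fraction implies. The function names and the timings in the usage lines are hypothetical, not measured results.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def estimate_parallel_fraction(t1, tn, workers):
    """Invert Amdahl's law: t1 is the 1-worker time, tn the time on `workers`."""
    speedup = t1 / tn
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


def speedup_limit(parallel_fraction):
    """Upper bound on speedup as the number of workers grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)


# Hypothetical timings (seconds), for illustration only.
p = estimate_parallel_fraction(t1=100.0, tn=75.0, workers=2)
print(p, amdahl_speedup(p, 8), speedup_limit(p))
```

A step with a low estimated parallel fraction is a candidate for serial optimization or new parallelization. A step with a high fraction that still scales poorly points to overheads in the existing parallel code, such as load imbalance or I/O contention.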