Tuning Tophat2 Belinda Giardine



Tophat2
Aligns reads from RNA to the genome.
Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes.
Handles gaps in the alignments by breaking the reads into small pieces (~20 bases) and reassembling the reads after mapping.
Though the new version is more parallel, it is still slow (more than 4 days for recent runs).
It uses Bowtie to do the actual mapping.
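The read-splitting step above can be pictured with a minimal sketch. This is not Tophat2's own code: split_read is a hypothetical helper, and the handling of a short final piece is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from Tophat2): cut a read into consecutive
// pieces of seg_len bases. The last piece may be shorter than seg_len.
std::vector<std::string> split_read(const std::string& read, std::size_t seg_len) {
    assert(seg_len > 0);
    std::vector<std::string> segments;
    for (std::size_t pos = 0; pos < read.size(); pos += seg_len) {
        // substr clamps the count at the end of the string
        segments.push_back(read.substr(pos, seg_len));
    }
    return segments;
}
```

Calling split_read(read, 20) on a 75-base read would give three 20-base pieces and one 15-base piece; each piece can then be mapped on its own and the hits stitched back together.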

RNA-seq
(image from Wikipedia)
fastq file, a single read:
1:N:0:NAAGGC
GAATGCCCCCGGCCGTCCCTCTTAATCATGGCCTCAGTTCCGAAAACCANCAAAATAGAACCGCGGTCCTA TTNN
+
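For readers new to the format, a fastq record is four lines: a header starting with '@', the bases, a separator line starting with '+', and one quality character per base. A minimal reader might look like the sketch below; FastqRecord and read_fastq are names invented here, not part of Tophat2.

```cpp
#include <istream>
#include <sstream>  // only needed to feed the reader from an in-memory string
#include <string>

// Hypothetical record type for illustration.
struct FastqRecord {
    std::string id;    // header line without the leading '@'
    std::string seq;   // bases
    std::string qual;  // one quality character per base
};

// Reads one four-line record. Returns false at end of input or if the
// record does not have the expected shape.
bool read_fastq(std::istream& in, FastqRecord& rec) {
    std::string header, plus;
    if (!std::getline(in, header)) return false;
    if (!std::getline(in, rec.seq)) return false;
    if (!std::getline(in, plus)) return false;
    if (!std::getline(in, rec.qual)) return false;
    if (header.empty() || header[0] != '@') return false;
    if (plus.empty() || plus[0] != '+') return false;
    if (rec.seq.size() != rec.qual.size()) return false;
    rec.id = header.substr(1);
    return true;
}
```

The function takes any std::istream, so it can be tried on a std::istringstream holding a made-up record before pointing it at a real file through std::ifstream.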

Tophat2
Pipeline written in C++ (34,351 lines of code in 63 files)
Wrapper written in Python
3 of the programs use Boost threads (pthreads):
  long_spanning_reads.cpp
  segment_juncs.cpp
  tophat_reports.cpp
Programs are compiled as one unit under autoconf and automake; communication between programs is through temporary files.
Many prerequisites: zlib, Boost, samtools, Bowtie. This and the amount of file IO make running on the MIC alone not feasible.

Data files
Reads in fastq format, 20–200 million reads (2 x 20 GB for my test)
Reference sequence and indexes used for mapping: 6 GB for mouse
Final output: 14 GB for my test

Work from last time
Compiling: start with gcc, then icc, then add -mmic (this failed while trying to build all the prerequisites)
Test run on host, using Tophat's log of the run for timing.
Run on biostar (Xeon) using 8 threads took 26 hours
Run on stampede (host) using 16 threads took 19 hours, 40 minutes
Run on stampede (host) using 32 threads took 24 hours

New work
The Python wrapper and long run times make the code difficult to profile with gprof and vtune.
Going from my experience on biostar, I am starting with the segment_juncs executable.
Keeping the temporary files that are used for passing data between programs, I ran just segment_juncs.
Time for segment_juncs run alone:
  8 threads: 2 hours 13 minutes
  16 threads: 1 hour 15 minutes (2½ out of 19½ hours total); of this, 76% is spent in the parallel section
  32 threads: 2 hours 12 minutes
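The standalone segment_juncs timings can be turned into speedup and scaling efficiency with a few lines of arithmetic. The sketch below uses the 8-thread run as the baseline, since no single-thread time was measured; the function names are made up for this example.

```cpp
#include <cassert>

// Speedup of a run relative to a baseline run (same units for both times).
double speedup(double baseline_min, double run_min) {
    assert(run_min > 0);
    return baseline_min / run_min;
}

// Scaling efficiency: observed speedup divided by the factor by which
// the thread count grew. 1.0 would be perfect scaling.
double scaling_efficiency(double baseline_min, int baseline_threads,
                          double run_min, int run_threads) {
    assert(baseline_threads > 0 && run_threads > 0);
    return speedup(baseline_min, run_min) * baseline_threads / run_threads;
}
```

With 133, 75 and 132 minutes for 8, 16 and 32 threads, going from 8 to 16 threads gives a speedup of roughly 1.77 (efficiency about 0.89), while 32 threads is barely faster than 8 (speedup about 1.01, efficiency about 0.25).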

Failed attempts
Run vtune on segment_juncs: times out on the full data, license errors.
Check loops in par_report that have assumed dependencies: the lines of code indicated are not loops, or are they inside loops? Contradictory lines.
Offloading the threaded section of code in segment_juncs.cpp: will it actually improve speed, or is there too much file IO?
  Lots of variables to copy
  File IO

Hardison Lab

vec_report3
segment_juncs.cpp(135): (col. 32) remark: loop was not vectorized: existence of vector dependence.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed ANTI dependence between r line 135 and r line 135.
segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed FLOW dependence between r line 135 and r line 135.
Line 135: left_seg.left = max(0, T.right() - 2);
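One common reason for an "assumed" dependence is that the compiler cannot prove two memory references never overlap, so it keeps the loop scalar to be safe. The sketch below is not taken from segment_juncs.cpp and may not be the cause at line 135; it only shows a loop of similar shape where the `__restrict` extension (accepted by gcc, icc, clang and MSVC, but not standard C++) promises that input and output do not alias. clamp_shifted is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>

// Illustration only: out[i] = max(0, in[i] - 2) over two arrays.
// Without __restrict the compiler may have to assume that writing out[i]
// can change a later in[j]; with it, the caller guarantees no overlap.
void clamp_shifted(int* __restrict out, const int* __restrict in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::max(0, in[i] - 2);
    }
}
```

Whether a given compiler actually vectorizes this loop still has to be checked in its own report; the promise is also only valid if the two arrays really are distinct.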

opt_report
REMOVED VAR left_mismatches _V$78b
REMOVED PACK left_mismatches
REMOVED VAR right_mismatches _V$78d
REMOVED PACK right_mismatches

gprof output for segment_juncs
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
extend_from_seeds(std::vector >&, PackedSplice const&, std::vector >, std::allocator > > > const&, std::string const&, std::string const&, unsigned long, unsigned long, int)
pack_splice(std::string const&, int, int, unsigned int)
__do_global_dtors_aux
pack_right_splice_half(std::string const&, unsigned int, unsigned int)

Parallel section of code:

    vector<boost::thread*> threads;
    for (int i = 0; i < num_threads; ++i)
    {
      SegmentSearchWorker worker;
      worker.rt = &rt;
      worker.reads_fname = left_reads_fname;
      worker.segmap_fnames = &left_segmap_fnames;
      worker.partner_reads_map_fname = right_reads_map_fname;
      worker.seg_partner_reads_map_fname = right_seg_fname_for_segment_search;
      worker.juncs = &vseg_juncs[i];
      worker.deletions = &vdeletions[i];
      worker.insertions = &vinsertions[i];
      worker.fusions = &vfusions[i];
      worker.read = READ_LEFT;
      worker.partner_hit_offset = 0;
      worker.seg_partner_hit_offset = 0;

      if (i == 0)
      {
        worker.begin_id = 0;
        worker.seg_offsets = vector<int64_t>(left_segmap_fnames.size(), 0);
        worker.read_offset = 0;
      }
      else
      {
        worker.begin_id = read_ids[i-1];
        worker.seg_offsets.insert(worker.seg_offsets.end(),
                                  offsets[i-1].begin()+1,
                                  offsets[i-1].end());
        worker.read_offset = offsets[i-1][0];
        if (partner_offsets.size() > 0)
          worker.partner_hit_offset = partner_offsets[i-1];
        if (seg_partner_offsets.size() > 0)
          worker.seg_partner_hit_offset = seg_partner_offsets[i-1];
      }

      worker.end_id = (i+1 < num_threads) ? read_ids[i] : std::numeric_limits<uint64_t>::max();

      //Geo debug:
      //fprintf(stderr, "Worker %d: begin_id=%lu, end_id=%lu\n", i, worker.begin_id, worker.end_id);

      if (num_threads > 1 && i + 1 < num_threads)
        threads.push_back(new boost::thread(worker));
      else
        worker();
    }
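The pattern in this loop (give each worker its own id range and its own output slot, launch all but the last worker on new threads, run the last one on the calling thread) can be tried in isolation with a small stand-in. The sketch swaps boost::thread for std::thread so that it has no external dependency, and RangeWorker and parallel_sum are hypothetical names; the real SegmentSearchWorker reads files rather than summing integers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for SegmentSearchWorker: sums its own slice.
struct RangeWorker {
    const std::vector<int>* data;
    std::size_t begin;
    std::size_t end;
    long long* result;  // each worker writes only to its own slot
    void operator()() const {
        *result = std::accumulate(data->begin() + begin, data->begin() + end, 0LL);
    }
};

long long parallel_sum(const std::vector<int>& data, int num_threads) {
    assert(num_threads >= 1);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = data.size() / num_threads;
    for (int i = 0; i < num_threads; ++i) {
        RangeWorker w;
        w.data = &data;
        w.begin = i * chunk;
        // the last worker takes everything that is left
        w.end = (i + 1 < num_threads) ? (i + 1) * chunk : data.size();
        w.result = &partial[i];
        if (i + 1 < num_threads)
            threads.emplace_back(w);  // worker is copied into the new thread
        else
            w();                      // last range runs on the calling thread
    }
    for (std::thread& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Because every worker has a private result slot, no locking is needed until the partial results are combined after the joins. On some Linux toolchains the program has to be linked with -pthread.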