Published by Violet Ball. Modified over 8 years ago.
1
Performance Profiling of NGS Genome Assembly Algorithms
Alex Ropelewski
Pittsburgh Supercomputing Center
ropelews@psc.edu
412-268-4960
2
NGS: Assembly Algorithm
Genome: ATGGCGTGCAAT
Aligned 3-mers:
1. ATG
2. TGG
3. GGC
4. GCG
5. CGT
6. GTG
7. TGC
8. GCA
9. CAA
10. AAT
[Figure: de Bruijn graph with nodes AT, TG, GG, GC, CG, GT, CA, AA and the ten 3-mers as edges; the genome is assembled via an Eulerian cycle (reads represented as edges).]
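The slide's construction can be sketched in a few lines. This is a minimal illustration, not how Allpaths-LG or Velvet implement it: it assumes error-free reads given as k-mers, each k-mer occurring once, and a balanced graph (so an Eulerian cycle exists, as on the slide). Nodes are (k-1)-mers, each k-mer is an edge from its prefix to its suffix, and the walk is found with Hierholzer's algorithm. The function names are hypothetical.

```python
from collections import defaultdict


def kmers_of(sequence, k):
    """Return every overlapping k-mer of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(kmers):
    """Nodes are (k-1)-mers; each k-mer adds one edge prefix -> suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_walk(graph, start):
    """Hierholzer's algorithm: a node walk that uses every edge exactly once."""
    # Copy the adjacency lists so the caller's graph is not consumed.
    remaining = {node: list(succs) for node, succs in graph.items()}
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused edge out of the current node.
            stack.append(remaining[node].pop())
        else:
            # No unused edges left: this node is finished.
            walk.append(stack.pop())
    return walk[::-1]


def assemble(kmers):
    """Spell a sequence from an Eulerian walk starting at the first k-mer's prefix."""
    path = eulerian_walk(de_bruijn_graph(kmers), kmers[0][:-1])
    # First node contributes k-1 bases; every later node adds its last base.
    return path[0] + "".join(node[-1] for node in path[1:])


reads = kmers_of("ATGGCGTGCAAT", 3)
print(reads)
print(assemble(reads))
```

Because TG and GC each occur twice in the genome, this graph has more than one Eulerian cycle from AT: both ATGGCGTGCAAT and ATGCGTGGCAAT contain exactly the same ten 3-mers. Popping from the end of each adjacency list, this sketch returns ATGCGTGGCAAT, which illustrates why repeats make real assemblies ambiguous.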
3
Program characteristics
Two codes of interest:
– Allpaths-LG: designed for assembling large genomes (mostly C++; the pipeline uses make)
– Velvet: frequently used for small genomes (written in C; uses some OpenMP)
Both codes:
– are memory intensive
– are time intensive
– have some parallelization
4
Desired Profile Information
For each program/step in the assembly pipeline:
– Time and memory consumption
– Identification of serial and parallel steps
– Quantify I/O characteristics
– Quantify how many times each step is run
For the most time-consuming and most frequently called programs/steps:
– Time consumed by each function
– Number of times each function is called
– Quantify I/O characteristics
– Identify parallel steps and examine scaling
– Describe the main memory consumers
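The per-step measurements listed above can be collected with a small wrapper around each pipeline command. The sketch below is an assumption about how one might do this on a POSIX system, not the tooling used in this study: it runs each step as a child process, uses `os.wait4` to get that child's own resource usage (wall time, user + system CPU time, peak RSS, block I/O counts), and counts how often each named step runs. The ratio of CPU time to wall time is only a rough heuristic for spotting parallel steps. `StepProfile`, `run_step`, and `profile_pipeline` are hypothetical names.

```python
import os
import subprocess
import sys
import time
from collections import Counter
from dataclasses import dataclass


@dataclass
class StepProfile:
    name: str
    wall_s: float      # elapsed wall-clock time
    cpu_s: float       # user + system CPU time reported for the child
    max_rss_kb: int    # peak resident set size of the child
    blocks_in: int     # filesystem block input operations (ru_inblock)
    blocks_out: int    # filesystem block output operations (ru_oublock)
    exit_code: int

    @property
    def cpu_per_wall(self):
        # About 1.0 for a serial step; well above 1.0 hints at parallel work.
        return self.cpu_s / self.wall_s if self.wall_s > 0 else 0.0


def run_step(name, argv):
    """Run one pipeline command and record its time, memory, and block I/O (POSIX only)."""
    start = time.perf_counter()
    proc = subprocess.Popen(argv)
    # os.wait4 reaps this specific child and returns that child's rusage.
    _, status, usage = os.wait4(proc.pid, 0)
    wall = time.perf_counter() - start
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    rss_kb = usage.ru_maxrss // 1024 if sys.platform == "darwin" else usage.ru_maxrss
    return StepProfile(name, wall, usage.ru_utime + usage.ru_stime, rss_kb,
                       usage.ru_inblock, usage.ru_oublock, proc.returncode)


def profile_pipeline(steps):
    """Run (name, argv) pairs in order; also count how often each step runs."""
    calls = Counter()
    profiles = []
    for name, argv in steps:
        calls[name] += 1
        profiles.append(run_step(name, argv))
    return profiles, calls
```

In practice the `steps` list would hold the real pipeline commands (for Velvet, its `velveth` and `velvetg` stages); here it is demonstrated with stand-in Python one-liners:

```python
profiles, calls = profile_pipeline([
    ("alloc", [sys.executable, "-c", "x = b'a' * (50 * 1024 * 1024)"]),
    ("noop", [sys.executable, "-c", "pass"]),
])
for p in profiles:
    print(p.name, round(p.wall_s, 3), p.max_rss_kb, round(p.cpu_per_wall, 2))
```

Function-level questions (time per function, call counts) need a profiler that sees inside the C/C++ binaries, such as gprof or perf; this wrapper only covers the per-step view.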
5
General Outcome
Where should the optimization effort be focused?
– Are there serial optimizations?
– Are there additional candidates for parallelization?
– Can the existing parallelization be improved?
– Can the I/O be improved?
– Are there memory performance issues to address?
– Something else?
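One way to judge whether existing parallelization can be improved is to time a parallel step at several thread counts (for an OpenMP code, typically by setting the standard `OMP_NUM_THREADS` environment variable) and compute speedup, efficiency, and the Karp-Flatt experimentally determined serial fraction, which is Amdahl's law solved for the serial fraction. The sketch below assumes wall-clock timings including a 1-thread run; the numbers used are made up for illustration, and `scaling_table` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class ScalingPoint:
    threads: int
    speedup: float          # S(p) = T(1) / T(p)
    efficiency: float       # E(p) = S(p) / p
    serial_fraction: float  # Karp-Flatt estimate e(p)


def scaling_table(timings):
    """timings: {thread_count: wall-clock seconds}; must include a 1-thread run."""
    t1 = timings[1]
    points = []
    for p in sorted(timings):
        if p == 1:
            continue  # the baseline has no meaningful serial-fraction estimate
        s = t1 / timings[p]
        # Karp-Flatt: e = (1/S - 1/p) / (1 - 1/p)
        e = (1 / s - 1 / p) / (1 - 1 / p)
        points.append(ScalingPoint(p, s, s / p, e))
    return points


# Hypothetical timings in seconds, not measurements from Velvet or Allpaths-LG.
for pt in scaling_table({1: 100.0, 2: 60.0, 4: 40.0, 8: 32.0}):
    print(pt.threads, round(pt.speedup, 2), round(pt.efficiency, 2),
          round(pt.serial_fraction, 3))
```

A roughly constant serial fraction as threads increase suggests the serial portion of the step is what limits scaling (a candidate for serial optimization or further parallelization), while a serial fraction that grows with thread count points to parallel overhead such as synchronization or memory bandwidth contention.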