BioPerf: An Open Benchmark Suite for Evaluating Computer Architecture on Bioinformatics and Life Science Applications
David A. Bader


BioPerf: an open bioinformatics and life sciences workload, David A. Bader

Collaborators
- Vipin Sachdeva (U New Mexico, Georgia Tech, IBM Austin)
- Tao Li (U Florida)
- Yue Li (U Florida)
- Virat Agrawal (IIT Delhi)
- Gaurav Goel (IIT Delhi)
- Abhishek Narain Singh (IIT Delhi)
- Ram Rajamony (IBM Austin)

Acknowledgment of Support
- National Science Foundation
  - CAREER: High-Performance Algorithms for Scientific Applications ( ; )
  - ITR: Building the Tree of Life -- A National Resource for Phyloinformatics and Computational Phylogenetics (EF/BIO )
  - DEB: Ecosystem Studies: Self-Organization of Semi-Arid Landscapes: Test of Optimality Principles ( )
  - ITR/AP: Reconstructing Complex Evolutionary Histories ( )
  - DEB: Comparative Chloroplast Genomics: Integrating Computational Methods, Molecular Evolution, and Phylogeny ( )
  - ITR/AP(DEB): Computing Optimal Phylogenetic Trees under Genome Rearrangement Metrics ( )
  - DBI: Acquisition of a High Performance Shared-Memory Computer for Computational Science and Engineering ( )
- IBM PERCS / DARPA High Productivity Computing Systems (HPCS)
  - DARPA Contract NBCH

Contributions of this Work
- An open-source, freely available, freely redistributable suite of applications and inputs, BioPerf, which spans a wide variety of bioinformatics applications
- Performance study on PowerPC G5, IBM Mambo simulator, and Alpha

Outline
- Motivation
- Bioinformatics Workload
- BioPerf Suite
- Performance Analysis on PowerPC G5 and Mambo
- Conclusions and Future Work

Motivation
- Improve performance on a wide range of bioinformatics applications
  - Heterogeneous in problems, algorithms, applications
- BioPerf workload assembled as a representative set of bioinformatics applications that are important now and expected to increase in usage over the next 5-10 years
- Decide whether this is YAW (yet another workload) or unique in its characteristics

Related Work
- General benchmark suites: SPEC
- Domain-specific benchmarks
  - TPC, EEMBC, SPLASH, SPLASH-2
- Few special benchmarks for bioinformatics
- Previous attempts have been incomplete:
  - Analysis on old architectures (BioBench) [Albayraktaroglu et al., ISPASS 2005]
  - Included proprietary codes in benchmark suite (BioInfoMark) [Li et al., MASCOTS 2005]
  - Previous suites not available for download
  - Included several non-redistributable packages
  - Inputs not articulated and not included with benchmark suite for similar comparisons

Guiding Principles for BioPerf
- Coverage: The packages must span the heterogeneity of algorithms and biological and life science problems important today as well as (in our view) increasing in importance over the next 5-10 years.
- Popularity: Codes with larger numbers of users are preferred because these packages represent a greater percentage of the aggregate workloads used in this domain.
- Open Source: Open source code allows the scientific study of the application performance, the ability to place hooks into the code, and eases porting to new architectures.
- Licensing: Only packages whose licensing allows free redistribution as open source are included. This requirement eliminated several popular packages, but was kept as a strict requirement to encourage the broadest use of this suite.
- Portability: Preference was given to packages that used standard programming languages and could easily be ported to new systems (both in sequential and parallel languages).
- Performance: We gave slight preference to packages whose performance is well characterized in other studies. In addition, we strove for computationally demanding packages and included parallel versions where available.

BioPerf Suite
- Pre-compiled binaries (PowerPC, x86, Alpha)
- Scalable input datasets with each code for fair comparisons
- Scripts for installation, running and collecting outputs
- Documentation for compiling and using the suite
- Parallel codes where available
- Available for download from

BioPerf workload

Area                                   | Package  | Executables
Sequence homology, word-based          | BLAST    | blastp, blastn
Sequence homology, profile-based       | HMMER    | hmmpfam, hmmsearch
Sequence alignment, pairwise           | FASTA    | ssearch, fasta
Sequence alignment, multiple           | CLUSTALW | clustalw, clustalw_smp
Sequence alignment, multiple           | TCOFFEE  | tcoffee
Phylogeny, parsimony/likelihood        | PHYLIP   | dnapenny, promlk
Gene rearrangement                     | GRAPPA   | grappa
Protein structure prediction           | PREDATOR | predator
Gene finding                           | GLIMMER  | glimmer, glimmer-package
Molecular dynamics                     | CE       | ce

Sequence Alignment
- Sequence alignment is one of the most useful techniques in computational biology
  - Sequence alignment: stacking the sequences against each other, with gaps if necessary, to expose similarity
- Example alignment (successive builds of the slide highlight a match, a mismatch, and the gaps):

  Unaligned:
    S1: ACGCTGATATTA
    S2: AGTGTTATCCCTA
  Aligned:
    S1: ACGCTGATAT---TA
    S2: AG--TGTTATCCCTA
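The match, mismatch, and gap cases on this slide are what a dynamic programming alignment scores. The following is a minimal sketch of global (Needleman-Wunsch) alignment. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name are assumptions chosen for illustration, not parameters taken from the slides or from any BioPerf package.

```python
def global_alignment(s1, s2, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_s1, aligned_s2). Scoring values are illustrative.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of match/mismatch, gap in s2, gap in s1.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a1, a2 = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and s1[i - 1] == s2[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(a1)), "".join(reversed(a2))


if __name__ == "__main__":
    total, top, bottom = global_alignment("ACGCTGATATTA", "AGTGTTATCCCTA")
    print(total)
    print(top)
    print(bottom)
```

With other scoring parameters the optimal alignment can differ from the one drawn on the slide, so the printed alignment should be read as one optimum under the assumed scores.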

Multiple Sequence Alignment
- Bring the greatest number of similar characters into the same column
- Provides much more information than pairwise alignment
- [Figure: a small example of aligning several short sequences]
- Run-time of the dynamic programming solution is O(2^k n^k) for k sequences of length n
- Even 6 sequences already require on the order of 10^13 calculations
- Hence heuristics are employed
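The growth of the O(2^k n^k) bound can be checked with a few lines of arithmetic. In this sketch the sequence length n = 100 is an assumption, since the length on the slide is not legible in this transcript; with that value, 6 sequences give 6.4 x 10^13, which is consistent with the 10^13 figure quoted on the slide.

```python
def msa_dp_cost(k, n):
    """Operation count of the exact multiple-alignment bound, 2^k * n^k."""
    return (2 ** k) * (n ** k)


if __name__ == "__main__":
    n = 100  # assumed sequence length, not taken from the slide
    for k in range(2, 7):
        print(k, f"{msa_dp_cost(k, n):.1e}")
```

Each additional sequence multiplies the cost by 2n, which is why exact methods are impractical beyond a handful of sequences.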

Sequence Homology
- Find similar sequences (DNA/protein) to an unknown sequence (DNA/protein)
- Computationally expensive
  - Size of data is huge and grows exponentially every year
- Public databases available: GenBank, SwissProt, PDB

Database     | Contents           | Size
NCBI GenBank | DNA sequences      | 5 million sequences
SwissProt    | Protein sequences  | 160,000 sequences
PDB          | Protein structures | 32,000 structures

- Problems with computational approach
  - Exact alignment is an O(l^2) dynamic programming solution
  - Quicker but less accurate heuristics employed

BLAST
- Basic Local Alignment Search Tool
- Developed by NCBI
- Arguably the most important bioinformatics application, given its popularity
- Package: BLAST
- Executables: blastp, blastn
- Input: the Homo sapiens hereditary haemochromatosis protein
- Database: non-redundant protein sequence database (nr), developed by NCBI
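The workload table classifies BLAST as a word-based method. As a rough illustration of that idea only, the sketch below indexes every length-w word of a subject sequence and reports exact word hits for a query. The function names and the word length are hypothetical, and this is not BLAST's implementation: the real tool also uses scored neighborhood words, extends hits into local alignments, and evaluates their statistical significance.

```python
from collections import defaultdict


def build_word_index(subject, w):
    """Map each length-w word of the subject to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(subject) - w + 1):
        index[subject[pos:pos + w]].append(pos)
    return index


def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = build_word_index(subject, w)
    seeds = []
    for qpos in range(len(query) - w + 1):
        for spos in index.get(query[qpos:qpos + w], []):
            seeds.append((qpos, spos))
    return seeds


if __name__ == "__main__":
    print(find_seeds("GATTACA", "TTGATTA", w=4))
```

Only the seed positions would then be examined in detail, which is how a heuristic of this kind avoids the full quadratic dynamic programming table at the cost of possibly missing weak similarities.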

FASTA
- Also performs pairwise sequence alignment
- Package: FASTA
- Executables: fasta34, ssearch
- Input: the human LDL receptor precursor
- Database: nr

ClustalW
- Multiple sequence alignment (MSA) program
- Package: ClustalW
- Executables: clustalw, clustalw_smp
- Input: 317 Ureaplasma gene sequences from the NCBI Bacteria genomes database

T-Coffee
- A sequential MSA program similar to ClustalW, with higher accuracy and complexity
- Package: T-Coffee
- Executable: tcoffee
- Input: 50 sequences of average length 850 extracted from the Prefab database

HMMER
- Align multiple sequences by using hidden Markov models
- Package: HMMER
- Executables: hmmsearch, hmmpfam
- Inputs: brine shrimp globin; HMM of 50 aligned globin sequences

Phylogenetic Reconstruction
- Study the evolution of all sequences and all species
- Find the best among all possible trees
  - Given n taxa, the number of possible unrooted binary trees is (2n-5)!!; the number of rooted binary trees is (2n-3)!!
  - 10 taxa: about 2 million unrooted trees
- Approaches include maximum parsimony and maximum likelihood, among others
- The Tree of Life (10-100M organisms)
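The size of the tree search space can be computed directly from the double factorial. This sketch uses the standard counts for binary trees on n labeled taxa: (2n-5)!! unrooted and (2n-3)!! rooted. For 10 taxa the unrooted count is 2,027,025, which matches the "2 million trees" figure on the slide, while the rooted count is 34,459,425. The function names are illustrative.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 or 2; returns 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled taxa."""
    return double_factorial(2 * n - 5)


def rooted_trees(n):
    """Number of rooted binary trees on n >= 2 labeled taxa."""
    return double_factorial(2 * n - 3)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, unrooted_trees(n), rooted_trees(n))
```

The count grows faster than exponentially in n, so exhaustive evaluation of all trees is feasible only for very small numbers of taxa.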

Phylogeny Reconstruction: Phylip
- Collection of programs for inferring phylogenies
- Methods include
  - Maximum parsimony
  - Maximum likelihood
  - Distance-based methods
- Input: aligned dataset of 92 cyclophilin proteins of eukaryotes, each of length 220

Phylogeny Reconstruction: GRAPPA
- Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms
- Freely available, open source, GNU GPL
- Already used by other computational phylogeny groups: Caprara, Pevzner, LANL, FBI, Smithsonian Institute, Aventis, GlaxoSmithKline, PharmCos.
- Gene-order phylogeny reconstruction
  - Breakpoint median
  - Inversion median
- Over one-billion-fold speedup from previous codes
- Parallelism scales linearly with the number of processors [Bader, Moret, Warnow]
- Tobacco, Campanulaceae (Bob Jansen, UT-Austin; Linda Raubeson, Central Washington U)
- Input: 12 bluebell flower species of 105 genes
- [Figure: gene-order based phylogeny]
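Gene-order methods compare genomes as signed orderings of the same genes. As a hedged illustration of the breakpoint measure that underlies the breakpoint median, the sketch below counts adjacencies of one genome that are absent from the other. It assumes linear genomes written as signed permutations of 1..n and is not GRAPPA's code; the function name is hypothetical.

```python
def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that do not appear in g2.

    g1 and g2 are signed permutations of 1..n (a negative sign means the gene is
    reversed). Both are framed with 0 and n + 1 so the genome ends count as
    adjacencies. An adjacency (a, b) is the same as (-b, -a) read from the
    opposite strand.
    """
    n = len(g1)
    ext1 = [0] + list(g1) + [n + 1]
    ext2 = [0] + list(g2) + [n + 1]
    adjacencies = set()
    for a, b in zip(ext2, ext2[1:]):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))
    return sum(1 for a, b in zip(ext1, ext1[1:]) if (a, b) not in adjacencies)


if __name__ == "__main__":
    identity = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
    print(breakpoint_distance(identity, inverted))
```

A single inversion of an interior segment breaks exactly the two adjacencies at its ends, which is why this example yields a distance of 2.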

Protein Structure Prediction
- Find the sequences, three-dimensional structures and functions of all proteins and vice versa
  - Why computationally?
    - Experimental techniques are slow and expensive
  - Problems with computational approach
    - Little understanding of how structure develops
    - Does function really follow structure?

Protein Structure: Predator
- Tool for finding protein structures
- Relies on local alignments from BLAST, FASTA
- Input: 20 sequences from SwissProt, each of length about 7000 residues

CE (Combinatorial Extension)
- Find structural similarities between the primary structures of pairs of proteins
- Package: CE
- Executable: ce
- Input: two different types of hemoglobin, a protein used to transport oxygen

Gene-Finding: Glimmer
- Gene finding: find regions of a genome which code for proteins
- Widely used gene-finding tool for microbial DNA
- Input: bacterial genome consisting of 9.2 million base pairs
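To make "regions of a genome which code for proteins" concrete, the sketch below scans the three forward reading frames for open reading frames, that is, stretches that begin with the start codon ATG and end at the first in-frame stop codon. This simple scan is a swapped-in illustration and is not how Glimmer works; Glimmer scores candidate genes with statistical models of coding sequence. The function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF starts at an ATG and ends just after the first in-frame stop codon,
    so dna[start:end] includes both. min_codons is the minimum number of codons
    before the stop codon, counting the ATG.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATTTTAGGG"
    for begin, end in find_orfs(sequence):
        print(begin, end, sequence[begin:end])
```

A complete scan would also examine the reverse complement strand, which this sketch omits for brevity.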

Pre-compiled binaries
- PowerPC
- x86
- Alpha

BioPerf Performance Studies
- Analysis at the instruction and memory level on PowerPC
  - Livegraph data helps to visualize performance as it varies during phases of a run
  - Identify bottlenecks of current processors and provide input for better performance on future processors
- Ongoing work using the Mambo simulator (IBM PERCS)
- Pre-compiled Alpha binaries for the majority of benchmarks for simulation
  - To reduce simulation time, we collect the simulation points for those benchmarks using SimPoint

Conclusions
- Bioinformatics is a rapidly evolving field of increasing importance to computing
- BioPerf is a first step to characterize the bioinformatics workload: infrastructure to evaluate performance
- Performance data collected so far provides insight into the limitations of current architectures

Related Publications
- D.A. Bader, V. Sachdeva, A. Trehan, V. Agarwal, G. Gupta, and A.N. Singh, BioSPLASH: A sample workload from bioinformatics and computational biology for optimizing next-generation high-performance computer systems (Poster Session), 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI, June 25-29, 2005.
- D.A. Bader, V. Sachdeva, BioSPLASH: Incorporating life sciences applications in the architectural optimizations of next-generation petaflop-system (Poster Session), The 4th IEEE Computational Systems Bioinformatics Conference (CSB 2005), Stanford University, CA, August 8-11, 2005.
- D.A. Bader, Y. Li, T. Li, V. Sachdeva, BioPerf: A Benchmark Suite to Evaluate High-Performance Computer Architecture on Bioinformatics Applications, The IEEE International Symposium on Workload Characterization (IISWC 2005), Austin, TX, October 6-8, 2005.

Backup Slides

BioPerf on PowerPC
- PowerPC G5 dual-processor machine
  - Uniprocessor performance ( nvram boot-args=1 )
  - CPU frequency of 1.8 GHz
  - 1 GB of physical memory available
- Codes compiled using gcc-3.3 with no additional optimizations
- MONster tool from the C.H.U.D. package used for collecting hardware performance counters
  - Instruction and memory level analysis

Clustalw Algorithm Summary
- Step 1: Pairwise alignment of all sequences against one another
  - Dynamic programming step
- Step 2: Generate a guide tree for aligning sequences
  - Sequences with highest similarity get aligned first
- Step 3: Sequence-group and group-group alignments (progressive)
  - All possible pairwise alignments between a sequence and a group are tried; the highest scoring pair determines how the sequence is aligned to the group
  - All possible pairwise alignments of sequences between groups are tried; the highest scoring pair determines how the groups are aligned
  - Clustalw reuses the calculations from step 1 for this step
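The rule that the most similar sequences are aligned first can be sketched as scoring every pair and sorting the pairs by similarity. This is a simplification under stated assumptions: the similarity measure here is Python's `difflib.SequenceMatcher` ratio, used as a stand-in for the pairwise alignment scores that Clustalw computes by dynamic programming, and a sorted pair list replaces the guide tree that Clustalw actually builds. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not Clustalw's alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pair_order(seqs):
    """Return index pairs (i, j), most similar pair first.

    Ties are broken by index so the ordering is deterministic.
    """
    scored = [(similarity(seqs[i], seqs[j]), i, j)
              for i, j in combinations(range(len(seqs)), 2)]
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(i, j) for _, i, j in scored]


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
    print(pair_order(sequences))
```

The all-pairs scoring is quadratic in the number of sequences, which is consistent with the pairwise step dominating the run time reported for the larger input.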

Clustalw Livegraphs
- Input: 318 sequences, each of length almost 1050
- Pairwise alignment step (70.1% of total time): ppc instructions lag the total instructions
- Progressive alignment step (29.8% of total time): almost all instructions are ppc
- Guide tree formation: <0.1% of total time

Clustalw Livegraphs
- L1D hit rate almost 100%; instructions executed low
- L1D hit rate falls; instructions executed increase remarkably

Clustalw Livegraphs
- Branch mispredicts are high in dynamic programming; instruction count is low
- Branch mispredicts fall in progressive alignment; instruction count increases in progressive alignment
- Is performance directly related to branch mispredicts?

Clustalw livegraphs
- Almost all branch mispredicts are caused by condition register mispredicts

But what about loads per instruction?
- Loads per instruction are high in dynamic programming; instruction count is low
- Loads per instruction fall in progressive alignment; instruction count increases in progressive alignment
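The quantities discussed on these livegraph slides are ratios of raw hardware counter values. The sketch below shows how such derived metrics are typically formed from one sampling interval. The parameter names and the sample numbers are hypothetical and do not correspond to actual counter names on the G5 or to measured BioPerf data.

```python
def derived_metrics(cycles, instructions, loads, branches, branch_mispredicts):
    """Turn raw counter totals for one interval into per-instruction ratios."""
    return {
        "instructions_per_cycle": instructions / cycles,
        "loads_per_instruction": loads / instructions,
        "branch_mispredict_rate": branch_mispredicts / branches,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    sample = derived_metrics(cycles=1000, instructions=1500, loads=300,
                             branches=200, branch_mispredicts=10)
    for name, value in sample.items():
        print(f"{name}: {value:.3f}")
```

Computing these ratios per interval rather than over the whole run is what lets the phases of a program, such as pairwise versus progressive alignment, show up as distinct regions in a plot.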

Clustalw livegraphs - smaller inputs
- Smaller input: 44 sequences of length 583
- Same performance characteristics but with longer progressive alignment step

Clustalw livegraphs - smaller inputs
- Same performance characteristics but with longer progressive alignment step

Clustalw livegraphs - smaller inputs
- Almost all branch mispredicts are caused by condition register mispredicts

Clustalw livegraphs - smaller inputs
- Same performance characteristics but with longer progressive alignment step
- Can we use Mambo with smaller input sizes for more performance analysis?

Using Mambo with Clustalw and other applications
- Collect separate outputs for each phase of the run
- Inserted callthru exit into the source code to separate each part
- Dump the system statistics at the end of each phase
  - mysim stats dump
  - mysim caches stats dump
  - MamboClearSystemStats (clears the previous statistics)
- Multiple mysim go commands in the .tcl file

Clustalw on Mambo
- Pairwise alignment: high loads and arithmetic instructions
- Progressive alignment uses results from the first step: high branch and loads
- Mambo offers far more detailed instruction profiling than the G5?

Comparing large datasets with small datasets
- Is it feasible to use smaller input datasets for accurate simulation results?
- Branch mispredicts lower due to smaller dynamic programming step
- Branch mispredicts much higher
- High increase in L1D hit rate

Summary of BioPerf performance
- Highest instructions executed per cycle
- High loads per instruction
- Low branch mispredicts
- Low TLB misses
- High L1D hit rate
- Highest branch mispredicts and TLB misses
- High % of ld/st/io instructions
- Very low % of ld/st/io instructions

Summary of BioPerf performance
- High loads per instruction
- High branch mispredicts
- Mid-range instructions per cycle
- Low TLB misses
- Low % of ld/st/io instructions

Summary of BioPerf performance
- Lowest instruction rate
- Lowest loads per instruction
- Low branch mispredicts and TLB misses
- Lowest L1D and L2D hit rate
- Low % of ld/st/io instructions