High Performance Computing on an IBM Cell Processor --- Bioinformatics

Presentation transcript:

High Performance Computing on an IBM Cell Processor --- Bioinformatics
Team May08-24
Advisor: Dr. Zhao Zhang
Team members: Kyle Byerly, Shannon McCormick, Matt Rohlf, Bryan Venteicher
[shannon]

Introduction
Problem Statement
- Researchers need to tackle more complex and computationally demanding problems while maximizing performance within their budget.
Proposed Solution
- Use the PlayStation 3 (PS3) and Cell Broadband Engine (Cell/B.E.) to improve the performance of bioinformatics applications at a low cost.
[shannon]
Senior Design May08-24 4/30/08

Requirements
Functional requirements
- The ported application shall run on the Cell/B.E.
- The ported application shall return the same results as the original application.
- The ported application shall report its running time for comparison with the original application.
Non-functional requirements
- The ported application shall run faster on the PS3.
- The user interface shall not be altered.
[shannon]

Cell Processor
- Sony, Toshiba, and IBM
- Work began in 2000
- February 2005: first technical disclosures
- May 2005: first public demonstration
- “Super-computer on a chip”
- Multi-core processor
- Home entertainment to distributed computing
- Heterogeneous processor
  - Power Processor Element (PPE)
  - Synergistic Processing Element (SPE)
  - Element Interconnect Bus (EIB)
Source: www.power.org/resources/devcorner/cellcorner/CellTraining_Track, L1T1H1-02 Cell Overview
[shannon]

Ported Program Selection
Criteria:
1. Manageable size given our timeframe
2. Suitable documentation of the algorithm exists
3. Application is suitable for parallelization
4. Not previously ported to the Cell/B.E.
- ClustalW and DNAPenny both met the first three criteria
- ClustalW had already been ported
- DNAPenny was selected as the main focus
[matt]

ClustalW Prototype
- Created an unoptimized port of ClustalW
- Started with an already parallelized version: clustalw_smp
- Concentrated on producing a working, correct port; performance was not a goal
- Working version completed
- Useful in gauging the work involved in porting DNAPenny
[matt]

DNAPenny
- Takes a set of DNA sequences as input
- Returns a set of parsimonious trees
  - These represent the shortest evolutionary path between the individual DNA sequences
- Team had identified hotspots in the code
[matt]
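
The slides do not show how a candidate tree is scored. As a minimal sketch, assuming a fixed rooted binary tree and the textbook Fitch small-parsimony count (not necessarily the routine DNAPenny itself uses), the number of substitutions a tree requires could be computed as follows. The nested-tuple tree format, the function names, and the four example sequences are hypothetical.

```python
def fitch_site(node, column):
    """Return (possible states, substitutions) for one alignment column.

    node   -- a taxon name (leaf) or a 2-tuple of child nodes (hypothetical format)
    column -- dict mapping taxon name to the base observed at this site
    """
    if isinstance(node, str):
        return {column[node]}, 0
    left, left_cost = fitch_site(node[0], column)
    right, right_cost = fitch_site(node[1], column)
    common = left & right
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep the union and count one substitution.
    return left | right, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the Fitch substitution counts over every site of the alignment."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in sequences.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical aligned input, for illustration only.
EXAMPLE = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TTGA"}

if __name__ == "__main__":
    for tree in ((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))):
        print(tree, parsimony_score(tree, EXAMPLE))
```

A lower score means a more parsimonious tree; a search over tree shapes would keep the trees with the lowest score.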

General Parallel DNAPenny
- Profiling DNAPenny showed that a single function was responsible for 90% of the runtime
- Analyzed the function to determine its suitability for parallelization
- Data is divided among threads
[bryan]
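
The slide says only that the data is divided among threads. A minimal sketch of that pattern, assuming the hot function can process independent chunks whose partial results are then combined, might look like this. `hot_function` is a hypothetical stand-in for the profiled hotspot, and `split_evenly` and `run_parallel` are illustrative names. Note that CPython threads will not speed up CPU-bound work like this; the sketch shows the partitioning, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def hot_function(chunk):
    """Hypothetical stand-in for the profiled hotspot: sum of squares."""
    return sum(x * x for x in chunk)


def run_parallel(items, n_threads):
    """Give each thread one chunk, then combine the partial results."""
    chunks = split_evenly(items, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(hot_function, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_parallel(list(range(10)), 3))
```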

Cell/B.E. Port of DNAPenny
Done in several iterations:
1. Load and execute code on a single SPE
2. Load code once on the SPE and execute for the duration of the program
3. Load code once on multiple SPEs
4. Use compiler optimizations
5. Hand-vectorize the SPE code
[bryan]
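
The slides do not include the SPE code, so the following only illustrates why parsimony counting lends itself to vectorization. It assumes each site's possible bases are stored as a 4-bit set and that many sites are packed into one wide word, so that a single bitwise operation handles all of them at once, much as one SIMD instruction would on a 128-bit SPE register. The encoding and every name here are hypothetical, not the team's implementation.

```python
# One bit per base, so a set of bases fits in a 4-bit nibble.
STATE_BITS = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}


def pack(seq):
    """Pack a DNA string into one integer, 4 bits per site (site 0 in the low nibble)."""
    word = 0
    for i, base in enumerate(seq):
        word |= STATE_BITS[base] << (4 * i)
    return word


def fitch_step_packed(left, right, n_sites):
    """Combine two packed state-set words for all sites at once.

    Returns (parent_word, substitutions): per site, the parent keeps the
    intersection of the child sets, or their union (one substitution) if
    the intersection is empty.
    """
    low_bits = int("1" * n_sites, 16)  # 0x11...1: lowest bit of every nibble
    common = left & right
    # Fold each nibble onto its lowest bit: 1 where the intersection is non-empty.
    nonempty = (common | common >> 1 | common >> 2 | common >> 3) & low_bits
    empty = nonempty ^ low_bits        # 1 where the intersection is empty
    # Multiplying by 0xF widens each flag bit into a full-nibble mask.
    parent = common | ((left | right) & (empty * 0xF))
    return parent, bin(empty).count("1")


if __name__ == "__main__":
    parent, cost = fitch_step_packed(pack("ACGT"), pack("ACTA"), 4)
    print(hex(parent), cost)
```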

Test Hardware
- PlayStation 3: Cell/B.E., 256MB RAM
- Powerful server: quad-core Intel Xeon 3.0GHz, 6GB RAM
[kyle]

Testing Methodology
- Input files
  - Six were selected
- Two phases
  - Execute, verify, and benchmark
  - Aggregation and graphing of data
[kyle]
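
The first phase (execute, verify, and benchmark) could be sketched as below, assuming the original and ported programs are both available as callables that take the same input. The real tests ran compiled programs on input files; `verify_and_time` and its parameters are hypothetical names used only to show the idea.

```python
import time


def verify_and_time(candidate, reference, inputs):
    """Run candidate on every input, check it against reference, and time it.

    candidate, reference -- callables standing in for the ported and
                            original programs (hypothetical)
    inputs               -- dict mapping an input name to its data
    Returns a dict mapping each input name to elapsed seconds.
    """
    timings = {}
    for name, data in inputs.items():
        expected = reference(data)
        start = time.perf_counter()
        actual = candidate(data)
        elapsed = time.perf_counter() - start
        # The port must return the same results as the original.
        if actual != expected:
            raise AssertionError(f"{name}: ported output differs from original")
        timings[name] = elapsed
    return timings


if __name__ == "__main__":
    print(verify_and_time(sorted, lambda xs: sorted(xs), {"example": [3, 1, 2]}))
```

The timings returned here would feed the second phase, aggregation and graphing.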

Benchmark Results
Input file: infile.orig

| Code revision | 4-Way 3.0GHz Machine (seconds) | X Speedup | PlayStation 3 (seconds) | X Speedup |
| --- | --- | --- | --- | --- |
| dnapenny_orig | 823.568 | 1 | 7793.915 | 1 |
| dnapenny_slimmer | 360.131 | 2.28685673 | 941.981 | 8.273962 |
| parallel_dnapenny_1.0 | 221.432 | 3.71928177 | 780.867 | 9.9811043 |
| supplement_spe_parallel_1SPE | N/A | N/A | 1111.471 | 7.0122522 |
| supplement_spe_parallel_3SPE | N/A | N/A | 443.521 | 17.572821 |
| supplement_spe_parallel_6SPE | N/A | N/A | 277.233 | 28.11323 |
| supplement_parallel_vector_1SPE | N/A | N/A | 260.952 | 29.867236 |
| supplement_parallel_vector_3SPE | N/A | N/A | 153.656 | 50.723141 |
| supplement_parallel_vector_6SPE | N/A | N/A | 130.59 | 59.682326 |

[kyle]
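
The speedup columns are consistent with dividing each platform's dnapenny_orig time by the time of the revision in question. The sketch below reproduces that arithmetic from the times in the table; the dictionary and function names are illustrative.

```python
# Times in seconds for infile.orig, copied from the benchmark table.
XEON_SECONDS = {
    "dnapenny_orig": 823.568,
    "dnapenny_slimmer": 360.131,
    "parallel_dnapenny_1.0": 221.432,
}
PS3_SECONDS = {
    "dnapenny_orig": 7793.915,
    "dnapenny_slimmer": 941.981,
    "parallel_dnapenny_1.0": 780.867,
    "supplement_spe_parallel_1SPE": 1111.471,
    "supplement_spe_parallel_3SPE": 443.521,
    "supplement_spe_parallel_6SPE": 277.233,
    "supplement_parallel_vector_1SPE": 260.952,
    "supplement_parallel_vector_3SPE": 153.656,
    "supplement_parallel_vector_6SPE": 130.59,
}


def speedups(seconds_by_revision, baseline="dnapenny_orig"):
    """Speedup of every revision relative to the baseline on the same machine."""
    base = seconds_by_revision[baseline]
    return {rev: base / secs for rev, secs in seconds_by_revision.items()}


ps3 = speedups(PS3_SECONDS)
xeon = speedups(XEON_SECONDS)

# Fastest build on each platform, compared directly.
cross_platform = min(XEON_SECONDS.values()) / min(PS3_SECONDS.values())

if __name__ == "__main__":
    for rev, factor in ps3.items():
        print(f"{rev}: {factor:.2f}x")
    print(f"best PS3 vs best Xeon: {cross_platform:.2f}x")
```

On these figures the fastest PS3 build (130.59 s) runs about 1.7 times faster than the fastest build measured on the Xeon machine (221.432 s); the SPE builds have no Xeon timings to compare against.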

Benchmarking Results (cont.)
[kyle]

Earned Value Analysis

| Task | Estimated Hours | Actual Hours | % Complete | Budgeted Costs of Work Scheduled |
| --- | --- | --- | --- | --- |
| Problem Definition | 100 | 100.5 | 100% | $1,000.00 |
| Technology and Implementation Considerations | 36 | 37 | 100% | $360.00 |
| End-Product Design | 20 | 17.5 | 100% | $200.00 |
| End-Product Prototype Implementation | 320 | 272 | 100% | $3,200.00 |
| End-Product Testing | 60 | 78.5 | 100% | $600.00 |
| End-Product Documentation | 40 | 42 | 100% | $400.00 |
| End-Product Demonstration | 48 | 35 | 100% | $480.00 |
| Project Reporting | 140 | 99 | 100% | $1,400.00 |
| Total | 764 | 681.5 | 100% | $7,640.00 |

[matt]

Earned Value Analysis (cont.)

| Task | Budgeted Costs of Work Performed | Actual Costs of Work Performed | Cost Variance | Cost Performance Index |
| --- | --- | --- | --- | --- |
| Problem Definition | $1,000.00 | $1,005.00 | -$5.00 | 99.5% |
| Technology and Implementation Considerations | $360.00 | $370.00 | -$10.00 | 97.3% |
| End-Product Design | $200.00 | $175.00 | $25.00 | 114.3% |
| End-Product Prototype Implementation | $3,200.00 | $2,720.00 | $480.00 | 117.6% |
| End-Product Testing | $600.00 | $785.00 | -$185.00 | 76.4% |
| End-Product Documentation | $400.00 | $420.00 | -$20.00 | 95.2% |
| End-Product Demonstration | $480.00 | $350.00 | $130.00 | 137.1% |
| Project Reporting | $1,400.00 | $990.00 | $410.00 | 141.4% |
| Total | $7,640.00 | $6,815.00 | $825.00 | 112.1% |

[matt]
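
The figures in the two earned value tables follow from the hours. The sketch below assumes a flat rate of $10 per hour (implied by 100 estimated hours being budgeted at $1,000.00) and that every task is 100% complete, so the budgeted cost of work performed equals the budgeted cost of work scheduled. Cost variance is budgeted minus actual cost, and the cost performance index is budgeted divided by actual cost. The names are illustrative.

```python
RATE = 10.0  # dollars per hour; an assumption inferred from the tables

# task: (estimated hours, actual hours), copied from the first table
TASKS = {
    "Problem Definition": (100, 100.5),
    "Technology and Implementation Considerations": (36, 37),
    "End-Product Design": (20, 17.5),
    "End-Product Prototype Implementation": (320, 272),
    "End-Product Testing": (60, 78.5),
    "End-Product Documentation": (40, 42),
    "End-Product Demonstration": (48, 35),
    "Project Reporting": (140, 99),
}


def earned_value(estimated_hours, actual_hours, percent_complete=1.0, rate=RATE):
    """Return budgeted/actual cost of work performed, cost variance, and CPI."""
    bcwp = estimated_hours * rate * percent_complete
    acwp = actual_hours * rate
    return {"bcwp": bcwp, "acwp": acwp, "cv": bcwp - acwp, "cpi": bcwp / acwp}


rows = {task: earned_value(est, act) for task, (est, act) in TASKS.items()}
total = earned_value(
    sum(est for est, _ in TASKS.values()),
    sum(act for _, act in TASKS.values()),
)

if __name__ == "__main__":
    for task, row in rows.items():
        print(f"{task}: CV ${row['cv']:.2f}, CPI {row['cpi']:.1%}")
    print(f"Total: CV ${total['cv']:.2f}, CPI {total['cpi']:.1%}")
```

A cost performance index above 100% means the work cost less than budgeted; testing is the one task that ran well over its estimate.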

Lessons Learned
- The Cell/B.E. is a unique programming challenge
- Many tools are available to help understand poorly documented code
[bryan]

Conclusion
- Significant speedup achieved
- The team was surprised by the impact of hand vectorization
- The Cell/B.E. is well suited to this type of application
[shannon]

Questions
[everyone]