High Performance Computing on an IBM Cell Processor: Bioinformatics (presenter: Shannon)
Team May08-24
Advisor: Dr. Zhao Zhang
Members: Kyle Byerly, Shannon McCormick, Matt Rohlf, Bryan Venteicher
Introduction (presenter: Shannon)
Problem Statement
- Researchers need to tackle more complex and computationally demanding problems while maximizing performance within a limited budget.
Proposed Solution
- Use the PlayStation 3 (PS3) and its Cell Broadband Engine (Cell/B.E.) to improve the performance of bioinformatics applications at low cost.
Senior Design May08-24 4/30/08
Requirements (presenter: Shannon)
Functional requirements
- The ported application shall run on the Cell/B.E.
- The ported application shall return the same results as the original application.
- The ported application shall report its running time for comparison with the original application.
Non-functional requirements
- The ported application shall run faster on the PS3 than the original application.
- The user interface shall not be altered.
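The third functional requirement, reporting the running time, could be met with a wall-clock timer around the main computation. A minimal sketch, assuming a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`; `time_run` and `elapsed_seconds` are hypothetical helper names, not functions from the project:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Difference between two timestamps, in seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: run work() once and return its wall-clock running
 * time in seconds. CLOCK_MONOTONIC is not affected if the system clock
 * is adjusted while the program runs. */
static double time_run(void (*work)(void))
{
    struct timespec start, end;

    assert(work != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    work();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_seconds(start, end);
}
```

A port could print the value returned by `time_run` next to its normal output, passing in whatever function performs the tree search.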
Cell Processor (presenter: Shannon)
- Developed jointly by Sony, Toshiba, and IBM
  - Work began in 2000
  - February 2005: first technical disclosures
  - May 2005: first public demonstration
- “Super-computer on a chip”
- Multi-core processor, with uses ranging from home entertainment to distributed computing
- Heterogeneous processor:
  - Power Processor Element (PPE)
  - Synergistic Processing Element (SPE)
  - Element Interconnect Bus (EIB)
Source: www.power.org/resources/devcorner/cellcorner/CellTraining_Track, L1T1H1-02 Cell Overview
Ported Program Selection (presenter: Matt)
Criteria:
- Manageable size given our timeframe
- Suitable documentation of the algorithm exists
- Application is suitable for parallelization
- Not previously ported to the Cell/B.E.
ClustalW and DNAPenny both met the first three criteria, but ClustalW had already been ported, so DNAPenny was selected as the main focus.
ClustalW Prototype (presenter: Matt)
- Created an unoptimized port of ClustalW
- Started from an already parallelized version, clustalw_smp
- Concentrated on producing a working, correct port rather than on performance
- Completed a working version
- Useful for gauging the work involved in porting DNAPenny
DNAPenny (presenter: Matt)
- Takes a set of DNA sequences as input
- Returns a set of most parsimonious trees, which represent the shortest evolutionary paths between the individual DNA sequences
- The team identified hotspots in the code
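DNAPenny uses a branch-and-bound search over tree topologies; the score it minimizes is the number of base substitutions a tree requires. As an illustration of that criterion only, here is a sketch of Fitch's algorithm counting the minimum number of changes at one site on a fixed rooted binary tree (the count does not depend on where an unrooted tree is rooted). The array-based tree layout and the names `struct tree`, `fitch_site`, and `base_mask` are assumptions for this example, not DNAPenny's code:

```c
#include <assert.h>

/* Encode each base as one bit, so a set of candidate bases is a 4-bit
 * mask: A = 1, C = 2, G = 4, T = 8. */
static unsigned base_mask(char base)
{
    switch (base) {
    case 'A': return 1u;
    case 'C': return 2u;
    case 'G': return 4u;
    case 'T': return 8u;
    default:
        assert(!"unexpected base");
        return 0u;
    }
}

/* A rooted binary tree stored in arrays: internal node i has children
 * left[i] and right[i]; a leaf has left[i] == -1 and its base in base[i]. */
struct tree {
    const int *left;
    const int *right;
    const char *base;
};

/* Fitch's bottom-up pass: returns the candidate base set for `node` and
 * adds the number of substitutions required below it to *cost. */
static unsigned fitch_node(const struct tree *t, int node, int *cost)
{
    if (t->left[node] < 0)
        return base_mask(t->base[node]);

    unsigned a = fitch_node(t, t->left[node], cost);
    unsigned b = fitch_node(t, t->right[node], cost);
    if (a & b)
        return a & b;   /* children share a base: no change needed here */
    *cost += 1;         /* disjoint sets: one substitution is unavoidable */
    return a | b;
}

/* Minimum number of substitutions needed at one site. */
static int fitch_site(const struct tree *t, int root)
{
    int cost = 0;
    fitch_node(t, root, &cost);
    return cost;
}
```

For the tree ((A,C),(A,G)) the function returns 2: each pair of sibling leaves disagrees, while the root can take A from both children at no extra cost. A full parsimony score sums this count over all sites.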
General Parallel DNAPenny (presenter: Bryan)
- Profiling showed that a single function was responsible for 90% of DNAPenny's runtime
- Analyzed that function to determine its suitability for parallelization
- Data is divided among threads
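The slides do not name the profiled function, so the sketch below uses a placeholder, `score_item`, for its per-item work. It shows one common way to divide data among threads with POSIX threads: split the index range into equal contiguous blocks, let each thread write only its own partial result, and combine the partials after `pthread_join`. `NUM_THREADS`, `struct chunk`, and `parallel_score` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4

/* Placeholder for the per-item work of the profiled hotspot function. */
static long score_item(int i)
{
    return (long)i * i;
}

/* One thread's share of the data: items [begin, end). */
struct chunk {
    int begin;
    int end;
    long partial;   /* written only by the thread that owns this chunk */
};

static void *worker(void *arg)
{
    struct chunk *c = arg;

    c->partial = 0;
    for (int i = c->begin; i < c->end; i++)
        c->partial += score_item(i);
    return NULL;
}

/* Split items [0, n) into NUM_THREADS contiguous blocks, process the
 * blocks in parallel, then combine the partial results after joining. */
static long parallel_score(int n)
{
    pthread_t threads[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];

    for (int t = 0; t < NUM_THREADS; t++) {
        chunks[t].begin = (int)((long)n * t / NUM_THREADS);
        chunks[t].end = (int)((long)n * (t + 1) / NUM_THREADS);
        int rc = pthread_create(&threads[t], NULL, worker, &chunks[t]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += chunks[t].partial;
    }
    return total;
}
```

Because no two threads touch the same `chunk`, no locking is needed; the only synchronization is the join before the partial results are summed. Compile with `-pthread`.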
Cell/B.E. Port of DNAPenny (presenter: Bryan)
Done in several iterations:
1. Load and execute code on a single SPE
2. Load code once on the SPE and execute it for the duration of the program
3. Load code once on multiple SPEs
4. Use compiler optimizations
5. Hand-vectorize the SPE code
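The final iteration hand-vectorized the SPE code. Each SPE register is 128 bits wide, so one instruction can operate on four 32-bit values. The sketch below shows the general idea on a hypothetical kernel, a Fitch-style parsimony step (intersect two children's candidate base sets; where the intersection is empty, take the union and count one change), written with GCC/Clang vector extensions so that it compiles on an ordinary PC. Real SPE code would use the SPU intrinsics from the Cell SDK instead, and neither function is taken from the team's port. The vector version also replaces the per-site `if` with bit masks, which matters on the SPE because it relies on software branch hints rather than hardware branch prediction:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension),
 * the same width as an SPE register. */
typedef uint32_t v4u __attribute__((vector_size(16)));

/* Scalar version: one Fitch-style step per site. out[i] becomes the new
 * candidate set; the return value counts sites that needed a change. */
static int fitch_step_scalar(const uint32_t *a, const uint32_t *b,
                             uint32_t *out, int n)
{
    int changes = 0;

    for (int i = 0; i < n; i++) {
        uint32_t both = a[i] & b[i];
        if (both) {
            out[i] = both;
        } else {
            out[i] = a[i] | b[i];
            changes++;
        }
    }
    return changes;
}

/* Hand-vectorized version: four sites per iteration, with the branch
 * replaced by a select mask. n must be a multiple of 4. */
static int fitch_step_vec4(const uint32_t *a, const uint32_t *b,
                           uint32_t *out, int n)
{
    const v4u zero = {0, 0, 0, 0};
    int changes = 0;

    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        v4u va, vb;
        memcpy(&va, a + i, sizeof va);  /* memcpy avoids alignment issues */
        memcpy(&vb, b + i, sizeof vb);

        v4u both = va & vb;
        v4u empty = (v4u)(both == zero);  /* all ones in lanes with no overlap */
        v4u res = (both & ~empty) | ((va | vb) & empty);
        memcpy(out + i, &res, sizeof res);

        for (int k = 0; k < 4; k++)
            changes += (int)(empty[k] & 1u);
    }
    return changes;
}
```

Both functions produce identical results; keeping the scalar version around is a convenient way to check a hand-vectorized kernel against a known-good reference.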
Test Hardware (presenter: Kyle)
- PlayStation 3: Cell/B.E., 256 MB RAM
- Powerful server: quad-core Intel Xeon 3.0 GHz, 6 GB RAM
Testing Methodology (presenter: Kyle)
- Input files: six were selected
- Two phases:
  - Execute, verify, and benchmark
  - Aggregation and graphing of data
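The verify step checks the requirement that the port return the same results as the original program. Assuming both programs write their results to files, one simple check is a byte-for-byte comparison; `same_output` is a hypothetical name, and if the output also contains the running time, that line would have to be removed before comparing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if both files can be opened and their contents are
 * byte-for-byte identical (for example, the original program's output
 * and the port's output for the same input file). */
static bool same_output(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    bool same = (fa != NULL && fb != NULL);

    while (same) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);
        if (ca != cb)
            same = false;   /* a differing byte, or one file ended first */
        else if (ca == EOF)
            break;          /* both files ended at the same point */
    }

    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return same;
}
```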
Benchmark Results, input file infile.orig (presenter: Kyle)
Running times are in seconds; speedup is relative to dnapenny_orig on the same machine. N/A: not applicable (the SPE versions run only on the Cell/B.E.).

Code revision                    4-way 3.0 GHz machine (s)  Speedup     PlayStation 3 (s)  Speedup
dnapenny_orig                    823.568                    1           7793.915           1
dnapenny_slimmer                 360.131                    2.28685673  941.981            8.273962
parallel_dnapenny_1.0            221.432                    3.71928177  780.867            9.9811043
supplement_spe_parallel_1SPE     N/A                        N/A         1111.471           7.0122522
supplement_spe_parallel_3SPE     N/A                        N/A         443.521            17.572821
supplement_spe_parallel_6SPE     N/A                        N/A         277.233            28.11323
supplement_parallel_vector_1SPE  N/A                        N/A         260.952            29.867236
supplement_parallel_vector_3SPE  N/A                        N/A         153.656            50.723141
supplement_parallel_vector_6SPE  N/A                        N/A         130.59             59.682326
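Each speedup value is the original program's running time on that machine divided by the revision's running time. A small sketch that reproduces a speedup column from a list of measured times; the function name and array layout are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Speedup of each revision relative to times[0], the unmodified program
 * measured on the same machine: out[i] = times[0] / times[i]. */
static void speedups(const double *times, double *out, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(times[i] > 0.0);
        out[i] = times[0] / times[i];
    }
}
```

Fed the nine PlayStation 3 times in table order, it reproduces the PS3 speedup column, for example 7793.915 / 130.59 ≈ 59.68 for supplement_parallel_vector_6SPE.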
Benchmark Results (cont.) (presenter: Kyle)
Earned Value Analysis (presenter: Matt)

Task                                          Estimated Hours  Actual Hours  % Complete  Budgeted Cost of Work Scheduled
Problem Definition                            100              100.5         100%        $1,000.00
Technology and Implementation Considerations  36               37            100%        $360.00
End-Product Design                            20               17.5          100%        $200.00
End-Product Prototype Implementation          320              272           100%        $3,200.00
End-Product Testing                           60               78.5          100%        $600.00
End-Product Documentation                     40               42            100%        $400.00
End-Product Demonstration                     48               35            100%        $480.00
Project Reporting                             140              99            100%        $1,400.00
Total                                         764              681.5         100%        $7,640.00
Earned Value Analysis (cont.) (presenter: Matt)

Task                                          Budgeted Cost of Work Performed  Actual Cost of Work Performed  Cost Variance  Cost Performance Index
Problem Definition                            $1,000.00                        $1,005.00                      -$5.00         99.5%
Technology and Implementation Considerations  $360.00                          $370.00                        -$10.00        97.3%
End-Product Design                            $200.00                          $175.00                        $25.00         114.3%
End-Product Prototype Implementation          $3,200.00                        $2,720.00                      $480.00        117.6%
End-Product Testing                           $600.00                          $785.00                        -$185.00       76.4%
End-Product Documentation                     $400.00                          $420.00                        -$20.00        95.2%
End-Product Demonstration                     $480.00                          $350.00                        $130.00        137.1%
Project Reporting                             $1,400.00                        $990.00                        $410.00        141.4%
Total                                         $7,640.00                        $6,815.00                      $825.00        112.1%
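The dollar figures in both tables equal the hours multiplied by $10 (100 estimated hours give $1,000.00); that rate is inferred from the numbers rather than stated on the slides. Budgeted cost of work performed (BCWP) is the budgeted cost scaled by the fraction complete, cost variance is BCWP minus the actual cost of work performed (ACWP), and the cost performance index is BCWP divided by ACWP. A sketch of those formulas; `RATE_PER_HOUR`, `struct task_ev`, and `earned_value` are hypothetical names:

```c
#include <assert.h>

#define RATE_PER_HOUR 10.0   /* dollars per hour, inferred from the tables */

struct task_ev {
    double bcwp;   /* budgeted cost of work performed */
    double acwp;   /* actual cost of work performed */
    double cv;     /* cost variance: bcwp - acwp (positive = under budget) */
    double cpi;    /* cost performance index: bcwp / acwp (> 1 = under budget) */
};

static struct task_ev earned_value(double estimated_hours, double actual_hours,
                                   double fraction_complete)
{
    struct task_ev ev;

    assert(actual_hours > 0.0);
    ev.bcwp = estimated_hours * RATE_PER_HOUR * fraction_complete;
    ev.acwp = actual_hours * RATE_PER_HOUR;
    ev.cv = ev.bcwp - ev.acwp;
    ev.cpi = ev.bcwp / ev.acwp;
    return ev;
}
```

For example, End-Product Testing (60 estimated hours, 78.5 actual, fully complete) gives a cost variance of -$185.00 and a CPI of about 76.4%, matching the table.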
Lessons Learned (presenter: Bryan)
- The Cell/B.E. is a unique programming challenge
- Many tools are available to help understand poorly documented code
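Conclusion (presenter: Shannon)
- Significant speedup achieved: on the PS3, supplement_parallel_vector_6SPE ran about 59.7 times faster than the original DNAPenny (130.59 s versus 7793.915 s)
- Hand vectorization had a surprisingly large impact: with 6 SPEs, running time dropped from 277.233 s to 130.59 s
- The Cell/B.E. is well suited to this type of application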
Questions? (presenters: everyone)