High performance bioinformatics


1 High performance bioinformatics
Group May 09-06: Bryan McCoy, Kinit Patel, Tyson Williams

2 Problem/Need Statement
Current ways to solve bioinformatics problems are either slow or very expensive. A computer system is needed that reduces cost while still delivering high performance on bioinformatics problems.

3 What is Bioinformatics?
Genetic sequencing. Massive amounts of data. Simple operations but many of them. Perfect for distributed computing.

4 Proposed Solution Use a cluster of PS3s with their embedded Cell processors.

5 Cell Broadband Engine Has 1 central PowerPC based PPE.
Has 8 surrounding SPEs. The 8 SPEs are connected via the element interconnect bus.

6 Cell Broadband Engine

7 Functional requirements
FR1. Ported applications shall run on the Cell B.E. FR2. The results returned shall be the same as the original program. FR3. The applications shall return their runtime. FR4. The applications shall execute in parallel on multiple Cell B.E.s.
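FR3 asks each application to report its own runtime. A minimal sketch of one way to do that in C follows, assuming a POSIX system (such as the Linux install named later in the deck) where `clock_gettime` is available. The names `now_seconds`, `time_call`, and `busy_loop` are hypothetical and do not come from the project's source code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <time.h>

/* Current reading of the monotonic clock, in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run work(arg) once and return the elapsed wall-clock seconds. */
double time_call(void (*work)(void *), void *arg)
{
    double start = now_seconds();
    work(arg);
    return now_seconds() - start;
}

/* Stand-in workload (hypothetical): sums the integers below *arg. */
void busy_loop(void *arg)
{
    unsigned long n = *(unsigned long *)arg;
    volatile unsigned long sum = 0;
    unsigned long i;
    for (i = 0; i < n; i++)
        sum += i;
}
```

A caller would wrap the real computation in a function with the same `void (*)(void *)` shape and print the value returned by `time_call`. Wall-clock time is used here rather than CPU time because the work is spread over several processors.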

8 Non-Functional Requirements
NF1. The Cells shall all run the Linux OS. NF2. The runtimes of the ported applications shall be shorter than those of the original applications. NF3. The ported applications shall be coded in the C language.

9 Operating Environment
Use Fedora 9 as the OS, as it is currently supported by the Cell SDK 3.1. Use the command line for the user interface. Use the IBM XLC compiler and/or the current GCC compiler.

10 Market Survey Results of the survey point to a huge speedup of computationally intensive programs. Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer. In 2007, Universitat Pompeu Fabra in Barcelona deployed a BOINC system called PS3GRID for collaborative biological computing.

11 Deliverables The Source Code. Compiled Executable.
Runtime Comparisons. Project Final Report. Project Poster. Project Final Presentation.

12 Work Breakdown Structure
Port Apps to Cluster PS3s
Problem Definition: Research Cell/B.E; Research Bioperf Suite; Research Distributed Parallel Algorithms; Research Previously Done Work
End Product Design: Design Requirements; Design Process; Design Documents
Considerations and Selections: Decide Which Linux to Install; Decide which applications to port
End Product Implementation: Hardware Implementation; Prototyping Implementation; Software Implementation
End Product Testing: Ensure Correctness of Output Results; Benchmarking
Final Documentation and Demonstration: Create Final Report; Create Project Poster; Prepare for Presentation

13 Costs
Time: Approximately 555 man hours total. Freely donated. Total cost $0.
Equipment: 3 PS3s and a crossbar router. Provided for us by the client. Total cost $0.

14 Resource Requirements
3 PlayStation 3s. High performance network switch. Books on distributed computing on Cell. Time.

15 Work Schedule Gantt chart

16 Risk Assessment Slow network speed. Software support. Limited RAM.
Hardware Failure. Lower quality entertainment hardware. Limited prior experience. Software development schedule.

17 Design Further divide the application into multiple threads for SPE execution on multiple PS3s, alter the functional logic, and vectorize the code where possible.
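The design above splits the work across SPE threads on several consoles. One simple way to hand out independent work items is a static block partition, sketched below in C. This is an illustration only: the deck does not say how the project divided its work, and `work_range` and `partition` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work items assigned to one worker. */
struct work_range {
    size_t begin;
    size_t end;
};

/*
 * Split n_items into n_workers contiguous ranges whose sizes differ by
 * at most one. The first (n_items % n_workers) workers get one extra item.
 */
struct work_range partition(size_t n_items, size_t n_workers, size_t worker)
{
    struct work_range r;
    size_t base, extra;

    assert(n_workers > 0 && worker < n_workers);

    base = n_items / n_workers;
    extra = n_items % n_workers;
    r.begin = worker * base + (worker < extra ? worker : extra);
    r.end = r.begin + base + (worker < extra ? 1 : 0);
    return r;
}
```

With 10 items and 3 workers this yields the ranges [0, 4), [4, 7), and [7, 10). A static split suits items of similar cost; a search whose subtrees are pruned unevenly may need dynamic load balancing instead.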

18 Software Decomposition Diagram

19 System Requirements SR1. The system shall allow the user to input multiple DNA sequences in FASTA format through a file interface. SR2. The system shall output all of the most parsimonious trees implied by the input data to the screen. SR3. The system shall share computational work among the PPE and SPEs available to each client/server process. SR4. The front-end shall share computational work with available back-end processes. SR5. The front-end shall be able to connect to at least 2 back-end processes via a high performance router.
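SR1 calls for DNA sequences to be read in FASTA format from a file. The sketch below shows a minimal FASTA reader in C, assuming short header lines and fixed-size buffers. It is not the input code of DNAPenny or of the project; `fasta_record`, `read_fasta`, and the buffer limits are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64   /* assumed limit on header length */
#define MAX_SEQ  1024 /* assumed limit on sequence length */

/* One FASTA record: header text without '>' and the joined sequence lines. */
struct fasta_record {
    char name[MAX_NAME];
    char seq[MAX_SEQ];
};

/* Strip trailing newline and carriage return characters in place. */
static void chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
}

/*
 * Read up to max_recs records from `in`.
 * Returns the number of records read, or -1 if sequence data appears
 * before any header or a sequence does not fit in MAX_SEQ - 1 characters.
 * Limitation: header lines are assumed to fit in the 256-byte line buffer.
 */
int read_fasta(FILE *in, struct fasta_record *recs, int max_recs)
{
    char line[256];
    int count = 0;

    assert(in != NULL && recs != NULL && max_recs > 0);

    while (fgets(line, sizeof line, in) != NULL) {
        chomp(line);
        if (line[0] == '\0')
            continue; /* skip blank lines */
        if (line[0] == '>') {
            if (count == max_recs)
                break; /* caller's array is full */
            strncpy(recs[count].name, line + 1, MAX_NAME - 1);
            recs[count].name[MAX_NAME - 1] = '\0';
            recs[count].seq[0] = '\0';
            count++;
        } else {
            struct fasta_record *cur;
            if (count == 0)
                return -1; /* sequence data before the first header */
            cur = &recs[count - 1];
            if (strlen(cur->seq) + strlen(line) >= MAX_SEQ)
                return -1; /* sequence too long for the buffer */
            strcat(cur->seq, line);
        }
    }
    return count;
}
```

A caller would open the input with `fopen`, pass the stream to `read_fasta`, and check for a negative return value before using the records.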

20 System Analysis The key is data flow, which is broken into 3 stages.
First, the DNA sequences are distributed to the PPEs and from there down to the SPEs. Next, each SPE searches the space of possible trees for the best parsimony score using branch and bound. Finally, the results are aggregated back to the main PPE and output.
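To make the scoring step concrete, the sketch below computes the parsimony score of one fixed tree with Fitch's rule and stops early once the score exceeds a bound, which is the pruning test a branch and bound search relies on. This is a textbook illustration under stated assumptions (rooted binary tree, unit cost per change), not code taken from DNAPenny or the project; `node`, `fitch_site`, and `parsimony_score` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Nucleotide state sets stored as bit masks. */
enum { BASE_A = 1, BASE_C = 2, BASE_G = 4, BASE_T = 8 };

/* Node of a rooted binary tree stored in an array. */
struct node {
    int left;        /* index of left child, or -1 for a leaf */
    int right;       /* index of right child, ignored for a leaf */
    const char *seq; /* leaf sequence; NULL for internal nodes */
};

static unsigned base_mask(char c)
{
    switch (c) {
    case 'A': case 'a': return BASE_A;
    case 'C': case 'c': return BASE_C;
    case 'G': case 'g': return BASE_G;
    case 'T': case 't': return BASE_T;
    default: return BASE_A | BASE_C | BASE_G | BASE_T; /* unknown: any base */
    }
}

/*
 * Fitch's rule for one site: a node's state set is the intersection of
 * its children's sets, or their union plus one change if they are disjoint.
 */
static unsigned fitch_site(const struct node *tree, int idx, size_t site,
                           int *cost)
{
    const struct node *n = &tree[idx];
    unsigned l, r;

    if (n->left < 0)
        return base_mask(n->seq[site]);
    l = fitch_site(tree, n->left, site, cost);
    r = fitch_site(tree, n->right, site, cost);
    if (l & r)
        return l & r;
    (*cost)++;
    return l | r;
}

/*
 * Parsimony score of the tree over nsites aligned sites.
 * Returns -1 as soon as the running score exceeds `bound`, so a search
 * can discard a tree that cannot beat the best score found so far.
 */
int parsimony_score(const struct node *tree, int root, size_t nsites,
                    int bound)
{
    int cost = 0;
    size_t site;

    assert(tree != NULL && root >= 0);

    for (site = 0; site < nsites; site++) {
        fitch_site(tree, root, site, &cost);
        if (cost > bound)
            return -1;
    }
    return cost;
}
```

For the tree ((AAG, AAG), (CAG, CAT)) the score is 2: one change at the first site and one at the third. Calling the function with a bound of 1 returns -1 for the same tree.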

21 Specifications Input Output
Input: DNA sequence files in FASTA format.
Output: Runtime of the application. The most parsimonious phylogenetic tree. The parsimony score of the phylogenetic tree.

22 Specifications User Interface No changes to the user interface.
Uses a command line interface.

23 Specifications Hardware 3 PlayStation 3s
High performance Cross-Bar network switch.

24 Specifications Software
Fedora 9 with Linux kernel for the PowerPC. IBM Cell SDK 3.1. IBM XLC 9.0 and GCC 4.3 compilers. DNAPenny 3.6. Bioperf Suite.

25 Specifications Testing
Compare benchmarked runtimes over several iterations and inputs to get averages. Compare these runtimes with the previous group's runtimes on a single Cell processor. Compare these runtimes with the previous group's runtimes on a high performance server (quad-core Intel Xeon 3.0GHz, 6GB RAM).
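The comparison described here comes down to two small calculations: averaging repeated runtimes, and dividing a baseline runtime by the ported runtime to get the speedup. A minimal C sketch follows; `mean_runtime` and `speedup` are hypothetical helper names, and no measured values from the project are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n measured runtimes, in seconds. */
double mean_runtime(const double *seconds, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(seconds != NULL && n > 0);

    for (i = 0; i < n; i++)
        sum += seconds[i];
    return sum / (double)n;
}

/*
 * Speedup of the ported program relative to a baseline:
 * values above 1.0 mean the ported program is faster.
 */
double speedup(double baseline_seconds, double ported_seconds)
{
    assert(ported_seconds > 0.0);
    return baseline_seconds / ported_seconds;
}
```

For example, runs of 2.0, 4.0, and 6.0 seconds average to 4.0 seconds, and a baseline of 8.0 seconds against a ported runtime of 4.0 seconds is a 2x speedup. These numbers are made up for illustration.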

26 Acknowledgements
May08-24 group: Kyle Byerly, Shannon McCormick, Matt Rohlf, Bryan Venteicher
Bioperf developers: David A. Bader, Georgia Tech; Yue Li, Univ. of Florida; Tao Li, Univ. of Florida; Vipin Sachdeva, IBM Austin

27 Questions?

28 Previous Results and Projected Results
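Columns: Code revision | 4-Way 3.0GHz Machine (seconds) | X Speedup | PlayStation 3 (seconds) | X Speedup
Code revisions (rows): dnapenny_orig; dnapenny_slimmer; parallel_dnapenny_1.0; supplement_spe_parallel_1SPE; supplement_spe_parallel_3SPE; supplement_spe_parallel_6SPE; supplement_parallel_vector_1SPE; supplement_parallel_vector_3SPE; supplement_parallel_vector_6SPE; Cluster with 3 PlayStations (Projected)
Values surviving in the transcript, in order: 1 (following dnapenny_orig); 130.59 (following the list of code revisions); ~54.8 and a truncated "~" (following Cluster with 3 PlayStations (Projected)).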

29 Summary Cost: $0. Equipment provided.
Time: Approximately 555 man hours. Freely donated. Results: 4x the performance of a similarly priced system.

