Reconfigurable Computing (EN2911X, Fall07)

Lecture 18: Application-Driven Hardware Acceleration (4/4)
Prof. Sherief Reda
Division of Engineering, Brown University

Status
We have covered popular application-driven hardware acceleration using reconfigurable computing:
  FFT for signal and image processing, as an example of divide-and-conquer algorithms
  Speech recognition applications
  Viterbi algorithm for digital communication, as an example of dynamic programming algorithms
In this lecture we overview some of the algorithms for bioinformatics

Quick introduction to molecular biology & bioinformatics

DNA
Can be thought of as the “blueprint” for an organism
Composed of small molecules called nucleotides
  Four different nucleotides, distinguished by the four bases: adenine (A), cytosine (C), guanine (G) and thymine (T)
DNA is digital information
  A single strand of DNA can be thought of as a string composed of the four letters A, C, G, T, e.g. ACGTTCTA
DNA molecules usually consist of two strands arranged in a double-helix structure, where A bonds to T and C bonds to G
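
The base-pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name reverse_complement is hypothetical, and the reversal reflects the fact that the two strands of the helix run in opposite directions.

```python
# Pairing rule from the slide: A bonds to T, C bonds to G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own direction.

    Assumes an uppercase string over the alphabet A, C, G, T.
    """
    # Complement each base, then reverse because the strands are antiparallel.
    return strand.translate(COMPLEMENT)[::-1]

if __name__ == "__main__":
    print(reverse_complement("ACGTTCTA"))
```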

Genes
Genes are the basic units of heredity
A gene is a sequence of bases that carries the information required for constructing a particular protein; such a gene is said to encode a protein
The human genome comprises ~20K-25K genes
Those genes encode > 100,000 proteins

Proteins
[Figure labels: a folded protein structure; amino acids]
Proteins perform most life functions and even make up the majority of cellular structures.
Proteins are large, complex molecules made up of smaller subunits called amino acids.
Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
Proteins can be thought of as a string composed from a 20-character alphabet

Central dogma of molecular biology
RNA is like DNA, except that it is usually single-stranded and the base uracil (U) is used in place of thymine (T)
A strand of RNA can be thought of as a string composed of the four letters: A, C, G, U

Translation

Translation
There are six possible reading frames when translating a DNA sequence into proteins.
In many cases, FPGAs are used to translate a DNA sequence into the six frames in parallel and then concurrently apply any subsequent processing
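
A software sketch of six-frame translation, for illustration only. It assumes the standard genetic code, uppercase A/C/G/T input, and '*' as the stop-codon symbol; the names translate and six_frames are hypothetical. The six frames are the three codon offsets on the given strand plus the three offsets on its reverse complement. An FPGA would evaluate the six frames in parallel, whereas this sketch computes them one after another.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate complete codons from position 0; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> list:
    """Return the 3 forward-strand frames followed by the 3 reverse-strand frames."""
    reverse = dna.translate(COMPLEMENT)[::-1]
    forward_frames = [translate(dna[offset:]) for offset in range(3)]
    reverse_frames = [translate(reverse[offset:]) for offset in range(3)]
    return forward_frames + reverse_frames

if __name__ == "__main__":
    for frame in six_frames("ATGGCC"):
        print(frame)
```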

DNA string alignment
A sequence alignment is a way of arranging the primary sequences of DNA (or RNA or protein) to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as insertion or deletion mutations introduced in one or both lineages in the time since they diverged from one another.
At each position, one of three cases can occur:
  A match, when the same character is present in both strings
  A mismatch, or substitution, when the two characters differ
  A gap, when a character is inserted in only one string or, symmetrically, deleted in the other string
How can we find the best alignment between two DNA strings?

Finding the best global alignment
[Figures from slides from Bioinformatics Applications by D. Lavenier and M. Giraud]
Costs: +4 for a match, -2 for a mismatch, -3 for a gap
Needleman and Wunsch (NW) dynamic programming algorithm
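
A minimal Python sketch of Needleman-Wunsch global alignment using the costs listed above (+4 match, -2 mismatch, -3 per gap character). The function name, the matrix name H, and the tie-breaking order in the traceback are illustrative choices, not taken from the slides.

```python
MATCH, MISMATCH, GAP = 4, -2, -3  # costs from the slide

def needleman_wunsch(a: str, b: str):
    """Return (score, aligned_a, aligned_b) for the best global alignment."""
    m, n = len(a), len(b)
    # H[i][j] = best score for aligning a[:i] with b[:j]
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * GAP          # a[:i] aligned against gaps only
    for j in range(1, n + 1):
        H[0][j] = j * GAP          # b[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + GAP,     # gap in b
                          H[i][j - 1] + GAP)     # gap in a
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        s = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + GAP:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return H[m][n], "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments share the best score, this sketch returns only one of them; which one depends on the order of the checks in the traceback.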

Local alignment: finding the most similar subsequences
Costs: +4 for a match, -2 for a mismatch, -3 for a gap
Smith and Waterman (SW) algorithm
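
A minimal Python sketch of Smith-Waterman local alignment with the same costs. It differs from global alignment in three ways: every cell is clamped at zero, the traceback starts from the highest-scoring cell, and it stops when a zero cell is reached. The function and variable names are illustrative, not taken from the slides.

```python
MATCH, MISMATCH, GAP = 4, -2, -3  # costs from the slide

def smith_waterman(a: str, b: str):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[i][j] = max(0,                      # start a new local alignment
                          H[i - 1][j - 1] + s,    # match or mismatch
                          H[i - 1][j] + GAP,      # gap in b
                          H[i][j - 1] + GAP)      # gap in a
            if H[i][j] > best:                    # keep the first maximum found
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + GAP:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```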

Dynamic programming advantage on FPGAs
All cells on the same anti-diagonal can be computed simultaneously
What is the runtime on a general-purpose CPU?
What is the runtime on an FPGA?
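
The anti-diagonal property follows from the recurrence: cell (i, j) depends only on (i-1, j-1), (i-1, j) and (i, j-1), all of which lie on earlier anti-diagonals. The Python sketch below groups the cells of an m x n matrix into these wavefronts; the function name anti_diagonals is hypothetical. A sequential CPU loop visits all m*n cells one by one, whereas hardware that updates a whole anti-diagonal per step needs m+n-1 steps, assuming at least min(m, n) computational cells are available.

```python
def anti_diagonals(m: int, n: int):
    """Yield the cells (i, j), 1-based, of an m x n matrix grouped by i + j."""
    for d in range(2, m + n + 1):
        # Keep i within 1..m and j = d - i within 1..n.
        yield [(i, d - i) for i in range(max(1, d - n), min(m, d - 1) + 1)]

if __name__ == "__main__":
    m, n = 3, 4
    waves = list(anti_diagonals(m, n))
    for step, cells in enumerate(waves, start=1):
        print(step, cells)
    print("sequential cell updates:", sum(len(w) for w in waves))
    print("wavefront steps:", len(waves))
    print("cells needed for the widest wavefront:", max(len(w) for w in waves))
```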

Required number of computational cells

Examples of commercial products
Bioceleration Ltd.
Each BioXL/H board contains eight FPGA modules and 128MB of global memory.
Each of the modules is programmed to calculate four matrix cells per clock cycle (for the Smith-Waterman algorithm).
An eight-board BioXL/H executes these applications at a speed of 6 billion matrix cells per second.
The clock rate of the system is 25-33MHz (programmable).
Examples of applications supported:
  Smith-Waterman algorithm
  Translation of nucleic acid sequences into 6 reading frames and search of each frame against an amino acid database
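
As a rough consistency check on the quoted figures, the peak throughput can be computed from the board count, modules per board, cells per cycle and clock rate. This sketch assumes every module produces four cells on every clock cycle, which is an idealization; the function name peak_cells_per_second is hypothetical.

```python
BOARDS = 8              # eight-board system
MODULES_PER_BOARD = 8   # eight FPGA modules per board
CELLS_PER_CYCLE = 4     # matrix cells per module per clock cycle

def peak_cells_per_second(clock_hz: int) -> int:
    """Idealized peak: all modules busy on every cycle."""
    return BOARDS * MODULES_PER_BOARD * CELLS_PER_CYCLE * clock_hz

if __name__ == "__main__":
    for mhz in (25, 33):
        cells = peak_cells_per_second(mhz * 1_000_000)
        print(f"{mhz} MHz: {cells / 1e9:.2f} billion cells/s")
```

Under this assumption the low end of the clock range gives about 6.4 billion cells per second, which is in line with the quoted 6 billion.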

More examples: TimeLogic
“CodeQuest is a biocomputing workstation that processes large genomics searches and sophisticated informatics workflows. Using its FPGA-based DeCypher Engines, the quad-core CodeQuest workstation speeds Tera-BLAST, Smith-Waterman, Hidden Markov Model (HMM) and gene modeling searches at the speed of a mid-sized cluster.”
“It brings several fold the performance of a 64-CPU cluster, yet costs less than 10 CPUs”
