A Proposed Solution to the Short Read Reassembly Problem
Carl Ebeling and Corey Olson



Outline
- Background
- Indexing Solution
- Architecture

Motivation
- Solexa/Illumina and SOLiD: ~billions of base pairs in hours
- 100s of millions of short reads (30-70 bp) read in parallel
- Computational cost rising
- Needed: hardware solution to improve speed and usability

Background
- Goal: quickly align millions of reads to the reference genome
- Read errors and SNPs prevent simple indexing
- Solutions:
  - Brute-force comparison of all reads to the reference
  - Index-based using seeds
  - Burrows-Wheeler Transform

Index Based Solution
- Reference Index Table (RIT): maps all seeds to positions in the reference
- Read Position Table (RPT): maps reads to regions in the reference for comparison
- Smith-Waterman comparison: stream the reference genome into SW units for scoring of reads

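RIT Creation
[Figure: the mask applied to the reference window CAT_GC_TGAT yields the seed CATGCTAT, which addresses a row of the RIT. RIT rows shown: CATGCTAT, CATGCTAA, CATGCTAC, CATGCTAG, CATGCCGG. Figure label: 65.]
Note: first column is number of entries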

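RPT Creation
[Figure: read 23 (ATACATTGCGTAATCG), mask CAT_GC_T_AT, seed CATGCTAT. The seed is looked up in the RIT (rows shown: CATGCTAT, CATGCTAA, CATGCTAC, CATGCTAG, CATGCCGG) and read 23 is entered in the RPT. RPT range labels as extracted: 0:31, :63, 64:95, 96:, 128:.]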

Read Scoring
[Figure: read #6 is taken from the RPT and scored in an SW unit; sequence shown: TAGTGTGATCGAA. RPT range labels as extracted: 0:31, :63, 64:95, 96:, :159.]

Buckets
- Buckets combine hits for a read along the reference
- Reduces the number of SW units required
- Optimal bucket length unknown

Entries Per Location in RIT
- N = number of base pairs in the reference genome
- k = characters in the seed (number of 1s in the mask)
- Note: each entry in the RIT is ~4 bytes; there are 2^(2k) total locations and N entries, so the average is N / 2^(2k) entries per location
- N=2^31, k=11: RIT = 2^31 * 2^2 bytes = 8 GB
- N=2^32, k=14: RIT = 2^32 * 2^2 bytes = 16 GB
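The sizing on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 4-byte entry size is taken from the slide, and the per-location average assumes seeds spread uniformly over the 4^k possible locations, which a real genome will not do exactly.

```python
def rit_size_bytes(n_bases: int, bytes_per_entry: int = 4) -> int:
    """Total RIT size, assuming one entry per reference position."""
    return n_bases * bytes_per_entry


def avg_entries_per_location(n_bases: int, k: int) -> float:
    """Average entries per RIT location, assuming a uniform spread of
    seeds over the 4**k = 2**(2k) possible locations (an idealization)."""
    return n_bases / 4 ** k


GIB = 2 ** 30
print(rit_size_bytes(2 ** 31) // GIB)          # 8
print(rit_size_bytes(2 ** 32) // GIB)          # 16
print(avg_entries_per_location(2 ** 31, 11))   # 512.0
print(avg_entries_per_location(2 ** 32, 14))   # 16.0
```

Under these assumptions the total table size depends only on N, while k controls how many entries share a location.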

Entries in RPT
- R = number of reads
- Seff = effective number of seeds per read
- Ex: R=2^27, Seff=2: 2^20 * 2048 * 4 = 8 GB

Entries per Bucket
- b = bucket size
- Note: this determines the number of SW units required

Architecture
- Memory required: 8 GB for RIT, 8 GB for RPT
- Creation of RIT and RPT is random access
- Access time can be masked with buffering and multiple memory banks
- High-bandwidth communication required between FPGAs

RIT Creation Algorithm
1. Move to the next reference character
2. Generate the next seed with the mask
3. Using the seed as address, open the DRAM row
   a) Read the current array length
   b) Increment the array length and write back
   c) Write the reference position to array[length]
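The steps above can be sketched in software. This is a minimal model, not the hardware design: a Python dict stands in for the DRAM rows, `list.append` stands in for the read-length / increment / write sequence of step 3, and the names `apply_mask` and `build_rit` and the example mask string are assumptions made for illustration.

```python
from collections import defaultdict


def apply_mask(window: str, mask: str) -> str:
    """Keep the characters of `window` at positions where `mask` is '1'."""
    return "".join(c for c, m in zip(window, mask) if m == "1")


def build_rit(reference: str, mask: str) -> dict[str, list[int]]:
    """Map each seed to the reference positions where it occurs."""
    rit = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):    # step 1: next character
        seed = apply_mask(reference[pos:pos + len(mask)], mask)  # step 2
        rit[seed].append(pos)                            # step 3 (a-c)
    return dict(rit)


# Hypothetical 11-wide mask with 8 ones, so seeds are 8 characters long.
MASK = "11101101011"
rit = build_rit("CATAGCATGATCATCGCGTTAT", MASK)
print(rit["CATGCTAT"])   # [0, 11]
```

Both windows in the example differ only at masked-out positions, so they fall into the same RIT location.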

Memory Distribution
- RIT: distributed by seed (AA.., AC.., AG.., AT.., CA.., CC.., CG.., CT.., ..., TA.., TC.., TG.., TT..)
- RPT buckets: partitioned across memory modules by reads (RPT part 0, part 1, ..., part n-1)

RPT Creation Algorithm
1. Clear the bucket set P in the FPGA assigned to the read
2. For each seed in the read
   a) Using the seed as address, read all reference positions from the RIT
   b) Add the current read to the bucket associated with each position
3. After all seeds in the read, for each bucket in P
   a) Using the reference position as address, read the current array length
   b) Increment the array length and write back
   c) Write the read ID to array[length]
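A software sketch of these steps follows. It is a simplified model under stated assumptions: the RIT is a plain dict, the bucket for a hit is taken to be the implied read start position divided by the bucket size (the slides do not spell out this mapping), and `seeds_of` and `build_rpt` are hypothetical names.

```python
from collections import defaultdict


def seeds_of(read: str, mask: str):
    """Yield (offset, seed) for every placement of the mask in the read."""
    ones = [i for i, m in enumerate(mask) if m == "1"]
    for off in range(len(read) - len(mask) + 1):
        yield off, "".join(read[off + i] for i in ones)


def build_rpt(reads, rit, mask, bucket_size):
    """Map each bucket index to the IDs of reads with a seed hit in it."""
    rpt = defaultdict(list)
    for read_id, read in enumerate(reads):
        buckets = set()                              # step 1: clear bucket set P
        for off, seed in seeds_of(read, mask):       # step 2: each seed
            for ref_pos in rit.get(seed, []):        # 2a: RIT lookup
                start = max(ref_pos - off, 0)        # assumed: align read start
                buckets.add(start // bucket_size)    # 2b: add to bucket
        for b in sorted(buckets):                    # step 3: write read ID once
            rpt[b].append(read_id)
    return dict(rpt)


# Tiny hand-made RIT and mask, for illustration only.
rit = {"ACT": [40], "GGA": [5, 70]}
rpt = build_rpt(["ACGTT", "TGGCA"], rit, mask="1101", bucket_size=32)
print(rpt == {0: [1], 1: [0], 2: [1]})   # True
```

Collecting buckets in a set first means a read with several seeds hitting the same bucket is written to that bucket once, which is the point of combining hits.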

Reassembly Process with Architecture
- Reference streamed from host source
- Reads loaded from RPT into SW units at start comparison point
- Max score and location for each read recorded by SW unit at end comparison point
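What a single SW unit computes can be sketched as a standard Smith-Waterman local alignment that consumes the reference one character at a time and tracks the best score and where it ends. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions for illustration; the slides do not state the scoring scheme, and a hardware unit would compute the column in parallel rather than in a loop.

```python
def smith_waterman(read: str, ref: str, match=2, mismatch=-1, gap=-1):
    """Return (best_score, ref_end): the best local alignment score and the
    1-based reference index at which that alignment ends."""
    prev = [0] * (len(read) + 1)      # scores for the previous reference char
    best, best_end = 0, 0
    for j, r in enumerate(ref, start=1):          # stream the reference
        cur = [0] * (len(read) + 1)
        for i, q in enumerate(read, start=1):
            diag = prev[i - 1] + (match if q == r else mismatch)
            cur[i] = max(0, diag, prev[i] + gap, cur[i - 1] + gap)
            if cur[i] > best:                     # record max score and location
                best, best_end = cur[i], j
        prev = cur
    return best, best_end


print(smith_waterman("GATC", "TTGATCAA"))   # (8, 6)
```

Only the previous column is kept, so memory grows with the read length rather than the reference length, which is what makes streaming the reference practical.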

Active SW Units at one time
- Lr = read length
- e = error window size

Performance Estimates
- Construction of RIT = 16 seconds (assuming 128 MHz and 1 reference character processed per clock)
- Construction of RPT = 10 minutes (assuming R=130M, Lr=64, N=2^31, k=14, 4 FPGAs)
- Reassembly phase = 16 seconds (assuming 128 MHz, N=2^31)
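The two 16-second figures follow from streaming N characters at one character per clock. A quick check, with the hypothetical helper `stream_seconds`, is below. The result is exactly 16 s if 128 MHz is read as 2^27 Hz and roughly 16.8 s if it is read as 128 x 10^6 Hz; which reading the authors intended is an assumption. The 10-minute RPT estimate depends on memory behavior not given in the slides and is not reproduced here.

```python
def stream_seconds(n_chars: int, clock_hz: float, chars_per_clock: int = 1) -> float:
    """Time to stream n_chars at a fixed number of characters per clock."""
    return n_chars / (clock_hz * chars_per_clock)


print(stream_seconds(2 ** 31, 2 ** 27))            # 16.0
print(round(stream_seconds(2 ** 31, 128e6), 1))    # 16.8
```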