
1 Class 01 – Fragment assembly

2 DNA sequence data DNA sequence data is the motherlode of molecular biology. 10^10 base pairs. One human genome/year. It is our portal to protein sequences. It is fast, cheap and reliable. How do we get it?

3 Where the fragments come from Make many copies of a chromosome, using PCR (polymerase chain reaction). Break it up into short pieces. (We can sequence short pieces only.) Reassemble the short pieces.

4 Simplest version Like a jigsaw puzzle, except that we match overlaps rather than adjacencies. Assume that the shortest assembled string (the shortest superstring) is the correct solution. We know the orientation of each fragment, and the approximate length of the correct answer. (Real world considerations.)
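Matching overlaps can be made concrete with a small sketch. The helper names `overlap` and `merge` are illustrative assumptions, not part of the slides; the sketch assumes error-free fragments in a known orientation, as this slide does.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str) -> str:
    """Glue b onto the end of a, sharing their longest overlap."""
    return a + b[overlap(a, b):]


# Two fragments from the toy example on the next slide.
print(overlap("TTAC", "TACCGT"))
print(merge("TTAC", "TACCGT"))
```

Here the suffix TAC of the first fragment matches the prefix of the second, so the merged string is one base longer than the second fragment alone.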

5 Toy example ACCGT CGTGC TTAC TACCGT

6 Solution

    --ACCGT--
    ----CGTGC
    TTAC-----
    -TACCGT--
    _________
    TTACCGTGC
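The layout on this slide can be checked mechanically. The following is a minimal sketch, assuming the fragments and the superstring exactly as given in the toy example; the function name `layout` is a hypothetical helper introduced only for illustration.

```python
fragments = ["ACCGT", "CGTGC", "TTAC", "TACCGT"]
superstring = "TTACCGTGC"


def layout(fragments, superstring):
    """Return (offset, padded row) for each fragment placed in the superstring."""
    rows = []
    for f in fragments:
        offset = superstring.find(f)  # first position where f occurs, or -1
        if offset < 0:
            raise ValueError(f"{f} is not a substring of {superstring}")
        tail = len(superstring) - offset - len(f)
        rows.append((offset, "-" * offset + f + "-" * tail))
    return rows


for offset, row in layout(fragments, superstring):
    print(row)
print("_" * len(superstring))
print(superstring)
```

Every fragment is found, so the 9-base string really is a superstring of all four inputs.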

7 Real world complications This model is too optimistic to be realistic. Problems: Errors in reading fragments. Contamination (chimeras). Fragments could come from either strand. Repeats. Inverted repeats.
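One of these complications, not knowing which strand a fragment came from, is easy to illustrate: an assembler has to consider each fragment together with its reverse complement. This is a minimal sketch assuming an alphabet of only A, C, G and T; `reverse_complement` and `both_orientations` are illustrative names.

```python
# Translation table for Watson-Crick base pairing (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_orientations(fragment: str):
    """The two strings an assembler must try when the strand is unknown."""
    return fragment, reverse_complement(fragment)


print(both_orientations("ACCGT"))
```

Applying `reverse_complement` twice returns the original fragment, which is a quick sanity check on the table.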

8 The coverage problem Incomplete coverage (leaving ‘contigs’) We may have complete coverage, but not know it (for sure!)
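To illustrate what a contig is, here is a small sketch that groups fragment placements into contigs. It assumes, hypothetically, that each fragment's start position and length on the target are already known, which is exactly what a real assembler does not know in advance; the function name `contigs` is illustrative.

```python
def contigs(placements):
    """Merge (start, length) placements into half-open (start, end) contigs.

    Two fragments are joined only if they genuinely overlap; fragments
    that merely touch provide no overlap evidence and stay separate.
    """
    merged = []
    for start, length in sorted(placements):
        end = start + length
        if merged and start < merged[-1][1]:
            # Overlaps the current contig: extend it if necessary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three fragments; the third is separated from the others by a gap.
print(contigs([(10, 3), (0, 4), (2, 5)]))
```

More than one interval in the result means coverage is incomplete and the assembly is left as separate contigs.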

9 Shortest superstring problem (SSP) Input: A collection F of strings. Output: A shortest possible string S such that every f in F is a substring of S. Theorem: SSP is NP-hard (its decision version is NP-complete). Fact: approximation algorithms for SSP are of no known practical value.
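For very small inputs the problem can still be solved exactly by exhaustive search. The sketch below is not an algorithm from the slides: it simply tries every ordering of the fragments, so its running time grows factorially and it is usable only on toy inputs. The function names are illustrative assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(fragments):
    """Exact SSP by brute force over all fragment orderings."""
    unique = sorted(set(fragments))
    # A fragment contained in another one never changes the answer.
    kept = [f for f in unique if not any(f != g and f in g for g in unique)]
    if not kept:
        return ""
    best = None
    for order in permutations(kept):
        s = order[0]
        for f in order[1:]:
            s += f[overlap(s, f):]  # append f, sharing the maximal overlap
        if best is None or len(s) < len(best):
            best = s
    return best


print(shortest_superstring(["ACCGT", "CGTGC", "TTAC", "TACCGT"]))
```

On the toy example this recovers the 9-base solution shown earlier. With n fragments the loop visits up to n! orderings, which is consistent with the hardness result on this slide.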

10

11 Does motivation trump solution? Biologist: ‘Find an efficient algorithm which solves my problem.’ Computer scientist: ‘Give me a problem which I can solve efficiently.’ Culture clash: What happens when neither is possible?

