BME 130 – Genomes Lecture 5 Genome assembly I The good old days.


1 BME 130 – Genomes Lecture 5 Genome assembly I The good old days

2 Administrivia
Homework 1: on the website today, due Friday; homework policy
Student-led paper discussion: choose groups and pick paper
Guest lecture Friday: Bob Kuhn will demo the UCSC genome browser

3 Genomics in the news
Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses
Citation: Gilbert C, Feschotte C (2010) Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. PLoS Biol 8(9): e doi: /journal.pbio

4 Figure 4.10 Genomes 3 (© Garland Science 2007)

5 Figure 4.10 part 1 of 2 Genomes 3 (© Garland Science 2007)

6 Figure 4.10 part 2 of 2 Genomes 3 (© Garland Science 2007)

7 Sequence assembly
de novo: overlap, layout, consensus
reference-guided: reads s1 s5 s3 s4 s2 s6 placed along a reference sequence

8 de novo sequence assembly: overlap
Most CPU- and memory-demanding stage
Reads: s1 s2 s3 s4 s5 s6
Phrap: “banded” alignment of reads around k-mer matches; tolerates alignment mismatches at low-quality bases
Phusion: groups reads sharing >= 11 k-mers of 17 bases
Celera: k-mer seed-and-extend alignment of reads
Arachne: 24-mer seed-and-extend alignment of reads
newbler: flowgram similarities (?)
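The k-mer idea behind the overlap stage can be sketched in a few lines. This is a minimal illustration of grouping reads by shared k-mers, loosely in the spirit of the Phusion rule above, not the actual implementation of any of the listed assemblers. The function names, the toy reads, and the choice to ignore reverse complements and sequencing errors are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_overlaps(reads, k=17, min_shared=11):
    """Map each read pair to its number of shared distinct k-mers.

    Only pairs sharing at least min_shared k-mers are kept.
    Simplification (assumption): forward strand only, exact k-mer matches.
    """
    # Index: k-mer -> names of the reads that contain it.
    index = defaultdict(set)
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    # Count shared k-mers only for pairs that co-occur in the index,
    # which avoids comparing every read against every other read.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy reads (hypothetical), with small k so the example fits on a slide.
toy_reads = {"s1": "ACGTACGGTCA", "s2": "ACGGTCATTG", "s3": "TTTTTTTTTT"}
pairs = candidate_overlaps(toy_reads, k=4, min_shared=3)
print(pairs)
```

Pairs that pass the threshold would then go on to a real alignment step; the k-mer index only narrows down which pairs are worth aligning.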

9 de novo sequence assembly
Generate alignments (reads s1 s2 s3 s4 s5 s6)
Find connected components
Wide range of strategies for the layout stage, many using mate-pair information
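Finding connected components can be sketched with a small union-find structure: reads are nodes, pairwise alignments are edges, and each component is a candidate group of reads to lay out together. The read pairs below are hypothetical and not taken from the slide's diagram; real assemblers add much more (orientation, repeats, mate pairs) on top of this step.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


reads = ["s1", "s2", "s3", "s4", "s5", "s6"]
# Hypothetical alignments between reads.
alignments = [("s1", "s5"), ("s2", "s3"), ("s4", "s6")]
components = connected_components(reads, alignments)
print(components)
```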

10 de novo sequence assembly: consensus
Reads: s1 s5 s2 s3 s4 s6
PHRAP: consensus base is the base with the highest quality score; the quality score for the position is based on all reads' quality scores
PCAP/CAP3: sum the quality scores for each base and take the base with the highest sum; quality score for the position: highest sum minus all other sums
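The two consensus rules can give different answers for the same alignment column, which a small sketch makes concrete. This is only an illustration of the rules as stated on the slide, not the code of PHRAP, PCAP, or CAP3. The function names and the example column are assumptions, and "highest sum minus all other sums" is read here as the winning sum minus the total of the other bases' sums.

```python
from collections import defaultdict


def consensus_highest_quality(column):
    """PHRAP-style choice (as stated on the slide): the base carrying the
    single highest quality score wins. The position's quality score is
    not modelled here."""
    base, _qual = max(column, key=lambda bq: bq[1])
    return base


def consensus_summed_quality(column):
    """PCAP/CAP3-style choice (as stated on the slide): sum qualities per
    base, take the base with the highest sum, and score the position as
    that sum minus the sums of all other bases (our interpretation)."""
    sums = defaultdict(int)
    for base, qual in column:
        sums[base] += qual
    best = max(sums, key=sums.get)
    score = sums[best] - sum(q for b, q in sums.items() if b != best)
    return best, score


# One alignment column: (base, quality) from three reads (hypothetical values).
column = [("A", 30), ("A", 25), ("G", 40)]
print(consensus_highest_quality(column))
print(consensus_summed_quality(column))
```

In this column the single best-scoring read says G, while two moderately confident reads say A, so the two rules disagree.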

11 Reference-guided sequence assembly
Advantages: (much) faster; (much) less memory
Disadvantages: indels/rearrangements; lack of closely related reference; bias towards reference similarity
Pop M et al., “Comparative Genome Assembly” Brief Bioinform Sep;5(3):
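One way to see where the speed and memory advantage comes from: each read is looked up against a single index of the reference instead of being compared with every other read. The sketch below places a read by letting its k-mers vote for a reference offset. It is an assumption-laden toy (exact k-mer seeds, forward strand only, no indel handling, made-up sequences and function names), not the method from the cited paper.

```python
from collections import Counter, defaultdict


def index_reference(ref, k):
    """Map every k-mer of the reference to its start positions."""
    idx = defaultdict(list)
    for i in range(len(ref) - k + 1):
        idx[ref[i:i + k]].append(i)
    return idx


def place_read(read, idx, k):
    """Return the reference offset supported by the most k-mer seeds,
    or None if no k-mer of the read occurs in the reference."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in idx.get(read[j:j + k], ()):
            votes[i - j] += 1  # seed at ref i, read j implies offset i - j
    if not votes:
        return None
    return votes.most_common(1)[0][0]


reference = "GATTACAGGCTTACCGTA"  # hypothetical reference
ref_index = index_reference(reference, 4)
print(place_read("CAGGCTTA", ref_index, 4))   # exact read
print(place_read("CAGGCTAA", ref_index, 4))   # read with one mismatch
print(place_read("TTTTTTTT", ref_index, 4))   # read absent from reference
```

The last call also hints at the listed disadvantages: a read with no counterpart in the reference simply cannot be placed.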

12 Why is this called a sequence gap and not a physical gap?
Figure 4.11a Genomes 3 (© Garland Science 2007)

13 Closing a physical gap means finding a physical clone to sequence that will span the gap

14 Genomic DNA is template for this PCR
Figure 4.11b Genomes 3 (© Garland Science 2007)

15 Chromosome walking (is slow)
Figure Genomes 3 (© Garland Science 2007)

16 PCR from clone library Insert 1 connects to who?
Figure Genomes 3 (© Garland Science 2007)

17 Figure 4.14 Genomes 3 (© Garland Science 2007)

18 Figure 4.15 Genomes 3 (© Garland Science 2007)

19 Figure 4.15a Genomes 3 (© Garland Science 2007)

20 Figure 4.15b Genomes 3 (© Garland Science 2007)

21 Figure 4.15c Genomes 3 (© Garland Science 2007)

22 Figure 4.15d Genomes 3 (© Garland Science 2007)

23 Assembly can be validated by mate-pair information
Figure Genomes 3 (© Garland Science 2007)
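A minimal sketch of what such a check might look like: both ends of a clone are placed in the assembly, and the distance between them is compared with the expected insert size. The data layout, the function name, and the numbers are hypothetical; orientation checks and statistics on the insert-size distribution are left out.

```python
def check_mate_pairs(placements, expected, tolerance):
    """Return the ids of mate pairs whose spanned distance in the assembly
    differs from the expected insert size by more than tolerance.

    placements: {pair_id: (left_position, right_position)} in assembly
    coordinates (hypothetical layout for this example).
    """
    suspicious = []
    for pair_id, (left, right) in placements.items():
        span = right - left
        if abs(span - expected) > tolerance:
            suspicious.append(pair_id)
    return sorted(suspicious)


# Hypothetical placements for a library with ~2 kb inserts.
placements = {"p1": (100, 2100), "p2": (500, 6500), "p3": (900, 2800)}
flagged = check_mate_pairs(placements, expected=2000, tolerance=300)
print(flagged)
```

A cluster of flagged pairs in one region would suggest a misassembly there, for example a collapsed repeat or a wrongly joined pair of contigs.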

24 Figure 4.16a Genomes 3 (© Garland Science 2007)

25 Figure 4.16b Genomes 3 (© Garland Science 2007)

26 Figure 4.17a Genomes 3 (© Garland Science 2007)

27 Figure 4.17b Genomes 3 (© Garland Science 2007)

28 Figure 4.18 Genomes 3 (© Garland Science 2007)

