Presentation is loading. Please wait.

Presentation is loading. Please wait.

GigAssembler. Genome Assembly: A big picture

Similar presentations


Presentation on theme: "GigAssembler. Genome Assembly: A big picture"— Presentation transcript:

1 GigAssembler

2 Genome Assembly: A big picture http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429

3 GigAssembler – Preprocessing 1. Decontaminating & Repeat Masking. 2. Aligning of mRNAs, ESTs, BAC ends & paired reads against initial sequence contigs. psLayout → BLAT 3. Creating an input directory (folder) structure.

4 http://www.triazzle.comhttp://www.triazzle.com; The image from http://www.dangilbert.com/port_fun.htmlhttp://www.dangilbert.com/port_fun.html Reference: Jones NC, Pevzner PA, Introduction to Bioinformatics Algorithms, MIT press

5 RepBase + RepeatMasker

6 GigAssembler: Build merged sequence contigs (“rafts”)

7 Sequencing quality (Phred Score)

8 http://en.wikipedia.org/wiki/Phred_quality_score Base-calling Error Probability

9 GigAssembler: Build merged sequence contigs (“rafts”)

10

11 GigAssembler: Build sequenced clone contigs (“barges”)

12 GigAssembler: Build a “raft-ordering” graph

13 Add information from mRNAs, ESTs, paired plasmid reads, BAC end pairs: building a “bridge” Different weight to different data type: (mRNA ~ highest) Conflicts with the graph as constructed so far are rejected. Build a sequence path through each raft. Fill the gap with N. 100: between rafts 50,000: between bridged barges

14 Bellman-Ford algorithm http://compprog.wordpress.com/2007/11/29/one-source-shortest-path-the-bellman-ford-algorithm/

15 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 Take every edge and try to relax it (N – 1 times where N is the count of nodes)

16 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 Take every edge and try to relax it (N – 1 times where N is the count of nodes)

17 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 START Inf. Take every edge and try to relax it (N – 1 times where N is the count of nodes)

18 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START Inf. +7 ( → A) +6 ( → A) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

19 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +6 ( → A) +2 ( → B) +4 ( → D) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

20 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +2 ( → B) +4 ( → D) +2 ( → C) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

21 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

22 A B C DE +6 +8 +7 +9 +2 +5 -2 -3 -4 +7 0 START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Answer: A-D-C-B-E

23 Next-generation sequencing

24 Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina

25 Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina

26 Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Roche/454

27 Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 SOLiD

28 An example of single molecule DNA sequencing, from Helicos (approx. 1 billion reads / run) Pushkarev, D., N.F. Neff, and S.R. Quake. Nat Biotechnol (2009) 27, 847-50 Harris, T.D., et al. Science (2008) 320, 106-9

29 Mapping program Trapnell C, Salzberg SL, Nat. Biotech., 2009

30 Two strategies in mapping Trapnell C, Salzberg SL, Nat. Biotech., 2009


Download ppt "GigAssembler. Genome Assembly: A big picture"

Similar presentations


Ads by Google