Download presentation
Presentation is loading. Please wait.
1
GigAssembler
2
Genome Assembly: A big picture http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429
3
GigAssembler – Preprocessing 1. Decontaminating & Repeat Masking. 2. Aligning of mRNAs, ESTs, BAC ends & paired reads against initial sequence contigs. psLayout → BLAT 3. Creating an input directory (folder) structure.
4
http://www.triazzle.comhttp://www.triazzle.com; The image from http://www.dangilbert.com/port_fun.htmlhttp://www.dangilbert.com/port_fun.html Reference: Jones NC, Pevzner PA, Introduction to Bioinformatics Algorithms, MIT press
5
RepBase + RepeatMasker
6
GigAssembler: Build merged sequence contigs (“rafts”)
7
Sequencing quality (Phred Score)
8
http://en.wikipedia.org/wiki/Phred_quality_score Base-calling Error Probability
9
GigAssembler: Build merged sequence contigs (“rafts”)
11
GigAssembler: Build sequenced clone contigs (“barges”)
12
GigAssembler: Build a “raft-ordering” graph
13
Add information from mRNAs, ESTs, paired plasmid reads, BAC end pairs: building a “bridge” Different weight to different data type: (mRNA ~ highest) Conflicts with the graph as constructed so far are rejected. Build a sequence path through each raft. Fill the gap with N. 100: between rafts 50,000: between bridged barges
14
Bellman-Ford algorithm http://compprog.wordpress.com/2007/11/29/one-source-shortest-path-the-bellman-ford-algorithm/
15
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 Take every edge and try to relax it (N – 1 times where N is the count of nodes)
16
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 Take every edge and try to relax it (N – 1 times where N is the count of nodes)
17
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 START Inf. Take every edge and try to relax it (N – 1 times where N is the count of nodes)
18
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START Inf. +7 ( → A) +6 ( → A) Take every edge and try to relax it (N – 1 times where N is the count of nodes)
19
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +6 ( → A) +2 ( → B) +4 ( → D) Take every edge and try to relax it (N – 1 times where N is the count of nodes)
20
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +2 ( → B) +4 ( → D) +2 ( → C) Take every edge and try to relax it (N – 1 times where N is the count of nodes)
21
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 Find the shortest path to all nodes. -4 +7 0 START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Take every edge and try to relax it (N – 1 times where N is the count of nodes)
22
A B C DE +6 +8 +7 +9 +2 +5 -2 -3 -4 +7 0 START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Answer: A-D-C-B-E
23
Next-generation sequencing
24
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina
25
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina
26
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Roche/454
27
Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 SOLiD
28
An example of single molecule DNA sequencing, from Helicos (approx. 1 billion reads / run) Pushkarev, D., N.F. Neff, and S.R. Quake. Nat Biotechnol (2009) 27, 847-50 Harris, T.D., et al. Science (2008) 320, 106-9
29
Mapping program Trapnell C, Salzberg SL, Nat. Biotech., 2009
30
Two strategies in mapping Trapnell C, Salzberg SL, Nat. Biotech., 2009
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.