Published by Claud Cross. Modified over 9 years ago.
1
Welcome to Introduction to Bioinformatics. Wednesday, 10 February. Genome Sequencing/Assembly
3
What to do for summer vacation?
4
Deadline, SUNday Feb 28!
5
Target, Monday Mar 1!
6
Deadline, ???
7
Deadline, FRIday Feb 26!
8
Global Viral Genome Project Deadline, whenever!
9
Learn more about…
HHMI: http://www.vcu.edu/csbc/hhmi/
BBSI: http://www.vcu.edu/csbc/bbsi/
VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm
GVGP: http://biobike.csbc.vcu.edu (News)
10
What is the sequence (5' to 3') represented by the gel? Lanes: G A T C (Myers et al SQ2)
12
Dideoxy sequencing (= Sanger sequencing)
13
Dideoxy sequencing
23
What is the sequence (5' to 3') represented by the gel? Lanes: G A T C (Myers et al SQ2)
24
What is the sequence (5' to 3') represented by the gel? Lanes: G A T C; highlighted reaction: ddC. Sequence: TCGTGTACATCGTAACACGGTTAAGT (Myers et al SQ2)
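The relationship between the lanes and the sequence can be sketched in code. This is only an illustration, assuming that the 26-nt string TCGTGTACATCGTAACACGGTTAAGT from the slide is the newly synthesized strand written 5' to 3'; the function names `run_gel` and `read_gel` are made up for this example.

```python
# Sketch: how a four-lane dideoxy gel encodes a sequence.
# Assumption: `strand` is the newly synthesized strand, written 5' to 3'.

def run_gel(synthesized):
    """Return {lane: [fragment lengths]} for the four ddNTP reactions.

    A fragment of length n ends with the ddNTP matching base n of the
    synthesized strand, so it appears as a band in that base's lane.
    """
    lanes = {"G": [], "A": [], "T": [], "C": []}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest (runs farthest) to longest."""
    bands = sorted(
        (length, lane) for lane, lengths in lanes.items() for length in lengths
    )
    return "".join(lane for _, lane in bands)


strand = "TCGTGTACATCGTAACACGGTTAAGT"
gel = run_gel(strand)
print(gel["C"])       # fragment lengths produced by the ddC reaction
print(read_gel(gel))  # reading the bands back recovers the strand
```

Reading the bands in order of increasing fragment length gives the synthesized strand from its 5' end to its 3' end, which is why the gel is read starting with the fastest-migrating band.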
25
Sequencing process: Drosophila genome (~100 million nt). Sequence it. Technical limitation: reads limited to 100s of nt.
26
Sequencing process Drosophila genome (~100 million nt)... How many possible 500 nt fragments are there?
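One way to approach the question, as a rough sketch: count the start positions a 500 nt window can take. The code assumes the genome is a single linear molecule read on one strand, which is a simplification (the real genome is split across several chromosomes); `count_windows` is a name invented for this example.

```python
def count_windows(genome_length, fragment_length):
    """Number of distinct start positions for a fragment on a linear sequence."""
    if fragment_length > genome_length:
        return 0
    return genome_length - fragment_length + 1


GENOME_NT = 100_000_000   # ~100 million nt, from the slide
FRAGMENT_NT = 500

print(count_windows(GENOME_NT, FRAGMENT_NT))
```

Under these assumptions there are nearly as many possible fragments as there are nucleotides, since a fragment can start at almost any position.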
27
Sequencing process Drosophila genome (~100 million nt)... SAMPLE
28
Sequencing process: Drosophila genome (~100 million nt)... SAMPLE. How many 500 nt samples are needed to cover 100 million nt? 100 000 000 / 500
29
Sequencing process: Drosophila genome (~100 million nt)... SAMPLE. How many 500 nt samples are needed to cover 100 million nt? 1 000 000 / 5 = 200 000. Is this enough? Oversampling … coverage?
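The arithmetic on this slide can be written out as a small sketch. Coverage is taken here to mean total nucleotides sequenced divided by genome length, which is the usual definition but is an assumption about how the slide uses the word; the function names are hypothetical.

```python
import math


def reads_for_coverage(genome_nt, read_nt, coverage=1.0):
    """Reads needed so that total sequenced nt = coverage * genome length."""
    return math.ceil(coverage * genome_nt / read_nt)


def coverage_of(n_reads, read_nt, genome_nt):
    """Average number of times each nucleotide is sequenced."""
    return n_reads * read_nt / genome_nt


GENOME_NT = 100_000_000
READ_NT = 500

print(reads_for_coverage(GENOME_NT, READ_NT))        # 1x coverage
print(reads_for_coverage(GENOME_NT, READ_NT, 10.0))  # 10x oversampling
print(coverage_of(200_000, READ_NT, GENOME_NT))
```

The 1x figure would only cover the genome completely if the reads were laid end to end with no gaps or overlaps, which random sampling does not guarantee.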
30
Paint the wall: How long will this take? Study Questions 8 & 9: "oversampling"? "coverage"? Shotgun sequencing?
31
Paint the wall: How long will this take?
32
Paint the wall: How long will this take? Wall: 40" x 25". Each paint ball covers 1 sq in.
33
Paint the wall: How long will this take? Wall: 40" x 25". 1000 paint balls?
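A quick simulation gives a feel for the answer. It is a sketch under simplifying assumptions that the slide does not state: the wall is treated as a grid of 40 x 25 = 1000 one-inch squares, and each paint ball lands on exactly one square chosen uniformly at random. The function name and the `seed` parameter are invented for the example.

```python
import random


def unpainted_fraction(width=40, height=25, balls=1000, seed=0):
    """Throw `balls` paint balls at random grid squares; return the bare fraction."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    cells = width * height
    painted = set()
    for _ in range(balls):
        painted.add(rng.randrange(cells))   # each ball paints one square
    return 1 - len(painted) / cells


print(unpainted_fraction())            # 1000 balls on 1000 squares (1x)
print(unpainted_fraction(balls=5000))  # 5x oversampling
```

Because many balls land on squares that are already painted, 1000 balls leave a substantial part of the wall bare, and it takes several times as many to come close to full coverage.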
34
Oversampling and completeness: How much is painted with 1x oversampling? What fraction won't be painted?
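The unpainted fraction can also be worked out directly. The derivation below assumes n equal squares and paint balls that each hit one square, chosen uniformly and independently; n and c (the oversampling factor) are symbols introduced here, not on the slide.

```latex
\[
P(\text{a given square is unpainted after } n \text{ balls})
  = \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow{\,n \to \infty\,}\; e^{-1} \approx 0.368
\]
\[
n = 1000:\quad \left(1 - \tfrac{1}{1000}\right)^{1000} \approx 0.3677
\]
\[
\text{with } c\text{-fold oversampling } (cn \text{ balls}):\quad
  \left(1 - \frac{1}{n}\right)^{cn} \approx e^{-c}
\]
```

So under these assumptions about 63% of the wall is painted at 1x, and the unpainted fraction falls off exponentially as the oversampling factor grows.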
35
Probability that two coins both come up tails: gets T from the first AND gets T from the second. Intersection of possibilities (rule of multiplication; the events are independent). P(TT) = 1/2 x 1/2 = 1/4
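The rule of multiplication can be checked by listing every outcome of two fair coins. A minimal sketch, with variable names chosen for this example:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count the outcomes where both coins are tails.
both_tails = sum(1 for first, second in outcomes if first == "T" and second == "T")
p_tt_enumerated = Fraction(both_tails, len(outcomes))

# Rule of multiplication for independent events.
p_tt_multiplied = Fraction(1, 2) * Fraction(1, 2)

print(p_tt_enumerated, p_tt_multiplied)
```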
36
Probability that either of two coins comes up tails. 1/2 x 1/2 = 1/4? 1/2 + 1/2 = 1? Gets HT or TH or TT: union of possibilities (rule of addition). P(at least 1 T) = 1/4 + 1/4 + 1/4
37
Probability that either of two coins comes up tails: gets HT or TH or TT. Union of possibilities (rule of addition; the outcomes are mutually exclusive). P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4
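The sketch below adds up the mutually exclusive outcomes, and also shows why simply adding 1/2 + 1/2 goes wrong: "tails on the first coin" and "tails on the second coin" overlap in TT, so that outcome is counted twice. Variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each of the four outcomes HH, HT, TH, TT has probability 1/4.
p_outcome = {"".join(o): Fraction(1, 4) for o in product("HT", repeat=2)}

# Rule of addition over mutually exclusive outcomes: HT, TH, TT.
with_a_tail = [o for o in p_outcome if "T" in o]
p_added = sum(p_outcome[o] for o in with_a_tail)

# Naive sum of overlapping events counts TT twice.
p_naive = Fraction(1, 2) + Fraction(1, 2)
p_corrected = p_naive - p_outcome["TT"]

print(with_a_tail, p_added, p_naive, p_corrected)
```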
38
Probability that at least one of two coins comes up tails = 1 - probability that neither coin comes up tails. P(at least 1 T) = 1 - P(HH) = 1 - 1/4 = 3/4. In the same way, Probability(2 T) = 1 – Probability(NOT 2 T). Rule of complementation (yin-yang): an event and its complement add to 1.
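Complementation pays off as the number of coins grows, because "no tails at all" is a single outcome while "at least one tail" is many. A short sketch for fair coins; the function name is made up for the example.

```python
from fractions import Fraction


def p_at_least_one_tail(n_coins):
    """1 minus the probability that every one of n fair coins shows heads."""
    p_all_heads = Fraction(1, 2) ** n_coins
    return 1 - p_all_heads


for n in (1, 2, 10):
    print(n, p_at_least_one_tail(n))
```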
39
Sequencing process Drosophila genome (~100 million nt)... Focus on one nucleotide… What’s the probability that it’s covered by one read? What’s the probability that it’s covered by two reads? What’s the probability that it’s covered by 200,000 reads?
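The same complementation argument can be applied to reads. This is a sketch under assumptions the slide leaves open: reads are 500 nt, start at uniformly random positions independently of one another, and end effects are ignored, so one read covers a given nucleotide with probability 500 / 100 000 000. "Covered by N reads" is interpreted as "covered by at least one of N reads". The function name is hypothetical.

```python
import math

GENOME_NT = 100_000_000
READ_NT = 500


def p_covered(n_reads, read_nt=READ_NT, genome_nt=GENOME_NT):
    """P(a given nucleotide lies in at least one of n_reads random reads)."""
    p_single = read_nt / genome_nt
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_reads * math.log1p(-p_single))


for n in (1, 2, 200_000):
    print(n, p_covered(n))
```

With 200,000 reads (1x coverage) the result is close to 1 - 1/e, so under these assumptions roughly a third of the genome would still be unsequenced.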
40
Problem Set 3, Problem 2 Statistics of mini-plasmid assembly
41
Why read pairs? Scaffolds? (Myers et al SQ6) Figure labels: DNA, Contig 1, Contig 2
42
Why read pairs? Scaffolds? (Myers et al SQ6) Figure labels: plasmid, insert ~2000 nt, primer, mates, x 1000's; gel lanes G A T C
43
Why read pairs? Scaffolds? (Myers et al SQ6) Figure labels: Bacterial Artificial CHROMOSOME, P1-derived Artificial CHROMOSOME, ~150,000 nt, mates
44
Myers et al SQ6 Why read pairs? Scaffolds?
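One way to see why mates help: when the two reads of a pair fall in different contigs, the pair ties those contigs together into a scaffold. The sketch below uses made-up placements and a made-up threshold of two supporting pairs; it illustrates the idea only and is not the procedure used by Myers et al.

```python
from collections import Counter

# Hypothetical data: the contig in which each read of a mate pair was placed.
mate_placements = [
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig1", "contig1"),   # both mates in one contig: no link
    ("contig2", "contig3"),
    ("contig1", "contig2"),
]


def contig_links(placements, min_pairs=2):
    """Count mate pairs spanning two contigs; keep links with enough support.

    min_pairs is an illustrative cutoff, not a value from the source.
    """
    counts = Counter(
        tuple(sorted(pair)) for pair in placements if pair[0] != pair[1]
    )
    return {link: n for link, n in counts.items() if n >= min_pairs}


print(contig_links(mate_placements))
```

Since the insert length is roughly known (~2000 nt for plasmids in the slides), each supported link also constrains how far apart the two contigs can be.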
45
SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements (Myers et al, 2000):
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence..."
b. "...trillions of overlaps between reads are examined."
c. "...to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."
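Some of the checking can start from the quoted numbers alone. The sketch below does only that arithmetic; Table 1 is not reproduced here, so it cannot confirm the statements against the paper. It assumes that "mates" in statement c counts pairs, with two reads per pair.

```python
from math import comb

# Statement a: implied average read length.
reads = 3_156_000
sequence_bp = 1.76e9
mean_read_bp = sequence_bp / reads

# Statement b: number of distinct read pairs, an upper bound on the
# pairwise comparisons an all-against-all overlap search could make.
possible_read_pairs = comb(reads, 2)

# Statement c: reads accounted for by mate pairs (assumes 2 reads per pair).
mates_2kbp = 654_000
mates_10kbp = 497_000
reads_in_mates = 2 * (mates_2kbp + mates_10kbp)

print(round(mean_read_bp), possible_read_pairs, reads_in_mates)
```

An average read length in the mid-500s is in line with the "reads limited to 100s of nt" figure from earlier in the lecture, and the pair count is indeed in the trillions.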