Welcome to Introduction to Bioinformatics Wednesday, 10 February Genome Sequencing/Assembly
What to do for summer vacation?
Deadline, SUNday Feb 28!
Target, Monday Mar 1!
Deadline, ???
Deadline, FRIday Feb 26!
Global Viral Genome Project Deadline, whenever!
Learn more about… HHMI: BBSI: VCU-USF: GVGP: (News)
What is the sequence (5' to 3') represented by the gel? Myers et al SQ2 G A T C
Dideoxy sequencing (= Sanger sequencing)
What is the sequence (5' to 3') represented by the gel? G A T C Myers et al SQ2
What is the sequence (5' to 3') represented by the gel? G A T C ddC TCGTGTACATCGTAACACGGTTAAGT Myers et al SQ2
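Reading a dideoxy gel amounts to sorting the bands by fragment length and noting which lane each band sits in. The sketch below is only an illustration: the band lengths in `bands` are hypothetical (they are not measured from the gel on the slide) and were chosen so that they spell the first six bases of the sequence shown above. The function name `read_gel` is likewise an assumption made for this example.

```python
# Hypothetical band data: for each dideoxy lane, the lengths (in nt past the
# primer) of the terminated fragments that appear as bands in that lane.
bands = {
    "G": [3, 5],
    "A": [],
    "T": [1, 4, 6],
    "C": [2],
}


def read_gel(bands_by_lane):
    """Return the newly synthesized strand, 5' to 3'.

    The shortest fragment ran farthest down the gel and ends at the base
    closest to the primer, so sorting bands by length gives the 5'-to-3' order.
    """
    calls = []
    for base, lengths in bands_by_lane.items():
        for length in lengths:
            calls.append((length, base))
    calls.sort()  # shortest fragment first
    return "".join(base for _, base in calls)


print(read_gel(bands))  # TCGTGT
```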
Sequencing process Drosophila genome (~100 million nt): sequence it. Technical limitation: reads limited to 100's of nt
Sequencing process Drosophila genome (~100 million nt)... How many possible 500 nt fragments are there?
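One way to approach the question is to count start positions. As a simplifying assumption (not stated on the slide), the sketch treats the genome as a single linear molecule and counts fragments on one strand only; a real genome is split across chromosomes and has two strands. The name `count_fragments` is made up for this example.

```python
GENOME_NT = 100_000_000  # approximate Drosophila genome size from the slide
FRAGMENT_NT = 500        # fragment length from the slide


def count_fragments(genome_nt, fragment_nt):
    """Number of distinct start positions for a fragment of the given length
    on one strand of a single linear sequence."""
    return max(genome_nt - fragment_nt + 1, 0)


print(count_fragments(GENOME_NT, FRAGMENT_NT))  # 99999501
```

Under these assumptions there are almost as many possible fragments as there are nucleotides.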
Sequencing process Drosophila genome (~100 million nt)... SAMPLE
Sequencing process Drosophila genome (~100 million nt) SAMPLE... How many 500 nt samples are needed to cover 100 million nt?
Sequencing process Drosophila genome (~100 million nt) SAMPLE... How many 500 nt samples are needed to cover 100 million nt? Is this enough? Oversampling … coverage?
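The arithmetic behind the question can be sketched as follows. Here "coverage" is taken to mean total nucleotides read divided by genome size, which is the usual definition but is an assumption about how the slide uses the word; `reads_needed` is a name invented for this example.

```python
import math

GENOME_NT = 100_000_000  # approximate genome size from the slide
READ_NT = 500            # read length from the slide


def reads_needed(genome_nt, read_nt, coverage=1.0):
    """Reads required so that total sequenced nt = coverage x genome size."""
    return math.ceil(coverage * genome_nt / read_nt)


print(reads_needed(GENOME_NT, READ_NT))        # 200000 reads at 1x
print(reads_needed(GENOME_NT, READ_NT, 10.0))  # 2000000 reads at 10x
```

At 1x the reads add up to the length of the genome, but because they land at random positions they overlap in some places and leave gaps in others, which is why the slide asks whether this is enough.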
Paint the wall How long will this take? Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?
Paint the wall How long will this take? The wall is 40" x 25" (1000 sq in); each paint ball covers 1 sq in. Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?
Paint the wall How long will this take with 1000 paint balls? The wall is 40" x 25". Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?
Oversampling Completeness How much is painted with 1x oversampling? Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ? What fraction won't be painted?
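A small calculation and simulation can make the paint-ball question concrete. The sketch assumes the 40" x 25" wall is divided into 1000 one-square-inch cells and that each paint ball lands on exactly one cell chosen uniformly at random; both are simplifications, and the function names are invented for this example.

```python
import math
import random

SQUARES = 40 * 25  # 1000 one-square-inch cells on the wall
BALLS = 1000       # 1x oversampling: as many paint balls as cells


def expected_unpainted(squares, balls):
    """Chance that a given cell is missed by every ball."""
    return (1 - 1 / squares) ** balls


def simulate_unpainted(squares, balls, rng):
    """Throw the balls at random cells and report the fraction never hit."""
    painted = {rng.randrange(squares) for _ in range(balls)}
    return 1 - len(painted) / squares


print(round(expected_unpainted(SQUARES, BALLS), 4))  # 0.3677
print(round(math.exp(-1), 4))                        # 0.3679, the limiting value
print(simulate_unpainted(SQUARES, BALLS, random.Random(0)))
```

Under these assumptions roughly 37% of the wall stays unpainted at 1x oversampling, and the simulated fraction should land close to that value.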
Probability that two coins both come up tails: P(TT) = 1/2 x 1/2 = 1/4. Gets T from first AND gets T from second. Intersection of possibilities (Rule of multiplication): applies to the intersection of independent events.
Probability that either of two coins comes up tails: is it 1/2 x 1/2 = 1/4? Is it 1/2 + 1/2 = 1? Gets HT or TH or TT. Union of possibilities (Rule of addition): P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4.
Probability that either of two coins comes up tails: gets HT or TH or TT. Union of possibilities (Rule of addition): P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4. The rule of addition applies to the union of mutually exclusive events.
Probability that either of two coins comes up tails, by complementation: P(at least 1 T) = 1 - P(no T) = 1 - P(HH) = 1 - 1/4 = 3/4. Rule of complementation (yin-yang): Probability(X) = 1 - Probability(NOT X), because the two add to 1.
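The three rules can be checked by listing every outcome of two fair coin flips. This is a minimal sketch that assumes fair, independent coins; the variable names are chosen for this example.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))

# Rule of multiplication: T on the first AND T on the second.
p_tt = sum(p_each for o in outcomes if o == ("T", "T"))

# Rule of addition: HT or TH or TT (mutually exclusive outcomes).
p_at_least_one_t = sum(p_each for o in outcomes if "T" in o)

# Rule of complementation: 1 minus the probability of no tails at all.
p_hh = sum(p_each for o in outcomes if o == ("H", "H"))

print(p_tt)              # 1/4
print(p_at_least_one_t)  # 3/4
print(1 - p_hh)          # 3/4
```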
Sequencing process Drosophila genome (~100 million nt)... Focus on one nucleotide… What’s the probability that it’s covered by one read? What’s the probability that it’s covered by two reads? What’s the probability that it’s covered by 200,000 reads?
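The same rules of multiplication and complementation apply to a single nucleotide. The sketch below assumes each 500 nt read starts at a uniformly random position, ignores effects at the ends of the genome, and reads "covered by n reads" as "covered by at least one of n reads"; these are assumptions for illustration, and `p_covered` is a name invented here.

```python
import math

GENOME_NT = 100_000_000  # approximate genome size from the slide
READ_NT = 500            # read length

# Chance that one randomly placed read includes a given nucleotide.
P_ONE_READ = READ_NT / GENOME_NT


def p_covered(n_reads):
    """Probability that at least one of n independent reads covers the site.

    Complementation: 1 - P(every read misses), with the misses multiplied
    together because the reads are independent.
    """
    return 1 - (1 - P_ONE_READ) ** n_reads


print(P_ONE_READ)                    # 5e-06
print(p_covered(2))                  # roughly 1e-05
print(round(p_covered(200_000), 3))  # 0.632
print(round(1 - math.exp(-1), 3))    # 0.632, the limiting value
```

With 200,000 reads (1x coverage) a given nucleotide still has about a 37% chance of being missed, which matches the paint-the-wall result.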
Problem Set 3, Problem 2 Statistics of mini-plasmid assembly
Why read pairs? Scaffolds? DNA Myers et al SQ6 Contig 1 Contig 2
G A T C primer x 1000's plasmid insert ~2000 nt mates Myers et al SQ6 Why read pairs? Scaffolds?
... ~ 150,000 nt Bacterial Artificial CHROMOSOME mates Myers et al SQ6 Why read pairs? Scaffolds? P1-derived Artificial CHROMOSOME
Myers et al SQ6 Why read pairs? Scaffolds?
SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced million reads that yielded 1.76 Gbp of sequence..."
b. "...trillions of overlaps between reads are examined."
c. "...to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."
Myers et al (2000)