Welcome to Introduction to Bioinformatics. Wednesday, 10 February. Genome Sequencing/Assembly

1 Welcome to Introduction to Bioinformatics. Wednesday, 10 February. Genome Sequencing/Assembly

3 What to do for summer vacation?

4 Deadline, SUNday Feb 28!

5 Target, Monday Mar 1!

6 Deadline, ???

7 Deadline, FRIday Feb 26!

8 Global Viral Genome Project Deadline, whenever!

9 Learn more about… HHMI: http://www.vcu.edu/csbc/hhmi/ BBSI: http://www.vcu.edu/csbc/bbsi/ VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm GVGP: http://biobike.csbc.vcu.edu (News)

10 What is the sequence (5' to 3') represented by the gel? Lanes: G A T C (Myers et al SQ2)

11 What is the sequence (5' to 3') represented by the gel? Lanes: G A T C (Myers et al SQ2)

12 Dideoxy sequencing (= Sanger sequencing)

13 Dideoxy sequencing

23 What is the sequence (5' to 3') represented by the gel? Lanes: G A T C (Myers et al SQ2)

24 What is the sequence (5' to 3') represented by the gel? G A T C ddC TCGTGTACATCGTAACACGGTTAAGTTCGTGTACATCGTAACACGGTTAAGT Myers et al SQ2
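
A minimal sketch of the dideoxy logic behind the gel, assuming the 26-nt sequence shown on the slide is the newly synthesized strand written 5' to 3'. The function names are illustrative and not from the course materials. Each lane holds fragments that end wherever its dideoxy nucleotide was incorporated, so reading bands from shortest to longest recovers the sequence.

```python
# Sequence taken from the slide; treating it as the synthesized strand is an assumption.
SEQ = "TCGTGTACATCGTAACACGGTTAAGT"

def lane_fragment_lengths(seq, dd_base):
    """Lengths of the chain-terminated fragments expected in the lane for dd_base."""
    return [i + 1 for i, base in enumerate(seq) if base == dd_base]

def read_gel(lanes):
    """Rebuild the sequence by reading bands from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))

# One simulated lane per dideoxy nucleotide, in the slide's lane order G A T C.
lanes = {base: lane_fragment_lengths(SEQ, base) for base in "GATC"}
print(lanes["C"])       # fragment lengths expected in the ddC lane
print(read_gel(lanes))  # sequence recovered from the simulated gel
```

Because every position ends in exactly one lane, sorting all bands by length is enough to put the bases back in order.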

25 Sequencing process: Drosophila genome (~100 million nt). Sequence it. Technical limitation: reads limited to 100's of nt

26 Sequencing process Drosophila genome (~100 million nt)... How many possible 500 nt fragments are there?

27 Sequencing process Drosophila genome (~100 million nt)... SAMPLE

28 Sequencing process Drosophila genome (~100 million nt) SAMPLE... How many 500 nt samples are needed to cover 100 million nt? 100 000 000 / 500

29 Sequencing process Drosophila genome (~100 million nt) SAMPLE... How many 500 nt samples are needed to cover 100 million nt? 1 000 000 / 5 Is this enough? Oversampling … coverage?
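
The arithmetic on the last few slides can be sketched as follows. This assumes a single linear strand of 100 million nt and reads of exactly 500 nt; the helper names and the 5x figure are illustrative assumptions, not values from the lecture.

```python
import math

GENOME_NT = 100_000_000  # ~100 million nt, from the slides
READ_NT = 500            # read length used in the slides

def possible_fragments(genome_nt, read_nt):
    """Number of distinct start positions for a read on one linear strand."""
    return genome_nt - read_nt + 1

def reads_needed(genome_nt, read_nt, coverage=1):
    """Reads required so that the total nt sequenced equals coverage x genome size."""
    return math.ceil(coverage * genome_nt / read_nt)

print(possible_fragments(GENOME_NT, READ_NT))
print(reads_needed(GENOME_NT, READ_NT))     # 1x: as if reads could be laid end to end
print(reads_needed(GENOME_NT, READ_NT, 5))  # a hypothetical 5x oversampling
```

The 1x figure only covers the genome if the reads tile it perfectly, which random sampling does not do; that is the motivation for oversampling.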

30 Paint the wall Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ? How long will this take?

31 Paint the wall How long will this take? Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?

32 Paint the wall How long will this take? Wall: 40" x 25". Paint ball: 1 sq ". Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?

33 Paint the wall How long will this take? Wall: 40" x 25". 1000 paint balls? Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ?

34 Oversampling Completeness How much is painted with 1x oversampling? Study Question 8 & 9 "oversampling"? "coverage"? Shotgun sequencing ? What fraction won't be painted?
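
One way to explore the paint-the-wall question is a small simulation. The sketch below assumes the wall is a grid of 1000 one-square-inch cells and that each paint ball covers exactly one cell chosen uniformly at random; both are simplifying assumptions, and the function names are made up for illustration.

```python
import random

WIDTH, HEIGHT = 40, 25    # wall size in inches, from the slide
SQUARES = WIDTH * HEIGHT  # 1000 one-square-inch cells

def expected_unpainted(squares, balls):
    """Chance that a given square is missed by every one of the balls."""
    return (1 - 1 / squares) ** balls

def simulate_unpainted(squares, balls, rng):
    """Throw balls at random squares and return the fraction left unpainted."""
    hit = {rng.randrange(squares) for _ in range(balls)}
    return 1 - len(hit) / squares

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [simulate_unpainted(SQUARES, SQUARES, rng) for _ in range(200)]
print(expected_unpainted(SQUARES, SQUARES))  # formula, 1x oversampling
print(sum(trials) / len(trials))             # average over simulated walls
```

With 1000 balls on 1000 squares (1x oversampling), the formula gives a value close to 1/e, so roughly a third of the wall stays unpainted.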

35 P(TT) = 1/2 x 1/2 = 1/4 Probability that two coins come up both tails Rule of multiplication intersection independent Gets T from first AND gets T from second Intersection of possibilities (Rule of multiplication)

36 P(at least 1 T) = 1/4 + 1/4 + 1/4 Probability that either of two coins comes up tails 1/2 x 1/2 = 1/4? Gets HT or TH or TT Union of possibilities (Rule of addition) 1/2 + 1/2 = 1?

37 P(at least 1 T) = 1/4 + 1/4 + 1/4 Probability that either of two coins comes up tails Gets HT or TH or TT Union of possibilities (Rule of addition) Rule of addition union mutually exclusive

38 P(at least 1 T) = 1 - 1/4 Probability that at least one of two coins comes up tails = 1 - probability that neither comes up tails (HH, 1/4) Probability(2 T) = 1 – Probability(NOT 2 T) Rule of complementation (yin-yang): complementary probabilities add to 1
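
The three rules can be checked by listing every outcome of two fair coins. This is a minimal sketch; the helper `prob` is an illustrative name, and the coins are assumed fair and independent.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, all equally likely
p_each = Fraction(1, len(outcomes))

def prob(event):
    """Add up the probabilities of the outcomes for which event(outcome) is true."""
    return sum(p_each for o in outcomes if event(o))

p_tt = prob(lambda o: o == ("T", "T"))  # multiplication: 1/2 x 1/2
p_any_t = prob(lambda o: "T" in o)      # addition: HT or TH or TT
p_hh = prob(lambda o: o == ("H", "H"))  # the only outcome with no tails

print(p_tt)
print(p_any_t)
print(1 - p_hh)                         # complementation gives the same answer
```

Adding works here because HT, TH, and TT are mutually exclusive; adding 1/2 + 1/2 would count TT twice.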

39 Sequencing process Drosophila genome (~100 million nt)... Focus on one nucleotide… What’s the probability that it’s covered by one read? What’s the probability that it’s covered by two reads? What’s the probability that it’s covered by 200,000 reads?
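
The same complementation rule answers the question about a single nucleotide. The sketch below assumes reads of 500 nt that start at uniformly random positions and ignores the ends of the genome; `p_covered` is an illustrative name.

```python
GENOME_NT = 100_000_000  # ~100 million nt, from the slides
READ_NT = 500            # read length used in the slides

def p_covered(n_reads, genome_nt=GENOME_NT, read_nt=READ_NT):
    """Probability that a given nucleotide lies in at least one of n_reads reads."""
    p_one = read_nt / genome_nt          # chance that a single read covers it
    return 1 - (1 - p_one) ** n_reads    # 1 minus the chance that every read misses

for n in (1, 2, 200_000, 1_000_000):
    print(n, p_covered(n))
```

At 200,000 reads (1x) the chance of being missed is close to 1/e, the same fraction as the unpainted wall; higher oversampling shrinks that fraction quickly.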

40 Problem Set 3, Problem 2 Statistics of mini-plasmid assembly

41 Why read pairs? Scaffolds? DNA, Contig 1, Contig 2 (Myers et al SQ6)

42 Why read pairs? Scaffolds? Plasmid insert ~2000 nt, x 1000's; primer; mates; lanes G A T C (Myers et al SQ6)

43 Why read pairs? Scaffolds? Bacterial Artificial Chromosome / P1-derived Artificial Chromosome, ~150,000 nt ... mates (Myers et al SQ6)

44 Why read pairs? Scaffolds? (Myers et al SQ6)
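
One reason read pairs help is that the known insert size constrains how far apart two contigs can be. The following is a rough sketch, not the assembler's actual method: it assumes an insert of about 2000 nt, and the mate positions listed are hypothetical numbers chosen for illustration.

```python
from statistics import mean

INSERT_NT = 2000  # approximate plasmid insert size, from the slide

def gap_from_mate_pair(insert_nt, nt_to_end_of_contig1, nt_into_contig2):
    """Estimate the gap between two contigs from one mate pair spanning it.

    nt_to_end_of_contig1: distance from the outer end of the first mate
        to the end of contig 1.
    nt_into_contig2: distance from the start of contig 2 to the outer end
        of the second mate.
    """
    return insert_nt - nt_to_end_of_contig1 - nt_into_contig2

# Hypothetical mate pairs that each link contig 1 to contig 2.
pairs = [(700, 900), (1200, 450), (300, 1250)]
gaps = [gap_from_mate_pair(INSERT_NT, a, b) for a, b in pairs]
print(gaps, mean(gaps))
```

Several independent pairs agreeing on a similar gap is what lets contigs be ordered and spaced into a scaffold even though the gap itself is unsequenced.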

45 SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements: a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence..." b. "...trillions of overlaps between reads are examined." c. "...to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates." Myers et al (2000)
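
A few sanity checks can be run on the quoted figures alone. This sketch uses only the numbers in the three statements, not the paper's text or Table 1, and it assumes that "overlaps examined" is bounded by comparing every read with every other read.

```python
reads = 3_156_000              # statement a
total_bp = 1.76e9              # statement a
mate_pairs = 654_000 + 497_000 # statement c: 2-kbp plus 10-kbp mates

avg_read_nt = total_bp / reads                # implied average read length
all_vs_all = reads * (reads - 1) // 2         # every read against every other
reads_in_pairs = 2 * mate_pairs               # each mate pair is two reads

print(round(avg_read_nt))
print(all_vs_all)
print(reads_in_pairs, reads_in_pairs / reads)
```

Comparing these values with the figures in the paper is the remaining part of the study question.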

