Welcome to Introduction to Bioinformatics Monday, 21 March 2005 Genome Comparison Coming attractions How to compare genomes Chi-squared analysis.

1 Welcome to Introduction to Bioinformatics Monday, 21 March 2005 Genome Comparison Coming attractions How to compare genomes Chi-squared analysis

2

3 E. coli: What makes it kill? Escherichia coli...... very small lab rats Courtesy of Kent State University Microbiology

4 E. coli: What makes it kill? Escherichia coli... haemorrhagic colitis

5 E. coli: What makes it kill? E. coli K12: TCTACTTATA TTCAATCCAC AGGGCTACAC AAGAGTCTGT TGAATGAACA CATACATGGT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA E. coli O157:H7: TCTACTTATA TTCAATCCAC AGGGCTACAC AAGAGTCTGT TGAATGAACA CATACATGGT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA

6 How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

7 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA...

8 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... GATAGATCCCC

9 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... GATAGATCCCC

10 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... GATAGATCCCC

11 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... GATAGATCCCC

12 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... CCCACGCCTAT

13 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... CCCACGCCTAT

14 E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA...
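The sliding-window comparison animated on the preceding slides can be sketched roughly as follows. This is a minimal illustration, not the tool used in the course: the function name `unmatched_windows`, the 11-nucleotide window size (taken from the length of the window shown on the slides), and the toy sequences with an artificial insert are all assumptions made for the example.

```python
def unmatched_windows(query: str, reference: str, k: int = 11) -> list[int]:
    """Return start positions of k-long windows of `query` that never occur in `reference`."""
    # Index every k-long window of the reference once, for fast lookup.
    ref_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    # Slide a window along the query one nucleotide at a time.
    return [i for i in range(len(query) - k + 1)
            if query[i:i + k] not in ref_windows]


# Toy data (hypothetical): the "O157-like" sequence is the "K12-like"
# sequence with a run of eight T's inserted in the middle.
left, right = "GCGGAGCAAACTGGG", "CGTCTTTCGAGAACT"
k12_like = left + right
o157_like = left + "TTTTTTTT" + right

novel = unmatched_windows(o157_like, k12_like, k=11)
print("windows with no match start at:", novel)
```

Runs of consecutive unmatched window positions mark stretches present in one genome but not the other, which is the kind of region the next slides label as islands.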

15 E. coli O157:H7 E. coli K12

16 E. coli O157:H7 E. coli K12 O-Islands

17 Prochlorococcus SS120 vs. Prochlorococcus MED4 (100 nuc)

18 Prochlorococcus SS120 vs. Prochlorococcus MED4 (25 nuc)

19 Nature of Pathogenicity Islands Horizontal transfer of foreign DNA E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

20 How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Death General transduction

21 How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Life!

22 How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Life!

23 How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Special transduction

24 How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related

25 How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons

26 How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons How to find orthologous protein?

27 How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons How to find corresponding protein?

28 X X X X X X X Yeast E. coli Anabaena Methanobacter

29 How to find corresponding protein? X X X X X X X Yeast E. coli Anabaena Methanobacter All similar protein? Most related by common descent? Orthologs Orthologs Paralogs

30 How to find corresponding protein? Most related by common descent? All similar protein? Orthologs Paralogs Blast E-value threshold Organism X Organism Y

31 How to find corresponding protein? Most related by common descent? Orthologs Blast E-value threshold Organism Y Organism X Organism Y Defined by bidirectional Blast hit
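A bidirectional (reciprocal) best Blast hit can be sketched like this. The sketch assumes the Blast results have already been parsed into (query, subject, E-value) tuples; the function names, the 1e-10 threshold, and the protein names are hypothetical and chosen only for illustration.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to its lowest-E-value subject, ignoring hits above the threshold."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to count as similar
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x, max_evalue=1e-10):
    """Pairs (x, y) where y is x's best hit AND x is y's best hit."""
    forward = best_hits(x_vs_y, max_evalue)
    backward = best_hits(y_vs_x, max_evalue)
    return sorted((x, y) for x, y in forward.items() if backward.get(y) == x)


# Hypothetical parsed Blast output: organism X proteins searched against Y, and Y against X.
x_vs_y = [("xA", "yA", 1e-80), ("xA", "yB", 1e-30),
          ("xB", "yA", 1e-25), ("xC", "yC", 1e-3)]
y_vs_x = [("yA", "xA", 1e-78), ("yB", "xA", 1e-28), ("yC", "xC", 1e-3)]

pairs = reciprocal_best_hits(x_vs_y, y_vs_x)
print(pairs)
```

In this toy data xB and yB are similar to proteins in the other organism (candidate paralogs) but are not anyone's mutual best hit, so only the mutual pair is reported as a candidate ortholog.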

32 How to find corresponding protein? PROTEINS-SIMILAR-TO ORTHOLOG-OF COMMON-ORTHOLOGS-OF

33 Nature of Pathogenicity Islands Horizontal transfer of foreign DNA E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

34 Nature of Pathogenicity Islands Horizontal transfer of foreign DNA

35 Nature of Pathogenicity Islands Nucleotide frequencies comparisons
Nucleotide Count
Base    Sequence1   Sequence2   Total
A       1,000       600         1,600
C       1,000       800         1,800
G       1,000       700         1,700
T       1,000       900         1,900
Total   4,000       3,000       7,000
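One way to compare the two columns of counts is a chi-squared test on the whole table, where the expected count in each cell is (row total x column total) / grand total. The sketch below uses the counts from the slide; the variable names are illustrative and the choice of a contingency-table test here is an assumption, not necessarily the calculation done in class.

```python
# Observed counts from the slide: base -> (Sequence1, Sequence2)
counts = {"A": (1000, 600), "C": (1000, 800), "G": (1000, 700), "T": (1000, 900)}

col_totals = [sum(row[j] for row in counts.values()) for j in range(2)]
grand_total = sum(col_totals)

chi2 = 0.0
for base, row in counts.items():
    row_total = sum(row)
    for j, observed in enumerate(row):
        # Expected count if both sequences shared the same base composition.
        expected = row_total * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (len(counts) - 1) * (2 - 1)
print(f"chi-squared = {chi2:.2f} with {dof} degrees of freedom")
```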

36 Nucleotide frequencies to detect foreign genes 1. Find nucleotide frequencies of native genes 2. Find nucleotide frequencies of test gene 3. Compare frequencies 4. How likely differences arose by chance? Chi-squared analysis
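Steps 1 to 3 above might look like the following in code. This is a sketch under stated assumptions: the function names are invented for the example, the sequences are toy strings rather than real genes, and the comparison uses a goodness-of-fit chi-squared in which the native frequencies supply the expected counts for the test gene.

```python
from collections import Counter


def base_frequencies(sequences):
    """Step 1: pooled A/C/G/T frequencies across a set of native genes."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


def chi2_vs_native(test_gene, native_freqs):
    """Steps 2-3: count bases in the test gene and compare with native expectations."""
    observed = Counter(base for base in test_gene.upper() if base in "ACGT")
    n = sum(observed.values())
    return sum((observed[base] - n * native_freqs[base]) ** 2 / (n * native_freqs[base])
               for base in "ACGT")


# Toy data (hypothetical): native genes with even composition, one A-rich test gene.
native_genes = ["ACGT" * 25, "AACCGGTT" * 10]
native_freqs = base_frequencies(native_genes)

a_rich_gene = "A" * 40 + "C" * 20 + "G" * 20 + "T" * 20
native_like_gene = "ACGT" * 25

chi2_a_rich = chi2_vs_native(a_rich_gene, native_freqs)
chi2_native_like = chi2_vs_native(native_like_gene, native_freqs)
print(chi2_a_rich, chi2_native_like)
```

Step 4, turning the statistic into a probability, needs the degrees of freedom (three here, since four base counts must sum to the gene length) and a chi-squared table or library.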

37 Where does χ² come from? A million repetitions of Mendel’s experiment Create a million universes -- purple:white on average = 3:1
Result: 705 purple 224 white = 929 plants
Result: 698 purple 231 white = 929 plants
Result: 688 purple 241 white = 929 plants
Result: 710 purple 219 white = 929 plants
Result: 695 purple 234 white = 929 plants
Result: 702 purple 227 white = 929 plants

38 200,000 repetitions Where does χ² come from? A million repetitions of Mendel’s experiment

39 500,000 repetitions Where does χ² come from? A million repetitions of Mendel’s experiment

40 1,000,000 repetitions Why is it that the two dotted lines are on opposite sides of the mean?

41 Where does χ² come from? A million repetitions of Mendel’s experiment 1,000,000 repetitions What’s the most likely result? How often does it occur?
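The repeated-experiment idea can be tried out with a small simulation. The sketch below assumes each of the 929 plants is independently purple with probability 0.75, and it uses 2,000 repetitions rather than a million so that it runs quickly in plain Python; the function name and the seed are illustrative.

```python
import random
from collections import Counter


def simulate_purple_counts(n_plants=929, p_purple=0.75, repetitions=2000, seed=0):
    """Repeat the flower count many times; return the purple count from each repetition."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p_purple for _ in range(n_plants))
            for _ in range(repetitions)]


purple_counts = simulate_purple_counts()
tally = Counter(purple_counts)

most_common_count, times_seen = tally.most_common(1)[0]
mean_purple = sum(purple_counts) / len(purple_counts)

print("mean purple count:", round(mean_purple, 2))
print("most frequent result:", most_common_count, "purple, seen", times_seen, "times")
```

Plotting `tally` as a histogram gives a bell-shaped curve centred near the expected 696.75 purple plants, and even the single most frequent count accounts for only a small fraction of the repetitions.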

42 Deviation from Expectation Two example experiments Why is there shading on both sides of the curve? The farther away O from E, the smaller/larger the shaded area?

43 Steps in Performing a Chi² Test
Determine the expected values for the experiment
Model: 3 purple : 1 white flower
Total counted: 929
Purple = 75% of 929 = 696.75
White = 25% of 929 = 232.25
Calculate the squares of the deviations
Chi² = Sum of (O - E)² / E
Chi² = (705 - 696.75)² / 696.75 + (224 - 232.25)² / 232.25
~ 8² / 700 + 8² / 230
~ 0.09 + 0.3
Chi² = approx 0.39 (exact value = 0.391)
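The arithmetic on this slide can be checked with a few lines of code. The helper name `chi_squared` is just for illustration; the observed and expected numbers are the ones on the slide.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = 929
expected = [0.75 * total, 0.25 * total]  # 3 purple : 1 white model
observed = [705, 224]                    # purple, white

chi2 = chi_squared(observed, expected)
print(f"expected = {expected}, chi-squared = {chi2:.3f}")
```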

44 Steps in Performing a Chi² Test
Determine the degrees of freedom
What was the experiment? - Count 929 flowers a million times
Ask: purple? (if not, then white)
Therefore ONE degree of freedom
Look up probability for χ² value
χ² = 0.30
80% > P > 50%. Call it ~60%
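Instead of a printed table, the probability for ONE degree of freedom can be computed directly, because a chi-squared variable with one degree of freedom is the square of a standard normal variable. That gives P = erfc(sqrt(χ² / 2)). The function name below is illustrative, and the formula applies only to the one-degree-of-freedom case.

```python
import math


def p_value_one_dof(chi2: float) -> float:
    """Probability of a chi-squared value this large or larger, for 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


for value in (0.30, 0.39, 3.841):
    print(f"chi-squared = {value}: P = {p_value_one_dof(value):.3f}")
```

The value 3.841 is included because it is the familiar table entry for P = 0.05 with one degree of freedom, which makes a handy sanity check on the formula.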

45 Steps in Performing a Chi² Test
P ~60%
Draw a conclusion
The result has a 50% chance of being correct
The hypothesis has a 50% chance of being correct
60% of the time, Mendel’s result or worse would have arisen by chance if purple:white truly occurs in a 3:1 ratio.

46 Deviation from Expectation Two example experiments Study Question 20: What if Mendel had counted not 929 but 929,000 plants -- what do the curve and shading look like then? (d still = 29) P = .50 P = ???

47 Interpretation of Chi-Square Does a high P value indicate the hypothesis is correct? Does a low P value indicate the hypothesis is incorrect?

48 Bag of Marbles 1000’s of marbles! 50% red, 50% blue Guaranteed!

49 Test Claim of 50%:50% 41 marbles 59 marbles 100 marbles TOTAL Is their claim correct? How to tell how close is close enough?

50 χ² Test of Claim
Chi² = Sum of (O - E)² / E
Chi² = (53 - 50)² / 50 + (47 - 50)² / 50
= 9 / 50 + 9 / 50
= 18 / 50
= 0.36
P = ? P = ~60%
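The marble test can be run in code for both samples that appear on these slides, the 53:47 split used in the calculation and the 41:59 split shown on the previous slide. This is a sketch: the function names are invented, and the P value uses the one-degree-of-freedom identity P = erfc(sqrt(χ² / 2)), which is an addition here rather than something shown on the slides.

```python
import math


def chi2_even_split(count_a: int, count_b: int) -> float:
    """Chi-squared for two colour counts against a claimed 50%:50% mix."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def p_value_one_dof(chi2: float) -> float:
    """Chance of a deviation this large or larger if the 50:50 claim is true (1 d.f.)."""
    return math.erfc(math.sqrt(chi2 / 2))


results = {}
for sample in [(53, 47), (41, 59)]:
    chi2 = chi2_even_split(*sample)
    results[sample] = (chi2, p_value_one_dof(chi2))
    print(sample, f"chi-squared = {chi2:.2f}", f"P = {results[sample][1]:.2f}")
```

The same claim looks comfortable against one sample and much shakier against the other, which is the "how close is close enough?" question in numbers.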

