Welcome to Introduction to Bioinformatics. Monday, 21 March 2005. Genome Comparison. Coming attractions: how to compare genomes; chi-squared analysis.
E. coli: What makes it kill? Escherichia coli: very small lab rats (image courtesy of Kent State University Microbiology). Yet some strains of Escherichia coli cause haemorrhagic colitis.
[Figure: the same stretch of DNA sequence shown for E. coli K12 and for E. coli O157:H7; by eye, the two are indistinguishable.]
How to compare genomes. E. coli O157:H7 genome: GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome: GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...
How to compare genomes, word by word: take a short word from the E. coli O157:H7 genome (first GATAGATCCCC, then CCCACGCCTAT) and look for it in the E. coli K12 genome.
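The word-by-word search can be sketched in a few lines. This is a minimal sketch assuming exact matches of fixed-length words (k = 11, the length of the example words); the function names and toy sequences are hypothetical, and practical tools such as BLAST also tolerate mismatches. Words of one genome that never occur in the other are merged into candidate regions, the kind of stretch that shows up as an O-Island.

```python
def word_set(seq: str, k: int) -> set:
    """Every word of length k in seq, stored in a set for fast lookup."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unmatched_words(query: str, reference: str, k: int = 11) -> list:
    """Start positions of k-letter words in query that never occur in reference."""
    ref_words = word_set(reference, k)
    return [i for i in range(len(query) - k + 1) if query[i:i + k] not in ref_words]


def merge_into_regions(positions: list, k: int) -> list:
    """Merge overlapping unmatched words into (start, end) regions, end exclusive."""
    regions = []
    for p in positions:  # positions arrive in increasing order
        if regions and p <= regions[-1][1]:
            regions[-1][1] = p + k  # word overlaps or touches the current region
        else:
            regions.append([p, p + k])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    # Hypothetical toy genomes: the query carries an 8-nt insert the reference lacks.
    reference = "ACGTTGCAACGGTCAT"
    query = "ACGTTGCA" + "CCCCCCCC" + "ACGGTCAT"
    hits = unmatched_words(query, reference, k=4)
    print(merge_into_regions(hits, k=4))  # [(5, 19)]
```

Because every word overlapping the insert fails to match, the reported region can reach up to k - 1 letters into the flanking sequence on each side.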
[Figure: E. coli O157:H7 vs. E. coli K12 genome comparison, with the O-Islands (O157:H7 segments absent from K12) marked.]
[Figure: Prochlorococcus SS120 vs. Prochlorococcus MED4 genome comparison (100 nuc)]
[Figure: Prochlorococcus SS120 vs. Prochlorococcus MED4 genome comparison (25 nuc)]
Nature of Pathogenicity Islands: horizontal transfer of foreign DNA. [E. coli O157:H7 and E. coli K12 genome sequences, as above]
How do differences arise between genomes? [Figure: phage infection. A phage injects its genome into a bacterium; the phage genome then follows either the lytic pathway or the lysogenic pathway.]
Lytic pathway: Death of the cell. General transduction: a new phage particle may package a random piece of the bacterial chromosome and deliver it to the next cell it infects.
Lysogenic pathway: Life! The phage genome integrates into the bacterial chromosome and the cell survives. Special (specialized) transduction: when the integrated phage genome excises imprecisely, it carries neighboring bacterial genes with it.
How to compare genomes. [E. coli O157:H7 and E. coli K12 genome sequences, as above]
Differences in genome sequence: useful only if the genomes are very closely related.
Differences in protein content: useful even for distant comparisons.
How to find corresponding (orthologous) proteins?
How to find corresponding protein? [Figure: related proteins (X) in Yeast, E. coli, Anabaena and Methanobacter, grouped as orthologs and paralogs]
All similar proteins? Those proteins of Organism X and Organism Y that pass a Blast E-value threshold. These include paralogs (related by gene duplication) as well as orthologs.
Only those most related by common descent? These are orthologs (related by speciation), defined by a bidirectional Blast hit: the protein from Organism X finds the protein from Organism Y as its best hit, and vice versa.
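The two definitions can be contrasted in code. A minimal sketch, assuming Blast results have already been reduced to (query, subject, E-value) triples; the protein names, the hit lists and the default threshold of 1e-10 are all hypothetical choices for illustration, not values from the lecture.

```python
from typing import Dict, List, Tuple

# One Blast-style hit: (query protein, subject protein, E-value)
Hit = Tuple[str, str, float]


def similar_proteins(hits: List[Hit], query: str, max_evalue: float) -> List[str]:
    """All subjects similar to `query` under the threshold (orthologs and paralogs alike)."""
    return sorted(s for q, s, e in hits if q == query and e <= max_evalue)


def best_hits(hits: List[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its single best (lowest E-value) subject under the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for q, s, e in hits:
        if e > max_evalue:
            continue  # not similar enough to count at all
        if q not in best or e < best[q][1]:
            best[q] = (s, e)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(x_vs_y: List[Hit], y_vs_x: List[Hit],
                            max_evalue: float = 1e-10) -> List[Tuple[str, str]]:
    """Pairs (x, y) where y is x's best hit in Y and x is y's best hit in X."""
    x_best = best_hits(x_vs_y, max_evalue)
    y_best = best_hits(y_vs_x, max_evalue)
    return sorted((x, y) for x, y in x_best.items() if y_best.get(y) == x)


if __name__ == "__main__":
    # Hypothetical hits: yB resembles both xA and xB, but matches xA better.
    x_vs_y = [("xA", "yA", 1e-50), ("xA", "yB", 1e-20),
              ("xB", "yB", 1e-30), ("xC", "yA", 1e-40)]
    y_vs_x = [("yA", "xA", 1e-50), ("yA", "xC", 1e-40),
              ("yB", "xA", 1e-35), ("yB", "xB", 1e-30)]
    print(similar_proteins(x_vs_y, "xA", 1e-10))   # ['yA', 'yB']
    print(bidirectional_best_hits(x_vs_y, y_vs_x))  # [('xA', 'yA')]
```

Both yA and yB count as "similar" to xA, but only xA/yA survives the bidirectional test; xB/yB fails because yB's best hit points back to xA instead.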
How to find corresponding protein? PROTEINS-SIMILAR-TO, ORTHOLOG-OF, COMMON-ORTHOLOGS-OF
Nature of Pathogenicity Islands Horizontal transfer of foreign DNA
Nature of Pathogenicity Islands: nucleotide frequency comparisons
Nucleotide counts
Base    Sequence 1   Sequence 2   Total
A       1,…          …            1,600
C       1,…          …            1,800
G       1,…          …            1,700
T       1,…          …            1,900
Total   4,000        3,000        7,000
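One way to ask whether two sequences differ in base composition is a chi-squared contingency test: the expected count in each cell is (row total × column total) / grand total. A minimal sketch follows; the function name is hypothetical, and the per-sequence counts are hypothetical too, chosen only so that the row and column totals match the table above.

```python
def contingency_chi2(table):
    """Chi-squared statistic and degrees of freedom for a table of observed counts.

    table: list of rows (one per base), each a list of counts (one per sequence).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Count expected if base composition were the same in every sequence
            expected = row_totals[r] * col_totals[c] / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


if __name__ == "__main__":
    # Hypothetical counts [Sequence 1, Sequence 2] for A, C, G, T,
    # consistent with the totals 1,600 / 1,800 / 1,700 / 1,900 and 4,000 / 3,000.
    counts = [[1000, 600], [1000, 800], [1000, 700], [1000, 900]]
    chi2, dof = contingency_chi2(counts)
    print(round(chi2, 2), dof)  # 38.35 3
```

With 3 degrees of freedom, the 5% cutoff in a standard χ² table is 7.81, so counts like these would be very unlikely to differ this much by chance.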
Nucleotide frequencies to detect foreign genes
1. Find the nucleotide frequencies of native genes.
2. Find the nucleotide frequencies of the test gene.
3. Compare the frequencies.
4. How likely is it that the differences arose by chance? Chi-squared analysis
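The four steps can be sketched as a goodness-of-fit test. This minimal sketch assumes the native genes are pooled into one reference composition and treats those frequencies as exact; the function names and toy sequences are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seqs):
    """Steps 1-2: fraction of A, C, G, T across one or more sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s.upper())
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def foreign_gene_chi2(native_genes, test_gene):
    """Steps 3-4: chi-squared of the test gene's base counts against native frequencies."""
    native = base_frequencies(native_genes)
    observed = Counter(test_gene.upper())
    n = sum(observed[b] for b in BASES)
    chi2 = 0.0
    for b in BASES:
        expected = native[b] * n  # count expected if the test gene looked native
        chi2 += (observed[b] - expected) ** 2 / expected
    return chi2  # 4 bases -> 3 degrees of freedom


if __name__ == "__main__":
    native_genes = ["ACGT" * 25]                           # hypothetical: 25% of each base
    test_gene = "A" * 40 + "C" * 10 + "G" * 10 + "T" * 40  # hypothetical AT-rich gene
    print(foreign_gene_chi2(native_genes, test_gene))      # 36.0
```

A χ² table gives 7.81 as the 5% cutoff for 3 degrees of freedom, so a value of 36 would flag this toy gene as compositionally unusual. The sketch assumes every base occurs in the native genes; a zero frequency would make the expected count zero.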
Where does χ² come from? A million repetitions of Mendel's experiment. Create a million universes in which purple:white averages 3:1, and count 929 plants in each. Results: 705 purple, 224 white; 698 purple, 231 white; 688 purple, 241 white; 710 purple, 219 white; 695 purple, 234 white; 702 purple, 227 white (929 plants each).
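The million-universe thought experiment can be run on a computer. A minimal sketch using only Python's standard `random` module, assuming 10,000 repetitions rather than a million to keep the run short; the function names are hypothetical. It records the number of purple plants in each simulated experiment and the fraction of experiments whose χ² is at least as large as that of Mendel's 705 : 224.

```python
import random


def simulate_mendel(repetitions=10_000, plants=929, p_purple=0.75, seed=1):
    """Count purple plants in many simulated repetitions of a 3:1 experiment."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.random() < p_purple for _ in range(plants))
            for _ in range(repetitions)]


def chi2_3to1(purple, plants=929):
    """Chi-squared of one result against the 3:1 expectation."""
    e_purple, e_white = 0.75 * plants, 0.25 * plants
    white = plants - purple
    return (purple - e_purple) ** 2 / e_purple + (white - e_white) ** 2 / e_white


if __name__ == "__main__":
    results = simulate_mendel()
    mendel = chi2_3to1(705)  # Mendel's observed 705 purple : 224 white
    as_bad = sum(chi2_3to1(r) >= mendel for r in results) / len(results)
    print(f"mean purple = {sum(results) / len(results):.1f}")
    print(f"fraction at least as far from 3:1 as Mendel = {as_bad:.2f}")
```

The mean should sit near 0.75 × 929 = 696.75, and the fraction printed is a simulated estimate of P, which should land roughly where a χ² table places Mendel's result.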
[Figure: results of 200,000 repetitions of Mendel's experiment]
[Figure: results of 500,000 repetitions of Mendel's experiment]
[Figure: results of 1,000,000 repetitions of Mendel's experiment] Why are the two dotted lines on opposite sides of the mean?
Where does χ² come from? [Figure: results of 1,000,000 repetitions of Mendel's experiment] What's the most likely result? How often does it occur?
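Both questions can also be answered exactly, without simulation, from the binomial distribution. A minimal sketch, assuming each of the 929 plants is independently purple with probability 0.75; the function name is hypothetical. The probability is computed in log space with `math.lgamma` so the large binomial coefficients cause no overflow.

```python
import math


def binomial_pmf(k, n, p):
    """Exact probability of exactly k purple plants out of n (computed in log space)."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_p)


n, p = 929, 0.75
most_likely = max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))
print(most_likely, round(binomial_pmf(most_likely, n, p), 3))  # 697 0.03
```

The most likely single outcome, 697 purple, turns up only about 3% of the time, which is why any particular count, even a "perfect" one, is individually improbable.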
Deviation from Expectation. [Figure: two example experiments] Why is there shading on both sides of the curve? The farther the observed value (O) is from the expected value (E), is the shaded area smaller or larger?
Steps in Performing a χ² Test
Determine the expected values for the experiment. Model: 3 purple : 1 white flower. Total counted: 929.
Purple = 75% of 929 = 696.75
White = 25% of 929 = 232.25
Calculate the squares of the deviations.
χ² = Sum of (O - E)² / E
χ² = (705 - 696.75)²/696.75 + (224 - 232.25)²/232.25
Roughly: ~8²/~700 + ~8²/~230 ≈ 0.09 + 0.3
Exactly: 8.25²/696.75 + 8.25²/232.25 ≈ 0.098 + 0.293 ≈ 0.39
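The same arithmetic as a short Python sketch; the helper name `chi_squared` is hypothetical. It simply sums (O - E)² / E over the two flower colours.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = 705 + 224                        # 929 plants counted
expected = [0.75 * total, 0.25 * total]  # 3 purple : 1 white -> 696.75, 232.25
print(round(chi_squared([705, 224], expected), 2))  # 0.39
```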
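Steps in Performing a χ² Test
Determine the degrees of freedom. What was the experiment? Count 929 flowers, a million times, and for each flower ask one question: purple? (If not, then white.) Therefore there is ONE degree of freedom.
Look up the probability for the χ² value: for χ² = 0.39 with one degree of freedom, 70% > P > 50%. Call it ~60% (a more exact calculation gives about 53%).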
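A χ² table only brackets P (here between 50% and 70%). With one degree of freedom the tail area can be computed directly, because χ² with one degree of freedom is the square of a standard normal variable Z. A minimal sketch using the standard `math.erfc` function; the function name is hypothetical.

```python
import math


def p_value_df1(chi2):
    """P(chi-squared with 1 degree of freedom >= chi2).

    With one degree of freedom, chi2 = Z^2 for a standard normal Z,
    so P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


print(round(p_value_df1(0.39), 2))  # 0.53
```

As a sanity check, the familiar table entry χ² = 3.84 for one degree of freedom gives P ≈ 0.05.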
Steps in Performing a χ² Test: P ~60%. Draw a conclusion.
Not: "The result has a 50% chance of being correct."
Not: "The hypothesis has a 50% chance of being correct."
Rather: 60% of the time, Mendel's result or worse would have arisen by chance if purple:white truly occurs in a 3:1 ratio.
Deviation from Expectation. [Figure: two example experiments, labeled P = .50 and P = ???] Study Question 20: What if Mendel had counted not 929 but 929,000 plants? What would the curve and shading look like then? (d still = 29)
Interpretation of Chi-Square Does a high P value indicate the hypothesis is correct? Does a low P value indicate the hypothesis is incorrect?
Bag of Marbles: thousands of marbles! 50% red, 50% blue. Guaranteed!
Test the claim of 50%:50%. Observed: 41 marbles and 59 marbles, 100 marbles total. Is their claim correct? How can we tell how close is close enough?
χ² Test of the Claim
χ² = Sum of (O - E)² / E
χ² = (41 - 50)²/50 + (59 - 50)²/50 = 81/50 + 81/50 = 162/50 = 3.24
P = ? With one degree of freedom, 10% > P > 5% (about 7%).
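The marble test works through the same way in code. A minimal sketch, assuming the two counts are compared against 50 : 50 and using the one-degree-of-freedom tail formula P = erfc(sqrt(χ²/2)) from the standard `math` module.

```python
import math

observed = [41, 59]  # the two marble counts drawn from the bag
expected = [50, 50]  # 50% : 50% of 100 marbles
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2))  # one degree of freedom (2 colours - 1)
print(round(chi2, 2), round(p, 3))  # 3.24 0.072
```

A deviation of 9 marbles in 100 would arise by chance only about 7% of the time if the bag really were 50 : 50.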