Welcome to Introduction to Bioinformatics Monday, 21 March 2005 Genome Comparison Coming attractions How to compare genomes Chi-squared analysis.

Presentation transcript:

E. coli: What makes it kill? Escherichia coli very small lab rats Courtesy of Kent State University Microbiology

E. coli: What makes it kill? Escherichia coli... haemorrhagic colitis

E. coli: What makes it kill? E. coli K12 E. coli O157:H7 TCTACTTATA TTCAATCCAC AGGGCTACAC AAGAGTCTGT TGAATGAACA CATACATGGT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA TCTACTTATA TTCAATCCAC AGGGCTACAC AAGAGTCTGT TGAATGAACA CATACATGGT TTCTGTCTGC TCTGACCTCT GGCAGCTTTC TGGATTTCGG AACTCTAGCC TGCCCCACTC GAACCTTAGT GACTTCTGCT ATACCAAAGT CTCCGTAAAC CTCTAACATG ATGTCAGCAA TGAATAAACT TTGTTAAAGG TACAAATGAA AAGAGTTTAA AGTTAAAAAC GAATTGCAGT AAACCTGTAT GGTTACATGA ACTGCCTAAA TTATATATTT TAAGAAATTA ATTGCAATTA CCCCAGCTGT CATTAAAAAG AGGCAAATAC GACAGCACTG ACCCTCAAGA AGGCACCGGC GCTGAAATTC CGCTGAGAGC AGAGTGGTAC CCCTGCACCA GGTCTTTCCT GTGGGCACTG ATGAATGACT GAACGAACGA TTGAATGAAA

How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA...

E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... GATAGATCCCC

E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA... CCCACGCCTAT

E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAACCACGCCTTGA...
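The moving 11-nucleotide window in the slides above (GATAGATCCCC, then CCCACGCCTAT) suggests scanning one genome window by window and asking whether each window also occurs in the other genome. The following is a minimal Python sketch of that idea, not the method used in the course: it assumes exact matching on one strand only, and the function name `unmatched_windows` and the default `k=11` are illustrative choices.

```python
def unmatched_windows(query, reference, k=11):
    """Return (position, window) for each k-long window of `query` absent from `reference`."""
    # Index every k-mer of the reference once so each lookup is a set membership test.
    reference_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    missing = []
    for i in range(len(query) - k + 1):
        window = query[i:i + k]
        if window not in reference_kmers:
            missing.append((i, window))
    return missing


# Truncated fragments copied from the slides (not whole genomes).
o157 = "GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGG"
k12 = "GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGG"

for position, window in unmatched_windows(o157, k12)[:3]:
    print(position, window)
```

Real genomes are millions of bases long and differ by point mutations as well as insertions, so a practical comparison would also tolerate mismatches and check the reverse strand; this sketch does neither.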

[Figure: E. coli O157:H7 compared with E. coli K12]

[Figure: E. coli O157:H7 compared with E. coli K12, O-Islands labeled]

[Figure: Prochlorococcus SS120 compared with Prochlorococcus MED4 (100 nuc)]

[Figure: Prochlorococcus SS120 compared with Prochlorococcus MED4 (25 nuc)]

Nature of Pathogenicity Islands Horizontal transfer of foreign DNA E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Death General transduction

How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Life!

How do differences arise between genomes? Infection Phage Bacterial chromosome Phage genome Lysogenic pathway Lytic pathway Phage genome Special transduction

How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related

How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons

How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons How to find orthologous protein?

How to compare genomes E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC... Differences in genome sequence Useful only if very related Differences in protein content Useful for even distant comparisons How to find corresponding protein?

X X X X X X X Yeast E. coli Anabaena Methanobacter

How to find corresponding protein? X X X X X X X Yeast E. coli Anabaena Methanobacter All similar protein? Most related by common descent? Orthologs Orthologs Paralogs

How to find corresponding protein? Most related by common descent? All similar protein? Orthologs Paralogs Blast E-value threshold Organism X Organism Y

How to find corresponding protein? Most related by common descent? Orthologs Blast E-value threshold Organism Y Organism X Organism Y Defined by bidirectional Blast hit
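The "bidirectional Blast hit" definition above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hit tables are hypothetical (query, subject, E-value) tuples rather than real Blast output, the protein names are invented, and the `max_evalue=1e-6` threshold is an arbitrary placeholder for whatever E-value threshold is chosen.

```python
def best_hits(hits, max_evalue=1e-6):
    """Map each query to its lowest-E-value subject, ignoring hits above the threshold."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to count as a similar protein
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(x_vs_y, y_vs_x, max_evalue=1e-6):
    """Pairs (x, y) where y is x's best hit in Y and x is y's best hit in X."""
    forward = best_hits(x_vs_y, max_evalue)
    backward = best_hits(y_vs_x, max_evalue)
    return sorted((x, y) for x, y in forward.items() if backward.get(y) == x)


# Hypothetical hit tables: (query protein, subject protein, E-value).
x_vs_y = [("x1", "y1", 1e-50), ("x1", "y2", 1e-20), ("x2", "y2", 1e-30), ("x3", "y1", 1e-3)]
y_vs_x = [("y1", "x1", 1e-48), ("y2", "x1", 1e-28), ("y2", "x2", 1e-25)]

print(bidirectional_best_hits(x_vs_y, y_vs_x))
```

In this toy data, x2's best hit is y2, but y2's best hit is x1, so x2 and y2 are similar proteins without qualifying as a bidirectional pair.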

How to find corresponding protein? PROTEINS-SIMILAR-TO ORTHOLOG-OF COMMON-ORTHOLOGS-OF

Nature of Pathogenicity Islands Horizontal transfer of foreign DNA E. coli O157:H7 genome GATAGATCCCCACGCCTATAATGGCGCATAACACACTAAACTTGGGGTATTGAAGCAGTCGCCAAAGAGTGACCGGTCATCCTTCTCCGCTGCGAAATATCCTTCTTGTTGGCATACCACGCCTTGA... E. coli K12 genome GCGGAGCAAACTGGGCGTCTTTCGAGAACTAACAAATCCGATTGCGGGCTTCTCACGCATAGGCGCAGTTATGGTTAATGCCAAAACTTTTTTTTCGCGCCGAAATAACATAATGCACAGGCATGGC...

Nature of Pathogenicity Islands Horizontal transfer of foreign DNA

Nature of Pathogenicity Islands Nucleotide frequencies comparisons
Nucleotide Count
Base    Sequence1   Sequence2   Total
A       1,[…]       […]         […],600
C       1,[…]       […]         […],800
G       1,[…]       […]         […],700
T       1,[…]       […]         […],900
Total   4,000       3,000       7,000

Nucleotide frequencies to detect foreign genes 1. Find nucleotide frequencies of native genes 2. Find nucleotide frequencies of test gene 3. Compare frequencies 4. How likely differences arose by chance? Chi-squared analysis
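The four steps above can be sketched in Python as follows. This is a minimal illustration, assuming the native frequencies are pooled over a set of native gene sequences; the function names and any sequences passed in are hypothetical. With four bases, the resulting value would be looked up with three degrees of freedom.

```python
from collections import Counter

BASES = "ACGT"


def nucleotide_frequencies(sequences):
    """Steps 1-2: pooled A/C/G/T frequencies over one or more sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in BASES)
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}


def chi_squared_vs_native(test_gene, native_freqs):
    """Steps 3-4: chi-squared of the test gene's base counts against native expectations."""
    observed = Counter(base for base in test_gene.upper() if base in BASES)
    n = sum(observed.values())
    # Expected count of each base = native frequency * length of the test gene.
    return sum(
        (observed[b] - native_freqs[b] * n) ** 2 / (native_freqs[b] * n)
        for b in BASES
    )
```

A large value means the test gene's composition is unlikely to have arisen by chance from the native composition. The sketch assumes every base has a nonzero native frequency; otherwise the division fails.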

Where does χ² come from? A million repetitions of Mendel’s experiment Create a million universes -- purple:white on average = 3:1
Result: 705 purple 224 white = 929 plants
Result: 698 purple 231 white = 929 plants
Result: 688 purple 241 white = 929 plants
Result: 710 purple 219 white = 929 plants
Result: 695 purple 234 white = 929 plants
Result: 702 purple 227 white = 929 plants

200,000 repetitions Where does χ² come from? A million repetitions of Mendel’s experiment

500,000 repetitions Where does χ² come from? A million repetitions of Mendel’s experiment

1,000,000 repetitions Why is it that the two dotted lines are on opposite sides of the mean?

Where does χ² come from? A million repetitions of Mendel’s experiment 1,000,000 repetitions What’s the most likely result? How often does it occur?
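The "million universes" thought experiment can be imitated by simulation. The sketch below assumes each of the 929 plants is independently purple with probability 0.75 (the 3:1 model); the function name, the seed, and the reduced default of 2,000 repetitions (to keep the run short) are illustrative choices, not part of the lecture.

```python
import random
from collections import Counter


def simulate_purple_counts(n_reps=2_000, n_plants=929, p_purple=0.75, seed=1):
    """Repeat the experiment n_reps times; return the purple count from each repetition."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_purple for _ in range(n_plants))
        for _ in range(n_reps)
    ]


counts = simulate_purple_counts()
most_common_count, times_seen = Counter(counts).most_common(1)[0]
print("mean purple count:", sum(counts) / len(counts))
print("most frequent result:", most_common_count, "seen", times_seen, "times")
```

Raising `n_reps` toward 1,000,000 smooths the histogram of results, at the cost of a much longer run in pure Python.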

Deviation from Expectation Two example experiments Why is there shading on both sides of the curve? The farther away O from E, the smaller/larger the shaded area?

Steps in Performing a χ² Test
Determine the expected values for the experiment
Model: 3 purple : 1 white flower
Total counted: 929
Purple = 75% of 929 = 696.75
White = 25% of 929 = 232.25
Calculate the squares of the deviations
χ² = Sum of (O - E)² / E
χ² = (O_purple - E_purple)² / E_purple + (O_white - E_white)² / E_white
   = ~8² / […] + ~8² / 230
   = ~0.09 + ~0.3
χ² = approx 0.39 (actually = 0.37)
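The calculation can be checked with a few lines of Python. This sketch assumes the observed counts are 705 purple and 224 white, the first result listed earlier in the deck; the function name is hypothetical. With unrounded expected values the sum comes out near 0.39, and rounding the expected values to 697 and 232 first gives about 0.37.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = 929
observed = [705, 224]                   # assumed: purple, white
expected = [0.75 * total, 0.25 * total]  # 3:1 model -> 696.75, 232.25

print(round(chi_squared(observed, expected), 2))
print(round(chi_squared(observed, [697, 232]), 2))  # expected values rounded first
```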

Steps in Performing a χ² Test
Determine the degrees of freedom
What was the experiment? - Count 929 flowers a million times
Ask: purple? (if not, then white)
Therefore ONE degree of freedom
Look up probability for χ² value
χ² = […]: […]% > P > 50%. Call it ~60%
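Instead of a printed table, the probability for one degree of freedom can be computed directly, because with one degree of freedom the upper-tail probability of χ² equals erfc(sqrt(χ² / 2)). The sketch below uses only the Python standard library; the function name is hypothetical, and the formula does not apply to other degrees of freedom.

```python
import math


def p_value_one_df(chi2):
    """P(chi-squared >= chi2) for exactly one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


for value in (0.37, 0.39, 3.841):
    print(value, round(p_value_one_df(value), 3))
```

For values near 0.4 the result is a little above 50%, consistent with the table lookup on the slide; 3.841 is the familiar 5% cutoff for one degree of freedom.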

Steps in Performing a χ² Test P ~60% Draw a conclusion
The result has a 50% chance of being correct
The hypothesis has a 50% chance of being correct
60% of the time, Mendel’s result or worse would have arisen by chance if purple:white truly occurs in a 3:1 ratio.

Deviation from Expectation Two example experiments Study Question 20: What if Mendel had counted not 929 but 929,000 plants -- what do the curve and shading look like then? (d still = 29) P = .50 P = ???

Interpretation of Chi-Square Does a high P value indicate the hypothesis is correct? Does a low P value indicate the hypothesis is incorrect?

Bag of Marbles 1000’s of marbles! 50% red, 50% blue Guaranteed!

Test Claim of 50%:50% 41 marbles 59 marbles 100 marbles TOTAL Is their claim correct? How to tell how close is close enough?

χ² Test of Claim
χ² = Sum of (O - E)² / E
χ² = (41 - 50)² / 50 + (59 - 50)² / 50 = 81 / 50 + 81 / 50 = 162 / 50 = 3.24
P = ? With one degree of freedom, 10% > P > 5%
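The marble test can be worked through in Python as a check. For a 41:59 split against a claimed 50:50, each deviation is 9 and each squared deviation is 81, so χ² = 81/50 + 81/50 = 3.24. The sketch below is an illustration only: the function name is hypothetical, and the probability formula is valid only for two categories (one degree of freedom).

```python
import math


def evaluate_two_way_claim(observed, claimed_fractions):
    """Return (chi-squared, P) for two observed counts against claimed fractions."""
    total = sum(observed)
    expected = [fraction * total for fraction in claimed_fractions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability for one degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = evaluate_two_way_claim([41, 59], [0.5, 0.5])
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

A P value between 5% and 10% means a split at least this uneven would arise by chance fairly rarely if the bag really were 50:50, though not rarely enough to reject the claim at the conventional 5% level.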