Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.


Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation ratio  Testing populations: polymorphism, heterogeneity, heterozygosity, allele frequency.

Probability: The Need for Permutations and Combinations  Often, particularly in genetics, the sample space consists of all orders or arrangements of groups of objects (usually genes or alleles in genetics).  Permutations, combinations, and combinations with repetition exist to handle this elegantly.

Probability: Permutation  Definition: The number of permutations is the number of ways one can order r elements chosen from n elements. It is often written $_nP_r$ and is calculated as $_nP_r = n!/(n-r)!$.  Example: How many different types of heterozygotes exist when there are l alleles and we distinguish order (e.g. paternal vs. maternal)? Answer: $_lP_2 = l(l-1)$.

Probability: Combination  Definition: The number of combinations is the number of ways you can select r objects from n objects without regard to order. It is written as $_nC_r$ and has value $_nC_r = n!/(r!\,(n-r)!)$.  Example: How many different heterozygotes exist without regard to order when there are l types of alleles? Answer: $_lC_2 = l(l-1)/2$.

Probability: Combination with Repetition  Definition: Suppose there are n different types of elements and r are selected with replacement; then the number of combinations is given by $C'(n, r) = {}_{n+r-1}C_r$.  Examples: How many genotypes are possible when there are l alleles? $C'(l, 2) = l(l+1)/2$. How many mating types are possible when there are l alleles? With $g = l(l+1)/2$ genotypes, $C'(g, 2) = g(g+1)/2$.
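The three counting rules can be checked numerically. The sketch below is illustrative only; the function names are hypothetical and it relies on Python's standard `math.comb` and `math.perm`.

```python
from math import comb, perm

def count_genotypes(l):
    """Unordered genotypes at a locus with l alleles: C'(l, 2) = C(l + 1, 2)."""
    return comb(l + 1, 2)

def count_mating_types(l):
    """Unordered pairs of parental genotypes; both parents may share a genotype."""
    g = count_genotypes(l)
    return comb(g + 1, 2)

l = 2  # two alleles, e.g. D and d
print(perm(l, 2))             # ordered heterozygotes: l(l-1)
print(comb(l, 2))             # unordered heterozygotes: l(l-1)/2
print(count_genotypes(l))     # 3 genotypes: DD, Dd, dd
print(count_mating_types(l))  # 6 mating types
```

With two alleles this gives the 3 genotypes and 6 mating types that appear in the segregation tables later in the lecture.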

Review: Segregation Ratio  Recall that the law of segregation states that one of the two alleles of a parent is randomly selected to pass on to the offspring.  Definition: The segregation ratios are the predictable proportions of genotypes and phenotypes in the offspring of particular parental crosses. e.g. 1 AA : 2 AB : 1 BB following a cross of AB X AB.
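As a small illustration of how expected segregation ratios follow from the law of segregation, the sketch below enumerates the four equally likely allele pairings of a cross. The function name and the two-character genotype strings are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Expected genotype proportions when each parent transmits
    one of its two alleles with probability 1/2."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sort the allele pair so that "BA" and "AB" are the same genotype.
        counts["".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

print(offspring_distribution("AB", "AB"))  # the 1 AA : 2 AB : 1 BB cross
print(offspring_distribution("Dd", "dd"))  # a 1:1 backcross
```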

Segregation Ratio Distortion  Definition: Segregation ratio distortion is a departure from expected segregation ratios. The purpose of segregation analysis is to detect significant segregation ratio distortion. A significant departure would suggest that one of our assumptions about the model is wrong.

Segregation Analysis: What it Teaches Us  Genetic model for a single-locus gene: dominant, codominant, truly single locus.  Other genetic information: selection-free, completely penetrant.  Data quality: systematic error, non-random sampling. Few important genes are single-locus. Often single-locus analysis is used to verify marker systems.

Segregation Analysis: Experimental Design  Run a controlled cross with known expected segregation ratios. OR  Sample offspring of particular mating type with known expected segregation ratios.  Verify segregation ratios.

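Autosomal Dominant
Expected offspring proportions by mating type:
Mating type   Genotype DD   Genotype Dd   Genotype dd   Phenotype Dominant   Phenotype Recessive
DD x DD       1             0             0             1                    0
DD x Dd       0.5           0.5           0             1                    0
DD x dd       0             1             0             1                    0
Dd x Dd       0.25          0.5           0.25          0.75                 0.25
Dd x dd       0             0.5           0.5           0.5                  0.5
dd x dd       0             0             1             0                    1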

Autosomal Dominant: The Data and Hypothesis  Obtain a random sample of matings between affected (Dd) and unaffected (dd) individuals.  Sample n of their offspring and find that r are affected with the disease (i.e. Dd).  $H_0$: the proportion of affected offspring is 0.5.

Autosomal Dominant: Binomial Test  $H_0$: p = 0.5  If $r \le n/2$, p-value = $2P(X \le r)$  If $r > n/2$, p-value = $2P(X \le n-r)$  $P(X \le c) = \sum_{x=0}^{c} \binom{n}{x} (1/2)^n$  Observe r = 29 (with n = 50): p-value = 0.32
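The exact test can be sketched in a few lines of Python. The sample size n = 50 is taken from the continuity-correction slide (28.5 + 21.5); the function names are hypothetical and the p-value is capped at 1 for the case r = n/2.

```python
from math import comb

def binom_cdf(c, n, p=0.5):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def two_sided_binomial_p(r, n):
    """Two-sided p-value for H0: p = 0.5, doubling the smaller tail."""
    tail = min(r, n - r)
    return min(1.0, 2 * binom_cdf(tail, n))

print(round(two_sided_binomial_p(29, 50), 2))  # 0.32, as on the slide
```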

Autosomal Dominant: Standard Normal Test  $\mu = np$  $\sigma^2 = np(1-p)$  $Z = (X - \mu)/\sigma$ is approximately N(0, 1)  Under $H_0$, X ~ N(n/2, n/4)  Observe 29: p-value = 0.26

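Autosomal Dominant: Pearson Chi-Square Test  The distribution of the sum of k squares of iid standard normal variables is defined as a chi-square distribution with k degrees of freedom.  $X^2 = \sum (O - E)^2/E = (r - n/2)^2/(n/2) + (n - r - n/2)^2/(n/2)$, which is approximately $\chi^2_1$ under $H_0$.  p-value = 0.26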

Continuity Correction  Both the normal and chi-square are continuous distributions, but our data are discrete counts.  Continuity correction for Normal: use r = 28.5 in place of 29; corrected p-value = 0.32  Continuity correction for Chi-Square: use r = 28.5 and n-r = 21.5; corrected p-value = 0.32
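The normal and chi-square p-values, with and without continuity correction, can be reproduced with the standard library alone. This is a sketch under the assumption n = 50 and r = 29; the function names are hypothetical. It uses the identity that the upper tail of a 1-df chi-square at x equals `erfc(sqrt(x/2))`.

```python
from math import erfc, sqrt

def _deviation(r, n, correct):
    """|r - n/2|, shrunk by 0.5 when the continuity correction is requested."""
    dev = abs(r - n / 2)
    return max(dev - 0.5, 0.0) if correct else dev

def normal_test_p(r, n, correct=False):
    """Two-sided p-value for H0: p = 0.5 using X ~ N(n/2, n/4)."""
    z = _deviation(r, n, correct) / sqrt(n / 4)
    return erfc(z / sqrt(2))  # equals 2 * P(Z >= z)

def chi_square_p(r, n, correct=False):
    """Pearson chi-square p-value (1 df) for the two classes r and n - r."""
    dev = _deviation(r, n, correct)
    x2 = 2 * dev**2 / (n / 2)  # both classes have the same deviation and expectation
    return erfc(sqrt(x2 / 2))  # upper tail of chi-square with 1 df

for correct in (False, True):
    print(correct, round(normal_test_p(29, 50, correct), 2),
          round(chi_square_p(29, 50, correct), 2))
```

With two classes the chi-square statistic is the square of the normal statistic, which is why the two tests report the same p-value in each row.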

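Autosomal Dominant: Likelihood Ratio Test  Write likelihood: $L(p) = \binom{n}{r} p^r (1-p)^{n-r}$  Calculate the MLE under $H_A$: $\hat{p} = r/n$  Calculate the G statistic: $G = 2 \ln[L(\hat{p})/L(0.5)] = 2[r \ln(2r/n) + (n-r) \ln(2(n-r)/n)]$  Determine G distribution: approximately $\chi^2_1$ under $H_0$  Calculate p-value = 0.26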
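A minimal sketch of the likelihood ratio (G) test for the same data, assuming n = 50 and r = 29 and a 1-df chi-square reference distribution. The function name is hypothetical.

```python
from math import erfc, log, sqrt

def g_test_p(r, n):
    """G statistic and p-value for H0: p = 0.5 with observed classes r and n - r."""
    expected = n / 2
    # Classes with a zero count contribute nothing to the sum (0 * log 0 = 0).
    g = 2 * sum(o * log(o / expected) for o in (r, n - r) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    return g, erfc(sqrt(g / 2))  # upper tail of chi-square with 1 df

g, p = g_test_p(29, 50)
print(round(g, 3), round(p, 2))
```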

Estimating Segregation Ratio: Method of Moments (MOM)  first moment = np  sample moment = r  MOM: np = r  MOM estimate: $\hat{p} = r/n$

Estimating Segregation Ratio: Likelihood Method  Set score to 0: $\partial \ln L(p)/\partial p = r/p - (n-r)/(1-p) = 0$  Solve for MLE: $\hat{p} = r/n$

Estimating Confidence Interval for Segregation Ratio  Our estimate is X/n, where X is the random variable representing the number of “successes” observed and n is the sample size.  E(X/n) = E(X)/n = np/n = p  Var(X/n) = Var(X)/$n^2$ = np(1-p)/$n^2$ = p(1-p)/n  SE(X/n) = $\sqrt{p(1-p)/n}$  Therefore, X/n is unbiased and we can obtain a confidence interval using a normal approximation with SE(X/n).

Estimating Confidence Interval for Segregation Ratio
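One common way to turn the normal approximation into an interval is $\hat{p} \pm z \cdot SE$, with the unknown p in the standard error replaced by $\hat{p}$. The sketch below assumes that plug-in (Wald) form, which may differ in detail from the formula on the original slide; the function name and the default z = 1.96 (an approximate 95% interval) are choices made for this example.

```python
from math import sqrt

def wald_interval(r, n, z=1.96):
    """Normal-approximation confidence interval for the segregation ratio."""
    p_hat = r / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # plug-in estimate of SE(X/n)
    return p_hat - z * se, p_hat + z * se

low, high = wald_interval(29, 50)
print(round(low, 3), round(high, 3))
```

For r = 29 and n = 50 the interval contains 0.5, which agrees with the tests above not rejecting $H_0$.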

Segregation Analysis: Codominant Loci I
Expected offspring genotype proportions by mating type:
Mating type   DD     Dd     dd
DD x DD       1      0      0
DD x Dd       0.5    0.5    0
DD x dd       0      1      0
Dd x Dd       0.25   0.5    0.25
Dd x dd       0      0.5    0.5
dd x dd       0      0      1

Segregation Analysis: Codominant Loci II  All 6 mating types are identifiable.  Each mating type can be tested for agreement with expected segregation ratios.  Some mating types result in 3 types of offspring. Must use Chi-Square or likelihood ratio test.
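For a mating type with three offspring classes, such as Dd x Dd with expected ratio 1:2:1, the Pearson chi-square has 2 degrees of freedom, and its upper tail has the closed form exp(-x/2). The sketch below uses hypothetical offspring counts purely for illustration.

```python
from math import exp

def chi_square_121(counts):
    """Pearson chi-square against a 1:2:1 ratio; returns (statistic, p-value)."""
    n = sum(counts)
    expected = (n / 4, n / 2, n / 4)
    x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return x2, exp(-x2 / 2)  # upper tail of chi-square with 2 df

# Hypothetical counts of DD, Dd, dd offspring from Dd x Dd matings.
x2, p = chi_square_121((22, 55, 23))
print(round(x2, 2), round(p, 2))
```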

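Multiple Populations: Testing for Heterogeneity  Suppose you observe segregation ratios in samples of size n in m populations.  Calculate a total chi-square: $\chi^2_{total} = \sum_{i=1}^{m} \chi^2_i$, the sum of the chi-squares computed separately for each population.  Calculate a pooled chi-square: $\chi^2_{pooled}$, the chi-square computed once on the counts summed over all m populations.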

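Multiple Populations: Testing for Heterogeneity  Then, under homogeneity, $\chi^2_{total} - \chi^2_{pooled}$ is approximately chi-square distributed with degrees of freedom equal to the total minus the pooled degrees of freedom, i.e. $(m-1)(c-1)$ for c offspring classes.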

Multiple Populations: Testing for Heterogeneity  Alternatively, one may calculate G statistics.  Then, $G_{total} - G_{pooled}$ is also approximately distributed as chi-square with $(m-1)(c-1)$ degrees of freedom for c offspring classes.
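The heterogeneity test can be sketched as total minus pooled chi-square. The counts below are hypothetical (they are not Mendel's data), the function names are made up for this example, and the chi-square tail probability is computed with a standard recurrence for integer degrees of freedom so that no external library is needed.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    # Start from the closed form for df = 1 or df = 2, then step df up by 2.
    q, a = (erfc(sqrt(h)), 0.5) if df % 2 else (exp(-h), 1.0)
    while a < df / 2:
        q += h**a * exp(-h) / gamma(a + 1)
        a += 1
    return q

def chi_square(observed, ratios):
    """Pearson chi-square of observed counts against an expected ratio."""
    n, w = sum(observed), sum(ratios)
    return sum((o - n * r / w) ** 2 / (n * r / w) for o, r in zip(observed, ratios))

def heterogeneity_test(samples, ratios):
    """Return (heterogeneity chi-square, df, p-value) for m samples."""
    total = sum(chi_square(s, ratios) for s in samples)
    pooled = chi_square([sum(col) for col in zip(*samples)], ratios)
    df = (len(samples) - 1) * (len(ratios) - 1)
    het = total - pooled
    return het, df, chi2_sf(het, df)

# Hypothetical (dominant, recessive) counts from three plants, expected 3:1.
samples = [(30, 10), (28, 12), (33, 7)]
het, df, p = heterogeneity_test(samples, (3, 1))
print(round(het, 3), df, round(p, 2))
```

A large p-value here would indicate no evidence that the plants differ in their segregation ratios, so pooling them would be reasonable.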

Multiple Populations: Example  In Mendel’s F2 cross of smooth and wrinkled inbred pea lines, he sampled 10 plants and counted the number of smooth and wrinkled peas produced by each of those plants.  Is there heterogeneity between plants?  Further tests show that  a single gene controls smooth vs. wrinkled  smooth is dominant to wrinkled

Screening Markers for Polymorphism  An important step in designing mapping studies is to find markers that show polymorphism. We are interested in tests for polymorphism.  A false negative would result if the marker was truly polymorphic, but our test showed it to be monomorphic.  A false positive would result if the marker was truly monomorphic, but our test showed it to be polymorphic.

Testing for Polymorphism: Backcross 1:1  You design a backcross experiment to test for polymorphism at a marker of interest. You sample n offspring of the backcross.  P(monomorphic) = $2(0.5)^n$, the probability that all n offspring show the same genotype even though the marker segregates 1:1.

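Testing for Polymorphism: F2 codominant 1:2:1  You design an F2 cross with a marker that is codominant. You sample n F2 individuals.  P(monomorphic) = $2(0.25)^n + (0.5)^n$, the probability that all n individuals show the same genotype even though the marker segregates 1:2:1.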

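Testing for Polymorphism: F2 dominant marker  You design an F2 cross, but this time observe a dominant marker. You sample n F2 individuals.  P(monomorphic) = $(0.75)^n + (0.25)^n$, the probability that all n individuals show the same phenotype even though the marker segregates 3:1.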
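The three false-negative probabilities can be compared directly, and inverted to find the smallest sample size that keeps the false-negative rate below a chosen threshold. This is an illustrative sketch; the design labels, function names, and the 0.05 threshold are assumptions for the example.

```python
# Probability that a truly polymorphic marker looks monomorphic in n offspring.
P_MONOMORPHIC = {
    "backcross": lambda n: 2 * 0.5**n,                # 1:1
    "f2_codominant": lambda n: 2 * 0.25**n + 0.5**n,  # 1:2:1
    "f2_dominant": lambda n: 0.75**n + 0.25**n,       # 3:1
}

def p_monomorphic(design, n):
    return P_MONOMORPHIC[design](n)

def min_sample_size(design, alpha=0.05):
    """Smallest n with false-negative probability at most alpha."""
    n = 1
    while p_monomorphic(design, n) > alpha:
        n += 1
    return n

for design in P_MONOMORPHIC:
    n = min_sample_size(design)
    print(design, n, round(1 - p_monomorphic(design, n), 3))  # design, n, power
```

The power of the screen at a given n is one minus the false-negative probability, so the dominant F2 marker, which is the least informative design, needs the largest sample.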

Power of Test for Polymorphism

Estimating Heterozygosity

Estimating Allele Frequency  It is often assumed that alleles have equal frequencies when there are many alleles at a locus. This assumption can result in false positives for linkage, so it is important to test allele frequencies.  Suppose there are l possible alleles $A_1, A_2, \ldots, A_l$. You observe $n_{ij}$ genotypes $A_iA_j$.  You estimate genotype frequencies as $\hat{P}_{ij} = n_{ij}/n$, where n is the total number of genotypes observed.

Estimating Allele Frequencies
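A standard way to estimate allele frequencies from genotype counts is gene counting: $\hat{p}_i = (2n_{ii} + \sum_{j \ne i} n_{ij})/(2n)$. The slide's own formula did not survive in the transcript, so the sketch below assumes this estimator; the genotype counts and the function name are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    """Gene-counting estimate of allele frequencies.

    genotype_counts maps an unordered allele pair (a, b) to the number of
    individuals with that genotype; homozygotes contribute two copies.
    """
    allele_counts = Counter()
    for (a, b), n_ij in genotype_counts.items():
        allele_counts[a] += n_ij
        allele_counts[b] += n_ij
    total = sum(allele_counts.values())  # 2n gene copies
    return {allele: c / total for allele, c in allele_counts.items()}

# Hypothetical sample of 50 individuals at a three-allele locus.
counts = {("A1", "A1"): 10, ("A1", "A2"): 20, ("A2", "A2"): 5, ("A2", "A3"): 15}
print(allele_frequencies(counts))
```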

Probability of Observing an Allele  Suppose there is an allele $A_i$ with frequency $p_i$. What is the probability of sampling at least one allele of type $A_i$? The same probability is the basis of a sample size calculation.
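The slide's formulas were lost in the transcript, so the following is a sketch under an explicit assumption: the sample consists of N gene copies drawn independently (N = 2n for n diploid individuals under random mating), so the probability of seeing allele $A_i$ at least once is $1 - (1 - p_i)^N$. The function names and the example numbers are hypothetical.

```python
from math import ceil, log

def prob_at_least_one(p_i, n_genes):
    """P(allele with frequency p_i appears at least once among n_genes copies),
    assuming independently sampled gene copies."""
    return 1 - (1 - p_i) ** n_genes

def genes_needed(p_i, target):
    """Smallest number of gene copies giving detection probability >= target."""
    return ceil(log(1 - target) / log(1 - p_i))

n_genes = genes_needed(0.05, 0.95)
print(n_genes, round(prob_at_least_one(0.05, n_genes), 3))
```

Under the diploid assumption the number of individuals to sample is half the number of gene copies, rounded up.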

Probability of Observing Multiple Alleles  Let $\theta_i$ be the probability of observing at least one allele of type i.  There are $\binom{l}{m}$ ways of selecting m different alleles, each with an associated probability, calculated from the $\theta_i$, of detecting at least one of each.  Then we can calculate the probability of observing k or more alleles by summing over these probabilities for m = k, k+1, …, l.

Approximate Probability of Observing k or More Alleles  The above procedure becomes computationally difficult when there are many alleles and the frequencies are unequal.  There is a Monte Carlo approximation.  Select a random variable $I_i$ to be 1 with probability $\theta_i$ and 0 otherwise, where $\theta_i$ is the probability of observing at least one allele of type i.  Compute $I = \sum_{i=1}^{l} I_i$ for b bootstrap trials. The proportion of trials with $I \ge k$ is an estimate of the probability of observing k or more alleles.
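The Monte Carlo procedure can be sketched as follows. The per-allele detection probabilities (called `thetas` here, a name chosen for this sketch) are hypothetical, and the indicators are simulated independently, which is the approximation the procedure describes. An exact calculation under the same independence assumption is included only as a check on the simulation.

```python
import random

def mc_prob_k_or_more(thetas, k, b=20000, seed=0):
    """Monte Carlo estimate of P(at least k of the alleles are observed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(b):
        # I_i = 1 with probability theta_i; I is the number of alleles seen.
        observed = sum(rng.random() < t for t in thetas)
        hits += observed >= k
    return hits / b

def exact_prob_k_or_more(thetas, k):
    """Exact value for independent indicators, by dynamic programming."""
    dist = [1.0]  # dist[j] = P(j alleles observed so far)
    for t in thetas:
        new = [0.0] * (len(dist) + 1)
        for j, pr in enumerate(dist):
            new[j] += pr * (1 - t)
            new[j + 1] += pr * t
        dist = new
    return sum(dist[k:])

thetas = [0.99, 0.9, 0.6, 0.3]  # hypothetical detection probabilities
print(mc_prob_k_or_more(thetas, 3), exact_prob_k_or_more(thetas, 3))
```

The simulation error shrinks roughly as one over the square root of b, so b controls the trade-off between run time and precision.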

Summary  Permutations and combinations: knowing how to count the number of genotypes, mating types, etc.  Testing segregation ratios for dominant and codominant loci.  Testing for population heterogeneity.  Screening for polymorphism.  Estimating heterozygosity, allele frequencies, and the probability of observing an allele.