Lecture 5: Segregation Analysis I
Date: 9/10/02
- Counting number of genotypes, mating types
- Segregation analysis: dominant, codominant, estimating segregation ratio
- Testing populations: polymorphism, heterogeneity, heterozygosity, allele frequency

Probability: The Need for Permutations and Combinations
- Often, particularly in genetics, the sample space consists of all orders or arrangements of groups of objects (usually genes or alleles).
- Permutations, combinations, and combinations with repetition exist to handle this elegantly.

Probability: Permutation
- Definition: The number of permutations is the number of ways one can order r elements chosen from n elements. It is often written $_nP_r$ and is calculated as $_nP_r = \frac{n!}{(n-r)!}$.
- Example: How many different types of heterozygotes exist when there are l alleles and we distinguish order (e.g. paternal vs. maternal)?

Probability: Combination
- Definition: The number of combinations is the number of ways you can select r objects from n objects without regard to order. It is written $_nC_r$ and has value $_nC_r = \frac{n!}{r!\,(n-r)!}$.
- Example: How many different heterozygotes exist without regard to order when there are l types of alleles?

Probability: Combination with Repetition
- Definition: Suppose there are n different types of elements and r are selected with replacement. Then the number of combinations is given by $C'(n, r) = {}_{n+r-1}C_r$.
- Examples: How many genotypes are possible when there are l alleles? How many mating types are possible when there are l alleles?
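The counting questions on the three slides above can be checked numerically. The sketch below is illustrative only: the function name is made up here, and it assumes diploid genotypes (two alleles each) and mating types in which the two parents are not distinguished by sex.

```python
from math import comb, perm

def count_genetic_configurations(l: int) -> dict:
    """Count heterozygotes, genotypes, and mating types for l alleles."""
    # Ordered heterozygotes (paternal vs. maternal origin distinguished): l P 2
    ordered_het = perm(l, 2)
    # Unordered heterozygotes: l C 2
    unordered_het = comb(l, 2)
    # Genotypes: choose 2 alleles from l types with repetition, C'(l, 2)
    genotypes = comb(l + 2 - 1, 2)
    # Mating types: choose 2 genotypes with repetition, C'(genotypes, 2)
    mating_types = comb(genotypes + 2 - 1, 2)
    return {
        "ordered_heterozygotes": ordered_het,
        "unordered_heterozygotes": unordered_het,
        "genotypes": genotypes,
        "mating_types": mating_types,
    }

for n_alleles in (2, 3, 4):
    print(n_alleles, count_genetic_configurations(n_alleles))
```

For two alleles this gives 3 genotypes and 6 mating types, which agrees with the six rows of the mating-type tables later in the lecture.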

Review: Segregation Ratio
- Recall that the law of segregation states that one of the two alleles of a parent is randomly selected to pass on to the offspring.
- Definition: The segregation ratios are the predictable proportions of genotypes and phenotypes in the offspring of particular parental crosses, e.g. 1 AA : 2 AB : 1 BB following a cross of AB × AB.

Segregation Ratio Distortion
- Definition: Segregation ratio distortion is a departure from expected segregation ratios. The purpose of segregation analysis is to detect significant segregation ratio distortion. A significant departure would suggest that one of our assumptions about the model is wrong.

Segregation Analysis: What it Teaches Us
- Genetic model for a single-locus gene: dominant, codominant, truly single locus.
- Other genetic information: selection-free, completely penetrant.
- Data quality: systematic error, non-random sampling.
Few important genes are single-locus. Often single-locus analysis is used to verify marker systems.

Segregation Analysis: Experimental Design
- Run a controlled cross with known expected segregation ratios, OR
- Sample offspring of a particular mating type with known expected segregation ratios.
- Verify segregation ratios.

Autosomal Dominant
Mating type   Genotype                Phenotype
              DD     Dd     dd        Dominant   Recessive
DD x DD       1      0      0         1          0
DD x Dd       0.5    0.5    0         1          0
DD x dd       0      1      0         1          0
Dd x Dd       0.25   0.5    0.25      0.75       0.25
Dd x dd       0      0.5    0.5       0.5        0.5
dd x dd       0      0      1         0          1
[Slide labels: A, B, C]
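The expected proportions for each mating type follow from the law of segregation: each parent passes on one of its two alleles with probability 1/2. A minimal sketch that enumerates the four equally likely allele pairs is shown below; the function names and the two-character genotype strings are conventions chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Expected offspring genotype proportions for a single-locus cross."""
    counts = Counter()
    # Each parent transmits one of its two alleles with equal probability.
    for a, b in product(parent1, parent2):
        # Sort so that "dD" and "Dd" are treated as the same genotype.
        counts["".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}

def offspring_phenotypes(parent1: str, parent2: str, dominant: str = "D") -> dict:
    """Collapse genotype proportions into dominant/recessive phenotypes."""
    pheno = {"dominant": Fraction(0), "recessive": Fraction(0)}
    for genotype, prob in offspring_genotypes(parent1, parent2).items():
        key = "dominant" if dominant in genotype else "recessive"
        pheno[key] += prob
    return pheno

for cross in [("DD", "DD"), ("DD", "Dd"), ("DD", "dd"),
              ("Dd", "Dd"), ("Dd", "dd"), ("dd", "dd")]:
    print(cross, offspring_genotypes(*cross), offspring_phenotypes(*cross))
```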

Autosomal Dominant: The Data and Hypothesis
- Obtain a random sample of matings between affected (Dd) and unaffected (dd) individuals.
- Sample n of their offspring and find that r are affected with the disease (i.e. Dd).
- $H_0$: the proportion of affected offspring is 0.5.

Autosomal Dominant: Binomial Test
- $H_0: p = 0.5$
- If $r \le n/2$, p-value $= 2P(X \le r)$
- If $r > n/2$, p-value $= 2P(X \le n-r)$
- $P(X \le c) = \sum_{x=0}^{c} \binom{n}{x} (0.5)^n$
- Example: observe r = 29, p-value = 0.32
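The two-sided binomial p-value can be computed directly from the binomial tail. The sketch below uses only the standard library. The sample size n = 50 is not stated on this slide; it is inferred from the continuity-correction slide, where r = 28.5 and n − r = 21.5. Function names are illustrative.

```python
from math import comb

def binom_cdf(c: int, n: int, p: float = 0.5) -> float:
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def two_sided_binomial_p(r: int, n: int) -> float:
    """Two-sided p-value for H0: p = 0.5, doubling the smaller tail."""
    tail = min(r, n - r)          # use r if r <= n/2, otherwise n - r
    return min(1.0, 2 * binom_cdf(tail, n))

# r = 29 affected offspring; n = 50 is an inferred value (see lead-in).
print(round(two_sided_binomial_p(29, 50), 2))
```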

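Autosomal Dominant: Standard Normal Test
- $\mu = np$
- $\sigma^2 = np(1-p)$
- Under $H_0$, $X \sim N(n/2, n/4)$, so $Z = \frac{r - n/2}{\sqrt{n/4}}$ is approximately standard normal.
- Example: observe r = 29, p-value = 0.26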

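Autosomal Dominant: Pearson Chi-Square Test
- The distribution of the sum of k squares of iid standard normal variables is defined as a chi-square distribution with k degrees of freedom.
- Test statistic: $\chi^2 = \frac{(r - n/2)^2}{n/2} + \frac{((n-r) - n/2)^2}{n/2}$, with 1 degree of freedom.
- p-value = 0.26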

Continuity Correction
- Both the normal and chi-square are continuous distributions, but our data are discrete.
- Continuity correction for the normal test: r = 28.5, corrected p-value = 0.32
- Continuity correction for the chi-square test: r = 28.5, n − r = 21.5, corrected p-value = 0.32
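The normal and Pearson chi-square p-values, with and without the continuity correction, can be reproduced with the standard library, since a two-sided normal tail and a 1-degree-of-freedom chi-square tail both reduce to the complementary error function. This is a sketch: n = 50 is inferred from r = 28.5 and n − r = 21.5 above, and the function names are illustrative.

```python
from math import erfc, sqrt

def normal_test_p(r: float, n: int, correct: bool = False) -> float:
    """Two-sided normal-approximation p-value for H0: p = 0.5."""
    deviation = abs(r - n / 2)
    if correct:
        deviation = max(0.0, deviation - 0.5)   # move r half a unit toward n/2
    z = deviation / sqrt(n / 4)
    return erfc(z / sqrt(2))                    # equals 2 * P(Z >= z)

def chi_square_test_p(r: float, n: int, correct: bool = False) -> float:
    """Pearson chi-square p-value (1 df) for a 1:1 expected ratio."""
    expected = n / 2
    deviations = [abs(r - expected), abs((n - r) - expected)]
    if correct:
        deviations = [max(0.0, d - 0.5) for d in deviations]
    x2 = sum(d**2 / expected for d in deviations)
    return erfc(sqrt(x2 / 2))                   # chi-square upper tail, 1 df

for corrected in (False, True):
    print(corrected,
          round(normal_test_p(29, 50, corrected), 2),
          round(chi_square_test_p(29, 50, corrected), 2))
```

With one degree of freedom the chi-square statistic is the square of Z, so the two tests give identical p-values.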

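Autosomal Dominant: Likelihood Ratio Test
- Write the likelihood: $L(p) = \binom{n}{r} p^r (1-p)^{n-r}$
- Calculate the MLE under $H_A$: $\hat{p} = r/n$
- Calculate the G statistic: $G = 2\ln\frac{L(\hat{p})}{L(0.5)} = 2\left[r\ln\frac{r}{n/2} + (n-r)\ln\frac{n-r}{n/2}\right]$
- Determine the distribution of G: under $H_0$, G is approximately chi-square with 1 degree of freedom.
- Calculate p-value = 0.26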
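The likelihood ratio steps can be traced numerically. The sketch below computes G as twice the log-likelihood ratio between the MLE and the null value, then uses the 1-degree-of-freedom chi-square tail. As before, n = 50 is inferred from the continuity-correction slide and the function name is illustrative.

```python
from math import erfc, log, sqrt

def g_test(r: int, n: int, p0: float = 0.5):
    """Likelihood ratio (G) test of H0: p = p0 for r successes in n trials."""
    observed = [r, n - r]
    expected = [n * p0, n * (1 - p0)]
    # G = 2 * sum(O * ln(O / E)); a class with O = 0 contributes nothing.
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    p_value = erfc(sqrt(g / 2))   # chi-square upper tail with 1 df
    return g, p_value

g, p = g_test(29, 50)
print(round(g, 3), round(p, 2))
```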

Estimating Segregation Ratio: Method of Moments (MOM)
- first moment = np
- sample moment = r
- MOM: np = r
- MOM estimate: $\hat{p} = r/n$

Estimating Segregation Ratio: Likelihood Method
- Set the score to 0: $\frac{d}{dp}\ln L(p) = \frac{r}{p} - \frac{n-r}{1-p} = 0$
- Solve for the MLE: $\hat{p} = r/n$

Estimating Confidence Interval for Segregation Ratio
- Our estimate is X/n, where X is the random variable representing the number of “successes” observed and n is the sample size.
- $E(X/n) = E(X)/n = np/n = p$
- $Var(X/n) = Var(X)/n^2 = np(1-p)/n^2 = p(1-p)/n$
- $SE(X/n) = \sqrt{p(1-p)/n}$
- Therefore, X/n is unbiased and we can obtain a confidence interval using a normal approximation with SE(X/n).

Estimating Confidence Interval for Segregation Ratio
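A normal-approximation interval of the kind described for X/n can be sketched as follows. Two assumptions are made here that the transcript does not state: the unknown p in the standard error is replaced by the estimate r/n (a Wald-type interval), and a 95% level with z = 1.96 is used as the default. The function name is illustrative.

```python
from math import sqrt

def segregation_ratio_ci(r: int, n: int, z: float = 1.96):
    """Point estimate and normal-approximation CI for the segregation ratio."""
    p_hat = r / n                               # MOM and ML estimate
    se = sqrt(p_hat * (1 - p_hat) / n)          # plug-in standard error
    lower = max(0.0, p_hat - z * se)            # keep the interval inside [0, 1]
    upper = min(1.0, p_hat + z * se)
    return p_hat, (lower, upper)

# r = 29 from the lecture example; n = 50 is inferred from the
# continuity-correction slide.
print(segregation_ratio_ci(29, 50))
```

For the running example the interval contains 0.5, which is consistent with the non-significant test results reported earlier.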

Segregation Analysis: Codominant Loci I
Mating type   Genotype
              DD     Dd     dd
DD x DD       1      0      0
DD x Dd       0.5    0.5    0
DD x dd       0      1      0
Dd x Dd       0.25   0.5    0.25
Dd x dd       0      0.5    0.5
dd x dd       0      0      1

Segregation Analysis: Codominant Loci II
- All 6 mating types are identifiable.
- Each mating type can be tested for agreement with expected segregation ratios.
- Some mating types result in 3 types of offspring. These must be tested with a chi-square or likelihood ratio test.
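For a mating type with three offspring classes, such as Dd x Dd with expected ratio 1:2:1, the Pearson statistic has 2 degrees of freedom, and the chi-square upper tail with 2 df has the closed form exp(−x/2). The sketch below uses that fact; the function name and the offspring counts are hypothetical and are not data from the lecture.

```python
from math import exp

def codominant_f2_test(n_DD: int, n_Dd: int, n_dd: int):
    """Pearson chi-square test of a 1:2:1 ratio (2 degrees of freedom)."""
    observed = [n_DD, n_Dd, n_dd]
    n = sum(observed)
    expected = [n * 0.25, n * 0.5, n * 0.25]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = exp(-x2 / 2)        # chi-square upper tail for 2 df
    return x2, p_value

# Hypothetical counts for 100 offspring of a Dd x Dd mating.
print(codominant_f2_test(22, 55, 23))
```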

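Multiple Populations: Testing for Heterogeneity
- Suppose you observe segregation ratios in samples of size n in m populations.
- Calculate a total chi-square: $\chi^2_{total} = \sum_{i=1}^{m} \chi^2_i$, the sum of the chi-square statistics computed separately in each population.
- Calculate a pooled chi-square: $\chi^2_{pooled}$, computed from the counts summed over all m populations.

Multiple Populations: Testing for Heterogeneity
- Then, under homogeneity, $\chi^2_{total} - \chi^2_{pooled}$ is approximately chi-square distributed with $(m-1)d$ degrees of freedom, where d is the degrees of freedom of a single-population test (d = 1 when there are two offspring classes).

Multiple Populations: Testing for Heterogeneity
- Alternatively, one may calculate G statistics.
- Then, $G_{total} - G_{pooled}$ is also distributed as chi-square with $(m-1)d$ degrees of freedom.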

Multiple Populations: Example
- In Mendel’s F2 cross of smooth and wrinkled inbred pea lines, he sampled 10 plants and counted the number of smooth and wrinkled peas produced by each of those plants.
- Is there heterogeneity between plants?
- Further tests show that
  - a single gene controls smooth vs. wrinkled
  - smooth is dominant to wrinkled
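A heterogeneity test of this kind can be sketched as total chi-square minus pooled chi-square, referred to a chi-square distribution with (m − 1)(classes − 1) degrees of freedom. The code below is a standard-library sketch: `chi2_sf` builds the upper tail for integer degrees of freedom from the recurrence Q(a+1, y) = Q(a, y) + y^a e^(−y)/Γ(a+1), the function names are illustrative, and the per-plant counts are hypothetical placeholders, not Mendel's data.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2
    # Start from a closed form: df = 2 gives exp(-y), df = 1 gives erfc(sqrt(y)).
    a, q = (1.0, exp(-y)) if df % 2 == 0 else (0.5, erfc(sqrt(y)))
    while a < df / 2:
        q += y**a * exp(-y) / gamma(a + 1)
        a += 1
    return min(q, 1.0)

def chi_square_stat(observed, ratio) -> float:
    """Pearson statistic for counts against an expected ratio such as (3, 1)."""
    n, total = sum(observed), sum(ratio)
    return sum((o - n * w / total) ** 2 / (n * w / total)
               for o, w in zip(observed, ratio))

def heterogeneity_test(samples, ratio):
    """Return (heterogeneity chi-square, degrees of freedom, p-value)."""
    total = sum(chi_square_stat(s, ratio) for s in samples)
    pooled = chi_square_stat([sum(col) for col in zip(*samples)], ratio)
    het = max(0.0, total - pooled)
    df = (len(samples) - 1) * (len(ratio) - 1)
    return het, df, chi2_sf(het, df)

# Hypothetical (smooth, wrinkled) counts for 10 plants; expected ratio 3:1.
plants = [(44, 13), (29, 9), (25, 8), (20, 9), (31, 12),
          (27, 7), (85, 26), (23, 9), (27, 7), (26, 8)]
print(heterogeneity_test(plants, (3, 1)))
```

A large p-value here would indicate no evidence that the plants differ in their segregation ratios, so pooling them would be reasonable.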

Screening Markers for Polymorphism
- An important step in designing mapping studies is to find markers that show polymorphism. We are interested in tests for polymorphism.
- A false negative would result if the marker was truly polymorphic, but our test showed it to be monomorphic.
- A false positive would result if the marker was truly monomorphic, but our test showed it to be polymorphic.

Testing for Polymorphism: Backcross 1:1
- You design a backcross experiment to test for polymorphism at a marker of interest. You sample n offspring of the backcross.
- $P(\text{monomorphic}) = 2(0.5)^n$

Testing for Polymorphism: F2 codominant 1:2:1
- You design an F2 cross with a marker that is codominant. You sample n F2 individuals.
- $P(\text{monomorphic}) = 2(0.25)^n + (0.5)^n$

Testing for Polymorphism: F2 dominant marker
- You design an F2 cross, but this time observe a dominant marker. You sample n F2 individuals.
- $P(\text{monomorphic}) = (0.75)^n + (0.25)^n$
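All three formulas above are instances of one rule: a truly segregating marker looks monomorphic when every one of the n offspring falls in the same class, which has probability equal to the sum over classes of (class probability)^n. The sketch below computes that false-negative probability and the smallest n that keeps it below a chosen threshold. The function names and the 0.05 threshold are assumptions made for illustration.

```python
def prob_monomorphic(ratio, n: int) -> float:
    """P(all n offspring fall in one class) for an expected ratio like (1, 2, 1)."""
    total = sum(ratio)
    return sum((w / total) ** n for w in ratio)

def min_sample_size(ratio, alpha: float = 0.05) -> int:
    """Smallest n with false-negative probability at most alpha."""
    n = 1
    while prob_monomorphic(ratio, n) > alpha:
        n += 1
    return n

designs = {"backcross 1:1": (1, 1),
           "F2 codominant 1:2:1": (1, 2, 1),
           "F2 dominant 3:1": (3, 1)}
for name, ratio in designs.items():
    n = min_sample_size(ratio)
    # Power to detect polymorphism is 1 minus the false-negative probability.
    print(name, n, 1 - prob_monomorphic(ratio, n))
```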

Power of Test for Polymorphism

Estimating Heterozygosity

Estimating Allele Frequency
- It is often assumed that alleles have equal frequencies when there are many alleles at a locus. This assumption can result in false positives for linkage, so it is important to test allele frequencies.
- Suppose there are l possible alleles $A_1, A_2, \ldots$ You observe $n_{ij}$ genotypes $A_iA_j$.
- You estimate genotype frequencies: $\hat{P}_{ij} = n_{ij}/n$, where n is the total number of genotypes observed.

Estimating Allele Frequencies
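Allele frequencies can be estimated by gene counting: each genotype $A_iA_j$ contributes one copy of each of its alleles (two copies of the same allele for a homozygote), and the counts are divided by 2n. The sketch below assumes this standard estimator, since the slide's own formula did not survive in the transcript. It also reports observed heterozygosity and the expected heterozygosity 1 − Σ p_i², which assumes random mating. Function names and genotype counts are hypothetical.

```python
from collections import defaultdict

def allele_frequencies(genotype_counts: dict) -> dict:
    """Gene-counting estimates of allele frequencies from genotype counts."""
    n = sum(genotype_counts.values())
    allele_counts = defaultdict(int)
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count     # a homozygote adds two copies of one allele
    return {allele: c / (2 * n) for allele, c in allele_counts.items()}

def observed_heterozygosity(genotype_counts: dict) -> float:
    """Proportion of sampled individuals that are heterozygous."""
    n = sum(genotype_counts.values())
    return sum(c for (a, b), c in genotype_counts.items() if a != b) / n

def expected_heterozygosity(freqs: dict) -> float:
    """1 - sum(p_i^2), the heterozygosity expected under random mating."""
    return 1 - sum(p * p for p in freqs.values())

# Hypothetical genotype counts, keyed by (allele, allele).
counts = {("A1", "A1"): 10, ("A1", "A2"): 20, ("A2", "A2"): 20}
freqs = allele_frequencies(counts)
print(freqs, observed_heterozygosity(counts), expected_heterozygosity(freqs))
```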

Probability of Observing an Allele
- Suppose there is an allele $A_i$ with frequency $p_i$. What is the probability of sampling at least one allele of type $A_i$?
- Sample size calculation
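One common way to answer this question, assumed here because the slide's formula is missing from the transcript, is to treat the 2n alleles carried by n diploid individuals as independent draws. The chance of missing $A_i$ entirely is then $(1 - p_i)^{2n}$, and the sample size follows by solving for n. The function names and the example values are illustrative.

```python
from math import ceil, log

def prob_observe_allele(p: float, n_individuals: int) -> float:
    """P(at least one copy of an allele with frequency p among 2n alleles)."""
    return 1 - (1 - p) ** (2 * n_individuals)

def min_individuals(p: float, target: float = 0.95) -> int:
    """Smallest number of diploid individuals giving detection probability >= target."""
    # Solve 1 - (1 - p)^(2n) >= target for n.
    return ceil(log(1 - target) / (2 * log(1 - p)))

# Example: an allele at frequency 0.05, detected with probability >= 0.95.
n = min_individuals(0.05, 0.95)
print(n, prob_observe_allele(0.05, n))
```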

Probability of Observing Multiple Alleles
- Let $\theta_i$ be the probability of observing at least one allele of type i.
- There are $_lC_m$ ways of selecting m different alleles and an associated probability $\theta(j_m)$ of detecting at least one of each, calculated from the $\theta_i$.
- Then we can calculate the probability of observing k or more alleles by summing over these probabilities for k, k+1, …, l.

Approximate Probability of Observing k or More Alleles
- The above procedure becomes computationally difficult when there are many alleles and the frequencies are unequal.
- There is a Monte Carlo approximation.
- Select a random variable $I_i$ to be 1 with probability $\theta_i$ and 0 otherwise.
- Compute $I = \sum_i I_i$ for b bootstrap trials. The proportion of trials with $I \ge k$ is an estimate of the probability of observing k or more alleles.
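The Monte Carlo procedure above can be sketched in a few lines. Each trial draws one indicator per allele, sums them, and checks whether at least k alleles were seen. Two assumptions are made here: the per-allele detection probability is taken to be $1 - (1 - p_i)^{2n}$ for n diploid individuals, and the indicators are drawn independently, as the slide describes, which is itself an approximation. The function name, seed, and allele frequencies are hypothetical.

```python
import random

def prob_k_or_more_alleles(freqs, n_individuals: int, k: int,
                           trials: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least k distinct alleles are observed)."""
    rng = random.Random(seed)
    # Probability of seeing each allele at least once among 2n sampled alleles.
    detect = [1 - (1 - p) ** (2 * n_individuals) for p in freqs]
    hits = 0
    for _ in range(trials):
        # One indicator per allele: 1 with its detection probability, else 0.
        observed = sum(rng.random() < d for d in detect)
        if observed >= k:
            hits += 1
    return hits / trials

# Hypothetical unequal allele frequencies at one locus.
freqs = [0.5, 0.3, 0.1, 0.05, 0.05]
for k in range(1, len(freqs) + 1):
    print(k, prob_k_or_more_alleles(freqs, n_individuals=20, k=k))
```

Increasing `trials` reduces the Monte Carlo error of the estimate at the cost of run time.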

Summary
- Permutations and combinations: knowing how to count the number of genotypes, mating types, etc.
- Testing segregation ratios for dominant and codominant loci.
- Testing for population heterogeneity.
- Screening for polymorphism.
- Estimating heterozygosity, probability of observing an allele.
