Grouping loci
Criteria
Maximum two-point recombination fraction, r_ij
  – Example: r_ij ≤ 0.40
Minimum LOD score, Z_ij
  – For n loci, there are n(n-1)/2 possible pairwise combinations that will be tested
  – Expect a nonzero probability of false positives
Significant probability value, p_ij
  – Example: p_ij ≤
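The n(n-1)/2 count grows quickly with the number of loci, which is why false positives matter when grouping. The sketch below is illustrative only: the function names and the per-test error rate alpha are assumptions, and the expected count assumes every pair is truly unlinked.

```python
def n_pairwise_tests(n_loci: int) -> int:
    """Number of two-point tests among n loci: n(n-1)/2."""
    return n_loci * (n_loci - 1) // 2


def expected_false_positives(n_loci: int, alpha: float) -> float:
    """Expected number of spurious linkages if all pairs are unlinked
    and each test has type I error rate alpha (an assumed value)."""
    return n_pairwise_tests(n_loci) * alpha


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, n_pairwise_tests(n), expected_false_positives(n, 0.001))
```

With 100 loci there are already 4,950 pairwise tests, so even a stringent per-test threshold leaves a few expected false linkages.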
Locus ordering
Ideally, we would estimate the likelihood of every possible order and choose the most probable one by comparing log likelihoods.
That is computationally impractical when there are more than ~10 loci.
Several methods have been proposed for producing a preliminary order.
Locus ordering
Number of orders among k loci: k!/2
Number of triplets among k loci: k(k-1)(k-2)/6
Example: k = 10 loci give 1,814,400 possible orders and 120 triplets.
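The growth in the number of orders and triplets can be tabulated directly. This is a minimal sketch; the function names are illustrative, and it assumes an order and its mirror image count as one order.

```python
from math import comb, factorial


def n_orders(k: int) -> int:
    """Unique locus orders among k loci (an order and its mirror are the same)."""
    return factorial(k) // 2


def n_triplets(k: int) -> int:
    """Number of three-locus subsets among k loci."""
    return comb(k, 3)


if __name__ == "__main__":
    print("k  orders  triplets")
    for k in (3, 5, 10, 20):
        print(k, n_orders(k), n_triplets(k))
```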
Three-point analysis
Number of unique orders among k loci: k!/2
For three loci (k = 3) there are three unique orders:
Order   Mirror order
ABC     CBA
ACB     BCA
BAC     CAB
Three-point analysis
Non-additivity of recombination frequencies
Loci A, B and C lie in that order, with recombination frequencies r_AB, r_BC and r_AC.
The recombination frequency over the interval A-C (r_AC) is less than the sum of r_AB and r_BC: r_AC < r_AB + r_BC.
This is because (rare) double recombination events (a recombination in both A-B and B-C) do not contribute to recombination between A and C.
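Non-additivity of recombination frequencies
Probabilities of the four outcomes for intervals A-B and B-C, assuming independent recombination events:
P_00 = (1 - r_AB)(1 - r_BC)     no recombination in either interval
P_10 = r_AB (1 - r_BC)          recombination in A-B only
P_01 = (1 - r_AB) r_BC          recombination in B-C only
P_11 = r_AB r_BC                recombination in both intervals
r_AC = P_10 + P_01 = r_AB (1 - r_BC) + (1 - r_AB) r_BC
r_AC = r_AB + r_BC - 2 r_AB r_BC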
Interference
Interference means that recombination events in adjacent intervals are not independent. The occurrence of an event in a given interval may reduce or enhance the occurrence of an event in its neighbourhood.
Positive interference refers to the ‘suppression’ of recombination events in the neighbourhood of a given one.
Negative interference refers to the opposite: enhancement of clusters of recombination events.
Positive interference results in fewer double recombinants (over adjacent intervals) than expected on the basis of independence of recombination events.
r_AC = r_AB + r_BC - 2C r_AB r_BC, where C is the coefficient of coincidence
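The relation between the outer recombination frequency and the two adjacent ones can be checked numerically. A minimal sketch; the function name and the example values are illustrative assumptions, with c = 1 corresponding to no interference.

```python
def outer_recombination(r_ab: float, r_bc: float, c: float = 1.0) -> float:
    """r_AC = r_AB + r_BC - 2*C*r_AB*r_BC for loci in the order A-B-C.

    c is the coefficient of coincidence: c = 1 means independent
    crossovers, c = 0 means complete interference (no doubles).
    """
    return r_ab + r_bc - 2.0 * c * r_ab * r_bc


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(c, outer_recombination(0.22, 0.10, c))
```

With complete interference (c = 0) the frequencies are additive; as c increases, more double recombinants go undetected and the outer frequency falls further below the sum.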
Interference
C = coefficient of coincidence
Interference: I = 1 - C
Coefficient of coincidence: C = observed number of double crossovers / expected number of double crossovers
Expected number of double crossovers = r_AB r_BC N
Observed Count: DH population N=100, locus order ABC
Interference
No interference
  – C = 1 and Interference = 1 - C = 0
Complete interference
  – C = 0 and Interference = 1 - C = 1
Negative interference
  – C > 1 and Interference = 1 - C < 0
Positive interference
  – C < 1 and Interference = 1 - C > 0
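The coefficient of coincidence and the resulting interference class can be computed from the observed number of double crossovers. A minimal sketch; the function names are illustrative, and the example numbers (21 observed doubles, r = 0.22 and 0.30, N = 100) are taken from the DH worked example under the ABC order.

```python
from math import isclose


def coincidence(observed_doubles: int, r1: float, r2: float, n: int) -> float:
    """C = observed double crossovers / expected double crossovers (r1*r2*N)."""
    return observed_doubles / (r1 * r2 * n)


def interference_type(c: float) -> str:
    """Classify interference I = 1 - C following the four cases on the slide."""
    if isclose(c, 1.0):
        return "none"
    if isclose(c, 0.0, abs_tol=1e-12):
        return "complete"
    return "negative" if c > 1.0 else "positive"


if __name__ == "__main__":
    c = coincidence(21, 0.22, 0.30, 100)
    print(round(c, 2), round(1 - c, 2), interference_type(c))
```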
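Three-locus analysis, DH population
Expected frequencies for the ABC locus order
Genotype  Observed count  Class  Without interference        With interference
ABC/ABC   f_1             NR     0.5(1 - r_AB)(1 - r_BC)     0.5(1 - r_AB - r_BC + C r_AB r_BC)
ABc/ABc   f_2             SC2    0.5(1 - r_AB) r_BC          0.5(r_BC - C r_AB r_BC)
AbC/AbC   f_3             DC12   0.5 r_AB r_BC               0.5 C r_AB r_BC
Abc/Abc   f_4             SC1    0.5 r_AB (1 - r_BC)         0.5(r_AB - C r_AB r_BC)
aBC/aBC   f_5             SC1    0.5 r_AB (1 - r_BC)         0.5(r_AB - C r_AB r_BC)
aBc/aBc   f_6             DC12   0.5 r_AB r_BC               0.5 C r_AB r_BC
abC/abC   f_7             SC2    0.5(1 - r_AB) r_BC          0.5(r_BC - C r_AB r_BC)
abc/abc   f_8             NR     0.5(1 - r_AB)(1 - r_BC)     0.5(1 - r_AB - r_BC + C r_AB r_BC)
NR = non-recombinant; SC1, SC2 = single crossover in interval 1 (A-B) or interval 2 (B-C); DC12 = double crossover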
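MLE of two-locus recombination fractions
Observed counts and expected frequencies for the ABC locus order
Genotype  Observed count  Expected frequency
ABC/ABC   f_1 = 34        0.5(1 - r_AB - r_BC + C r_AB r_BC)
ABc/ABc   f_2 = 5         0.5(r_BC - C r_AB r_BC)
AbC/AbC   f_3 = 11        0.5 C r_AB r_BC
Abc/Abc   f_4 = 0         0.5(r_AB - C r_AB r_BC)
aBC/aBC   f_5 = 1         0.5(r_AB - C r_AB r_BC)
aBc/aBc   f_6 = 10        0.5 C r_AB r_BC
abC/abC   f_7 = 4         0.5(r_BC - C r_AB r_BC)
abc/abc   f_8 = 35        0.5(1 - r_AB - r_BC + C r_AB r_BC)
Regardless of locus order, the MLEs of r are
r_AB = (f_3 + f_4 + f_5 + f_6)/N = (11 + 0 + 1 + 10)/100 = 0.22
r_BC = (f_2 + f_3 + f_6 + f_7)/N = (5 + 11 + 10 + 4)/100 = 0.30
r_AC = (f_2 + f_4 + f_5 + f_7)/N = (5 + 0 + 1 + 4)/100 = 0.10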
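The two-locus estimates can be reproduced by counting, for each pair of loci, the lines in which the two alleles come from different parents. A minimal sketch; the dictionary layout and function name are illustrative, and it assumes coupling-phase parents ABC and abc, with upper and lower case marking parental origin.

```python
# Observed DH genotype counts from the worked example (N = 100).
COUNTS = {"ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
          "aBC": 1, "aBc": 10, "abC": 4, "abc": 35}


def recombination_fraction(counts: dict, locus_x: str, locus_y: str) -> float:
    """MLE of r between two loci: recombinant lines / total lines."""
    total = sum(counts.values())
    recombinants = 0
    for haplotype, count in counts.items():
        # True if the allele at that locus is the upper-case (ABC parent) allele.
        parental = {ch.upper(): ch.isupper() for ch in haplotype}
        if parental[locus_x] != parental[locus_y]:
            recombinants += count
    return recombinants / total


if __name__ == "__main__":
    for pair in ("AB", "BC", "AC"):
        print(pair, recombination_fraction(COUNTS, pair[0], pair[1]))
```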
Ordering loci by minimizing double crossovers
Genotype  Observed count
ABC/ABC   f_1 = 34
ABc/ABc   f_2 = 5
AbC/AbC   f_3 = 11
Abc/Abc   f_4 = 0
aBC/aBC   f_5 = 1
aBc/aBc   f_6 = 10
abC/abC   f_7 = 4
abc/abc   f_8 = 35
Pooled genotypes  Observed count
ABC + abc         f_1 + f_8 = 34 + 35 = 69
ABc + abC         f_2 + f_7 = 5 + 4 = 9
AbC + aBc         f_3 + f_6 = 11 + 10 = 21
Abc + aBC         f_4 + f_5 = 0 + 1 = 1
The rarest genotypes are the double recombinants. Here they are Abc and aBC: with parental types BAC and bac, a crossover in each interval (X X) gives BaC and bAc.
The order of loci is BAC.
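The double-crossover rule can be sketched in a few lines: pool complementary genotypes, find the rarest pooled class, and take the locus whose allele differs from the other two as the middle locus. The function names are illustrative, and the sketch assumes coupling-phase parents ABC and abc and that the rarest class really is the double-recombinant class.

```python
COUNTS = {"ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
          "aBC": 1, "aBc": 10, "abC": 4, "abc": 35}


def pooled_classes(counts: dict) -> dict:
    """Pool each genotype with its complement (e.g. Abc with aBC)."""
    pooled = {}
    for haplotype, count in counts.items():
        key = min(haplotype, haplotype.swapcase())  # canonical label
        pooled[key] = pooled.get(key, 0) + count
    return pooled


def middle_locus(counts: dict) -> str:
    """Locus whose allele is switched in the rarest (double-recombinant) class."""
    pooled = pooled_classes(counts)
    rarest = min(pooled, key=pooled.get)
    upper = [ch for ch in rarest if ch.isupper()]
    lower = [ch for ch in rarest if ch.islower()]
    if not upper or not lower:
        raise ValueError("rarest class is non-recombinant; order is ambiguous")
    odd = upper[0] if len(upper) == 1 else lower[0]
    return odd.upper()


if __name__ == "__main__":
    print(pooled_classes(COUNTS))
    print("middle locus:", middle_locus(COUNTS))
```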
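Ordering loci by using recombination fractions
MLEs of r are r_AB = 0.22, r_BC = 0.30, r_AC = 0.10
Largest r is r_BC = 0.3, so B and C are the flanking loci
Smallest r is r_AC = 0.1, so A lies between B and C, closer to C
Order: B A C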
Minimum Sum of Adjacent Recombination Frequencies (SARF) (Falk 1989)
SARF = sum over adjacent pairs of r_{a_i a_j}, with j = i + 1
r_{a_i a_j} = recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, ..., l-1, l
Order  SARF
ABC    0.22 + 0.30 = 0.52
BAC    0.22 + 0.10 = 0.32
ACB    0.10 + 0.30 = 0.40
The B-A-C order gives MIN[SARF] and the minimum distance (MD) map.
Simulations have shown that SARF is a reliable method for obtaining marker orders for large datasets.
Minimum Product of Adjacent Recombination Frequencies (PARF) (Wilson 1988)
PARF = product over adjacent pairs of r_{a_i a_j}, with j = i + 1
r_{a_i a_j} = recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, ..., l-1, l
Order  PARF
ABC    0.22 x 0.30 = 0.066
BAC    0.22 x 0.10 = 0.022
ACB    0.10 x 0.30 = 0.030
The B-A-C order gives MIN[PARF] and the minimum distance (MD) map.
SARF and PARF are equivalent methods for obtaining marker orders for large datasets.
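Both criteria can be evaluated by enumerating the unique orders and scoring adjacent pairs. A minimal sketch using the three estimates from the worked example; the names R, unique_orders, sarf and parf are illustrative, and exhaustive enumeration is only practical for a handful of loci.

```python
from itertools import permutations
from math import prod

# Two-point recombination fractions from the worked example.
R = {frozenset("AB"): 0.22, frozenset("BC"): 0.30, frozenset("AC"): 0.10}


def unique_orders(loci: str) -> list:
    """All orders, keeping one of each order/mirror pair."""
    return sorted({min(p, p[::-1]) for p in map("".join, permutations(loci))})


def adjacent_r(order: str) -> list:
    """Recombination fractions between neighbouring loci in the given order."""
    return [R[frozenset(pair)] for pair in zip(order, order[1:])]


def sarf(order: str) -> float:
    return sum(adjacent_r(order))


def parf(order: str) -> float:
    return prod(adjacent_r(order))


if __name__ == "__main__":
    for order in unique_orders("ABC"):
        print(order, round(sarf(order), 3), round(parf(order), 3))
    print("min SARF:", min(unique_orders("ABC"), key=sarf))
    print("min PARF:", min(unique_orders("ABC"), key=parf))
```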
Maximum Sum of Adjacent LOD Scores (SALOD)
SALOD = sum over adjacent pairs of Z_{a_i a_j}, with j = i + 1
Z_{a_i a_j} = LOD score for recombination frequency between adjacent loci a_i and a_j for a given order: 1, 2, 3, ..., l-1, l
Order  SALOD
ABC    Z_AB + Z_BC
BAC    Z_AB + Z_AC
ACB    Z_AC + Z_BC
The B-A-C order gives MAX[SALOD].
SALOD is sensitive to locus informativeness.
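SALOD can be illustrated with the same three recombination fractions. The LOD formula below is an assumption, not taken from the slides: it is the standard two-point LOD at the MLE for N fully informative gametes, Z = N[log10(2) + r log10(r) + (1 - r) log10(1 - r)], and it assumes every pair was scored on all N = 100 lines. The function names are illustrative.

```python
from math import log10

N = 100  # number of DH lines in the worked example
R = {frozenset("AB"): 0.22, frozenset("BC"): 0.30, frozenset("AC"): 0.10}


def lod(r: float, n: int) -> float:
    """Two-point LOD at the MLE r versus r = 0.5 (assumed binomial model)."""
    if r <= 0.0:
        return n * log10(2)  # limit as r -> 0
    if r >= 1.0:
        raise ValueError("r must be below 1")
    return n * (log10(2) + r * log10(r) + (1 - r) * log10(1 - r))


def salod(order: str) -> float:
    """Sum of LOD scores over adjacent locus pairs in the given order."""
    return sum(lod(R[frozenset(pair)], N) for pair in zip(order, order[1:]))


if __name__ == "__main__":
    for order in ("ABC", "BAC", "ACB"):
        print(order, round(salod(order), 2))
    print("max SALOD:", max(("ABC", "BAC", "ACB"), key=salod))
```

Because each LOD depends on how many informative lines were scored for that pair, pairs with missing or less informative data contribute smaller scores, which is one way to read the remark about locus informativeness.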
Minimum Count of Crossover Events (COUNT) (Van Os et al. 2005)
COUNT = sum over adjacent pairs of X_{a_i a_j}, with j = i + 1
X_{a_i a_j} = simple count of recombination events between adjacent loci a_i and a_j for a given sequence: 1, 2, 3, ..., l-1, l
Order  COUNT
ABC    22 + 30 = 52
BAC    22 + 10 = 32
ACB    10 + 30 = 40
The B-A-C order gives MIN[COUNT].
COUNT is equivalent to SARF and PARF with perfect data.
COUNT is superior to SARF with incomplete data.
Locus order: likelihood approach
r_1 = recombination fraction in interval 1
r_2 = recombination fraction in interval 2
C = coefficient of coincidence
p_i = f_i / n
f_i = expected frequency of the i-th pooled phenotypic class
i = 1, 2, ..., k
k = number of pooled phenotypic classes
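Expected frequencies of the pooled haplotype classes for each locus order (r_1, r_2 = recombination fractions in intervals 1 and 2)

ACB order (r_1 = r_AC = 0.10, r_2 = r_BC = 0.30), C = 3.00
Haplotypes  Obs. no.  Freq.  Exp. freq.                  Exp. freq. C=0  Exp. freq. C=1
ABC + abc   f_1 = 69  0.69   1 - r_1 - r_2 + C r_1 r_2   0.60            0.63
ABc + abC   f_2 = 9   0.09   C r_1 r_2                   0.00            0.03
AbC + aBc   f_3 = 21  0.21   r_2 - C r_1 r_2             0.30            0.27
Abc + aBC   f_4 = 1   0.01   r_1 - C r_1 r_2             0.10            0.07

ABC order (r_1 = r_AB = 0.22, r_2 = r_BC = 0.30), C = 3.18
Haplotypes  Obs. no.  Freq.  Exp. freq.                  Exp. freq. C=0  Exp. freq. C=1
ABC + abc   f_1 = 69  0.69   1 - r_1 - r_2 + C r_1 r_2   0.48            0.546
ABc + abC   f_2 = 9   0.09   r_2 - C r_1 r_2             0.30            0.234
AbC + aBc   f_3 = 21  0.21   C r_1 r_2                   0.00            0.066
Abc + aBC   f_4 = 1   0.01   r_1 - C r_1 r_2             0.22            0.154

BAC order (r_1 = r_AB = 0.22, r_2 = r_AC = 0.10), C = 0.45
Haplotypes  Obs. no.  Freq.  Exp. freq.                  Exp. freq. C=0  Exp. freq. C=1
ABC + abc   f_1 = 69  0.69   1 - r_1 - r_2 + C r_1 r_2   0.68            0.702
ABc + abC   f_2 = 9   0.09   r_2 - C r_1 r_2             0.10            0.078
AbC + aBc   f_3 = 21  0.21   r_1 - C r_1 r_2             0.22            0.198
Abc + aBC   f_4 = 1   0.01   C r_1 r_2                   0.00            0.022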
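The expected pooled class frequencies follow from r_1, r_2 and C alone. A minimal sketch; the function name and the class labels NR, SC1, SC2 and DC (non-recombinant, single crossover in interval 1 or 2, double crossover) are illustrative choices.

```python
def expected_classes(r1: float, r2: float, c: float = 1.0) -> dict:
    """Expected pooled frequencies for a three-locus order.

    r1, r2: recombination fractions in intervals 1 and 2.
    c: coefficient of coincidence (1 = no interference, 0 = complete).
    """
    dc = c * r1 * r2  # double crossover class
    return {
        "NR": 1 - r1 - r2 + dc,  # non-recombinant
        "SC1": r1 - dc,          # single crossover, interval 1
        "SC2": r2 - dc,          # single crossover, interval 2
        "DC": dc,
    }


if __name__ == "__main__":
    # BAC order from the worked example: r_1 = r_AB = 0.22, r_2 = r_AC = 0.10
    for c in (0.0, 1 / 2.2, 1.0):
        print(round(c, 2), {k: round(v, 3) for k, v in expected_classes(0.22, 0.10, c).items()})
```

When C is set to its estimate (observed doubles over expected doubles), the expected frequencies reproduce the observed ones, so only the constrained C = 1 column discriminates between orders.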
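ABC order
Haplotypes  Obs. no.  p_i, C=3.18  p_i, C=1
ABC + abc   f_1 = 69  0.69         0.546
ABc + abC   f_2 = 9   0.09         0.234
AbC + aBc   f_3 = 21  0.21         0.066
Abc + aBC   f_4 = 1   0.01         0.154

BAC order
Haplotypes  Obs. no.  p_i, C=0.45  p_i, C=1
ABC + abc   f_1 = 69  0.69         0.702
ABc + abC   f_2 = 9   0.09         0.078
AbC + aBc   f_3 = 21  0.21         0.198
Abc + aBC   f_4 = 1   0.01         0.022

ACB order
Haplotypes  Obs. no.  p_i, C=3.00  p_i, C=1
ABC + abc   f_1 = 69  0.69         0.63
ABc + abC   f_2 = 9   0.09         0.03
AbC + aBc   f_3 = 21  0.21         0.27
Abc + aBC   f_4 = 1   0.01         0.07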
Likelihood method
Likelihood and LOD are compared for each order (ABC, BAC, ACB) under an unconstrained model (C estimated: 3.18, 0.45 and 3.00, respectively) and under a constrained model (C = 1).
The B-A-C order gives the highest likelihood and LOD under a no-interference (C = 1) model.
Most multipoint ML mapping algorithms use no-interference models.
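The comparison under the constrained model can be sketched with a multinomial log likelihood over the four pooled classes. This is an illustration under stated assumptions, not the slides' exact computation: it uses natural logs, drops the multinomial constant (identical for every order), plugs in the two-point estimates of r instead of re-maximizing, and the function names are hypothetical.

```python
from math import log

# Pooled observed counts and two-point estimates from the worked example.
POOLED = {"ABC": 69, "ABc": 9, "AbC": 21, "Abc": 1}
R = {frozenset("AB"): 0.22, frozenset("BC"): 0.30, frozenset("AC"): 0.10}


def class_probability(haplotype: str, order: str, c: float = 1.0) -> float:
    """Expected frequency of a pooled class under a given three-locus order."""
    parental = {ch.upper(): ch.isupper() for ch in haplotype}
    left, mid, right = order
    r1 = R[frozenset((left, mid))]
    r2 = R[frozenset((mid, right))]
    rec1 = parental[left] != parental[mid]    # crossover in interval 1
    rec2 = parental[mid] != parental[right]   # crossover in interval 2
    dc = c * r1 * r2
    if rec1 and rec2:
        return dc
    if rec1:
        return r1 - dc
    if rec2:
        return r2 - dc
    return 1 - r1 - r2 + dc


def log_likelihood(order: str, c: float = 1.0) -> float:
    """Multinomial log likelihood kernel: sum of f_i * ln(p_i)."""
    return sum(n * log(class_probability(h, order, c)) for h, n in POOLED.items())


if __name__ == "__main__":
    for order in ("ABC", "BAC", "ACB"):
        print(order, round(log_likelihood(order), 2))
    print("best order (C = 1):", max(("ABC", "BAC", "ACB"), key=log_likelihood))
```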
Ordering loci
GMENDEL (Liu and Knapp 1990) minimizes SARF (sum of adjacent recombination frequencies).
PGRI (Lu and Liu 1995) minimizes SARF or maximizes the likelihood.
RECORD (Van Os et al. 2005) minimizes COUNT (count of crossover events).
Ordering loci
JoinMap 4 (Van Ooijen, 2005)
  – finds the least-squares locus order using a stepwise search (regression)
  – Monte Carlo maximum likelihood (ML): very fast computation of high-density maps