Computational Issues on Statistical Genetics


1 Computational Issues on Statistical Genetics
Research cycle: Research Questions, Review the Literature, Develop Methods, Data Collection, Analyze Data, Write Reports/Papers.
Computational tasks:
– Test the power and robustness by computer simulation
– Database construction (Excel, Access)
– Translate data to analyzable form
– Preliminary results (figures, tables)
– Program languages: efficient, feasible
– Graphics: Excel graphics, programmable graphics

2 Program Languages
– Fortran, C, C++
– Matrix language: MATLAB, S-Plus, R, SAS IML
– Symbolic calculation: Mathematica, Maple, MATLAB
– Interface programming: .NET, C#, Visual Basic
– SAS, SPSS, BMDP
– Database: Access, Excel, SQL, SAS, Oracle
– MACRO
  – Excel, Access, PowerPoint, Word
  – Editor: WinEdt
  – SAS Macro

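3 Two Point Analysis in F2: Fully Informative Markers (codominant)

              BB                    Bb                                bb
AA   Obs.     $n_{22}$              $n_{21}$                          $n_{20}$
     Freq.    $\frac14(1-r)^2$      $\frac12 r(1-r)$                  $\frac14 r^2$
     Recom.   0                     1                                 2
Aa   Obs.     $n_{12}$              $n_{11}$                          $n_{10}$
     Freq.    $\frac12 r(1-r)$      $\frac12(1-r)^2+\frac12 r^2$      $\frac12 r(1-r)$
     Recom.   1                     $2r^2/[(1-r)^2+r^2]$              1
aa   Obs.     $n_{02}$              $n_{01}$                          $n_{00}$
     Freq.    $\frac14 r^2$         $\frac12 r(1-r)$                  $\frac14(1-r)^2$
     Recom.   2                     1                                 0

Obs. = observed count, Freq. = expected frequency, Recom. = expected number of recombinant gametes.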

4 EM algorithm to estimate the recombination fraction r
1. Choose a starting value $r^{(0)}$.
2. For t = 0, 1, 2, …, repeat while $|r^{(t+1)} - r^{(t)}| > 10^{-8}$:
   – E-step: calculate $\phi^{(t)} = \dfrac{(r^{(t)})^2}{(1-r^{(t)})^2 + (r^{(t)})^2}$, so that the expected number of recombination events for the double heterozygote AaBb is $2\phi^{(t)}$.
   – M-step: $r^{(t+1)} = \dfrac{1}{2n}\left[2(n_{20}+n_{02}) + (n_{21}+n_{12}+n_{10}+n_{01}) + 2\phi^{(t)} n_{11}\right]$
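The E-step weight (called phi in the MATLAB code of the deck) can be motivated by the two ways an F2 individual can be doubly heterozygous. The following is a sketch of that reasoning, not a derivation taken from the slides; it assumes each F1 parent transmits a nonrecombinant gamete with probability $1-r$ and a recombinant gamete with probability $r$.

```latex
% AaBb arises either from two nonrecombinant gametes (AB with ab)
% or from two recombinant gametes (Ab with aB):
P(\mathrm{AaBb}) = \tfrac12 (1-r)^2 + \tfrac12 r^2

% Posterior probability that an observed AaBb carries two recombinant gametes:
\phi = \frac{\tfrac12 r^2}{\tfrac12 (1-r)^2 + \tfrac12 r^2}
     = \frac{r^2}{(1-r)^2 + r^2}

% Expected number of recombinant gametes contributed by the n_{11} class:
E[\text{recombinants} \mid \mathrm{AaBb}] \cdot n_{11} = 2\,\phi\, n_{11}
```

Dividing the total expected number of recombinant gametes by $2n$ (two gametes per individual) gives the M-step update.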

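5 Two Point Analysis in F2: Fully Informative Markers (codominant)
[Screenshot: input of the 3 x 3 table of counts n (rows AA, Aa, aa; columns BB, Bb, bb) and the starting value r0, with the resulting estimate of r.]
$\phi^{(t)} = \dfrac{(r^{(t)})^2}{(1-r^{(t)})^2 + (r^{(t)})^2}$
$r^{(t+1)} = \dfrac{1}{2n}\left[2(n_{20}+n_{02}) + (n_{21}+n_{12}+n_{10}+n_{01}) + 2\phi^{(t)} n_{11}\right]$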

6 Two Point Analysis in F2: Fully Informative Markers (codominant)
Matlab program to estimate the recombination fraction r:

    function r = rEstF2(n22,n21,n20,n12,n11,n10,n02,n01,n00)
    % EM estimate of r from the 3 x 3 table of F2 genotype counts
    n = n22+n21+n20+n12+n11+n10+n02+n01+n00;
    r = 0.2;      % starting value
    r1 = -1;      % previous iterate; -1 forces the first pass
    while abs(r1-r) > 1.e-8
        r1 = r;
        % E-step
        phi = r^2/((1-r)^2+r^2);
        % M-step
        r = 1/(2*n)*(2*(n20+n02)+(n21+n12+n10+n01)+2*phi*n11);
    end

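7 Log-likelihood ratio test statistic
Two alternative hypotheses: H0: r = 0.5 vs. H1: r ≠ 0.5

Likelihood value under H1:
$L_1(r \mid n_{ij}) = \dfrac{n!}{n_{22}!\cdots n_{00}!} \left[\tfrac14(1-r)^2\right]^{n_{22}+n_{00}} \left[\tfrac14 r^2\right]^{n_{20}+n_{02}} \left[\tfrac12 r(1-r)\right]^{n_{21}+n_{12}+n_{10}+n_{01}} \left[\tfrac12(1-r)^2+\tfrac12 r^2\right]^{n_{11}}$

Likelihood value under H0:
$L_0(r=0.5 \mid n_{ij}) = \dfrac{n!}{n_{22}!\cdots n_{00}!} \left[\tfrac14(1-0.5)^2\right]^{n_{22}+n_{00}} \left[\tfrac14\,0.5^2\right]^{n_{20}+n_{02}} \left[\tfrac12\,0.5(1-0.5)\right]^{n_{21}+n_{12}+n_{10}+n_{01}} \left[\tfrac12(1-0.5)^2+\tfrac12\,0.5^2\right]^{n_{11}}$

$\mathrm{LOD} = \log_{10}\left[L_1(r \mid n_{ij})/L_0(r=0.5 \mid n_{ij})\right] = \{(n_{22}+n_{00})\,2[\log_{10}(1-r)-\log_{10}(1-0.5)] + \dots\} = 6.08 >$ critical LOD = 3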

8 Two Point Analysis in F2: Fully Informative Markers (codominant)
Matlab program to calculate the log likelihood test score (LOD):

    function LOD = calcLOD_F2(r,n22,n21,n20,n12,n11,n10,n02,n01,n00)
    % log10 likelihood under H1 (the multinomial constant cancels in the ratio)
    LOD = (n22+n00)*log10((1-r)^2/4)...
        +(n20+n02)*log10(r^2/4)...
        +(n21+n12+n10+n01)*log10(r*(1-r)/2)...
        +n11*log10((1-r)^2/2+r^2/2);
    % log10 likelihood under H0
    r = 0.5;
    LOD0 = (n22+n00)*log10((1-r)^2/4)...
        +(n20+n02)*log10(r^2/4)...
        +(n21+n12+n10+n01)*log10(r*(1-r)/2)...
        +n11*log10((1-r)^2/2+r^2/2);
    LOD = LOD-LOD0;
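To see the estimation and the test working together, here is a minimal Python sketch of the same two steps for fully informative markers. It is a port written for illustration, not part of the original deck; the function names and the dictionary layout are assumptions, and the example counts are invented, so they will not reproduce the LOD value quoted on the slide.

```python
import math

def estimate_r_f2(counts, r0=0.2, tol=1e-8, max_iter=1000):
    """EM estimate of r; counts maps 'n22', ..., 'n00' to observed numbers."""
    c = counts
    n = sum(c.values())
    # Recombinant gametes that are observed directly (classes with 2 or 1)
    fixed = 2 * (c["n20"] + c["n02"]) + (c["n21"] + c["n12"] + c["n10"] + c["n01"])
    r = r0
    for _ in range(max_iter):
        # E-step: probability that an AaBb individual carries two recombinant gametes
        phi = r**2 / ((1 - r)**2 + r**2)
        # M-step: expected recombinant gametes divided by 2n gametes
        r_new = (fixed + 2 * phi * c["n11"]) / (2 * n)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r

def log10_likelihood(r, counts):
    """log10 likelihood without the multinomial constant."""
    c = counts
    return ((c["n22"] + c["n00"]) * math.log10((1 - r)**2 / 4)
            + (c["n20"] + c["n02"]) * math.log10(r**2 / 4)
            + (c["n21"] + c["n12"] + c["n10"] + c["n01"]) * math.log10(r * (1 - r) / 2)
            + c["n11"] * math.log10((1 - r)**2 / 2 + r**2 / 2))

def lod_f2(r, counts):
    """LOD score of r against the null value r = 0.5."""
    return log10_likelihood(r, counts) - log10_likelihood(0.5, counts)

# Hypothetical counts, invented for illustration only
example = {"n22": 20, "n21": 8, "n20": 1,
           "n12": 9, "n11": 38, "n10": 8,
           "n02": 1, "n01": 7, "n00": 18}
r_hat = estimate_r_f2(example)
print(f"r_hat = {r_hat:.4f}, LOD = {lod_f2(r_hat, example):.2f}")
```

Because the EM fixed point is the maximum likelihood estimate, the LOD evaluated at `r_hat` should not be smaller than the LOD at nearby values of r, which is a quick sanity check on any implementation.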

9 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)

              BB                    Bb                                bb
AA   Obs.     $n_{22}$              $n_{21}$                          $n_{20}$
     Freq.    $\frac14(1-r)^2$      $\frac12 r(1-r)$                  $\frac14 r^2$
     Recom.   0                     1                                 2
Aa   Obs.     $n_{12}$              $n_{11}$                          $n_{10}$
     Freq.    $\frac12 r(1-r)$      $\frac12(1-r)^2+\frac12 r^2$      $\frac12 r(1-r)$
     Recom.   1                     $2r^2/[(1-r)^2+r^2]$              1
aa   Obs.     $n_{02}$              $n_{01}$                          $n_{00}$
     Freq.    $\frac14 r^2$         $\frac12 r(1-r)$                  $\frac14(1-r)^2$
     Recom.   2                     1                                 0

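10 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)

              B_                                                  bb
AA   Obs.     $n_{2\_} = n_{22}+n_{21}$                           $n_{20}$
     Freq.    $\frac14(1-r)^2+\frac12 r(1-r)$                     $\frac14 r^2$
     Recom.   $c_1$                                               2
Aa   Obs.     $n_{1\_} = n_{12}+n_{11}$                           $n_{10}$
     Freq.    $\frac12 r(1-r)+\frac12(1-r)^2+\frac12 r^2$         $\frac12 r(1-r)$
     Recom.   $c_2$                                               1
aa   Obs.     $n_{0\_} = n_{02}+n_{01}$                           $n_{00}$
     Freq.    $\frac14 r^2+\frac12 r(1-r)$                        $\frac14(1-r)^2$
     Recom.   $c_3$                                               0

where
$c_1 = \dfrac{\frac12 r(1-r)}{\frac14(1-r)^2+\frac12 r(1-r)}$
$c_2 = \dfrac{\frac12 r(1-r)+r^2}{\frac12 r(1-r)+\frac12(1-r)^2+\frac12 r^2}$
$c_3 = \dfrac{2\cdot\frac14 r^2+\frac12 r(1-r)}{\frac14 r^2+\frac12 r(1-r)}$

Estimate of r: $r = (c_1 n_{2\_} + c_2 n_{1\_} + c_3 n_{0\_} + 2 n_{20} + n_{10})/(2n)$
The last two terms follow the Recom. entries of the bb column: each AAbb individual carries two recombinant gametes, each Aabb individual one, and aabb none.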

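11 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)
E-step:
$c_1 = \dfrac{\frac12 r(1-r)}{\frac14(1-r)^2+\frac12 r(1-r)}$
$c_2 = \dfrac{\frac12 r(1-r)+r^2}{\frac12 r(1-r)+\frac12(1-r)^2+\frac12 r^2}$
$c_3 = \dfrac{2\cdot\frac14 r^2+\frac12 r(1-r)}{\frac14 r^2+\frac12 r(1-r)}$
M-step:
$r = (c_1 n_{2\_} + c_2 n_{1\_} + c_3 n_{0\_} + 2 n_{20} + n_{10})/(2n)$
Here the Aabb class ($n_{10}$) contributes one recombinant gamete per individual; the aabb class ($n_{00}$) contributes none.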

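12 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)
[Screenshot: input of the 3 x 2 table of counts n (rows AA, Aa, aa; columns B_, bb) and the starting value r0, with the resulting estimate of r.]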

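13 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)
Matlab program to estimate the recombination fraction r:

    function r = rEstF2CoXdomin(n2_,n1_,n0_,n20,n10,n00)
    % EM estimate of r when marker B is dominant (BB and Bb pooled as B_)
    n = n2_+n1_+n0_+n20+n10+n00;
    r = 0.2;      % starting value
    r1 = -1;      % previous iterate; -1 forces the first pass
    while abs(r1-r) > 1.e-8
        r1 = r;
        % E-step: expected recombinant gametes per individual in each pooled class
        c1 = (1/2*r*(1-r))/(1/4*(1-r)^2+1/2*r*(1-r));
        c2 = (1/2*r*(1-r)+r^2)/(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2);
        c3 = (2*1/4*r^2+1/2*r*(1-r))/(1/4*r^2+1/2*r*(1-r));
        % M-step: AAbb counts 2, Aabb counts 1, aabb counts 0
        r = (c1*n2_+c2*n1_+c3*n0_+2*n20+n10)/(2*n);
    end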

14 Two Point Analysis in F2: Partial Informative Markers (codominant X dominant)
Matlab program to calculate the log likelihood test score (LOD):

    function LOD = calcLOD_F2CoXdomin(r,n2_,n1_,n0_,n20,n10,n00)
    % natural-log likelihood under H1
    LOD = log(1/4*(1-r)^2+1/2*r*(1-r))*n2_...
        +log(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2)*n1_...
        +log(1/4*r^2+1/2*r*(1-r))*n0_...
        +log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;
    % natural-log likelihood under H0
    r = 0.5;
    LOD0 = log(1/4*(1-r)^2+1/2*r*(1-r))*n2_...
        +log(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2)*n1_...
        +log(1/4*r^2+1/2*r*(1-r))*n0_...
        +log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;
    LOD = LOD-LOD0;
    LOD = LOD/log(10);   % convert to base 10
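A quick way to check an EM update for the codominant X dominant case is to feed it counts that match the expected class frequencies at a known r and confirm that the iteration returns that value. The Python sketch below does this; it is an illustration, not code from the deck, and the function names are assumptions. Its M-step counts one recombinant gamete for each Aabb individual (n10) and none for aabb (n00), following the Recom. entries of the bb column in the collapsed table.

```python
def estimate_r_codom_dom(n2_, n1_, n0_, n20, n10, n00,
                         r0=0.3, tol=1e-8, max_iter=10000):
    """EM estimate of r when BB and Bb are pooled as B_ (marker B dominant)."""
    n = n2_ + n1_ + n0_ + n20 + n10 + n00
    r = r0
    for _ in range(max_iter):
        # E-step: expected recombinant gametes per individual in each pooled class
        c1 = 0.5 * r * (1 - r) / (0.25 * (1 - r)**2 + 0.5 * r * (1 - r))
        c2 = (0.5 * r * (1 - r) + r**2) / (0.5 * r * (1 - r) + 0.5 * (1 - r)**2 + 0.5 * r**2)
        c3 = (0.5 * r**2 + 0.5 * r * (1 - r)) / (0.25 * r**2 + 0.5 * r * (1 - r))
        # M-step: AAbb contributes 2, Aabb contributes 1, aabb contributes 0
        r_new = (c1 * n2_ + c2 * n1_ + c3 * n0_ + 2 * n20 + n10) / (2 * n)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r

def expected_counts(r, n):
    """Expected class counts for sample size n at recombination fraction r."""
    return dict(
        n2_=n * (0.25 * (1 - r)**2 + 0.5 * r * (1 - r)),
        n1_=n * (0.5 * r * (1 - r) + 0.5 * (1 - r)**2 + 0.5 * r**2),
        n0_=n * (0.25 * r**2 + 0.5 * r * (1 - r)),
        n20=n * 0.25 * r**2,
        n10=n * 0.5 * r * (1 - r),
        n00=n * 0.25 * (1 - r)**2,
    )

# Idealized (non-integer) counts at r = 0.2; the estimate should come back close to 0.2
counts = expected_counts(0.2, 1000)
r_hat = estimate_r_codom_dom(**counts)
print(f"r_hat = {r_hat:.6f}")
```

The same check fails if the last M-step term uses the aabb count instead of the Aabb count, which makes it a useful regression test for this update.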

15 Two Point Analysis in F2: Partial Informative Markers (dominant)

              BB                    Bb                                bb
AA   Obs.     $n_{22}$              $n_{21}$                          $n_{20}$
     Freq.    $\frac14(1-r)^2$      $\frac12 r(1-r)$                  $\frac14 r^2$
     Recom.   0                     1                                 2
Aa   Obs.     $n_{12}$              $n_{11}$                          $n_{10}$
     Freq.    $\frac12 r(1-r)$      $\frac12(1-r)^2+\frac12 r^2$      $\frac12 r(1-r)$
     Recom.   1                     $2r^2/[(1-r)^2+r^2]$              1
aa   Obs.     $n_{02}$              $n_{01}$                          $n_{00}$
     Freq.    $\frac14 r^2$         $\frac12 r(1-r)$                  $\frac14(1-r)^2$
     Recom.   2                     1                                 0

16 Two Point Analysis in F2: Partial Informative Markers (dominant)

              B_                                                        bb
A_   Obs.     $n_1 = n_{22}+n_{21}+n_{12}+n_{11}$                       $n_2 = n_{20}+n_{10}$
     Freq.    $\frac14(1-r)^2+r(1-r)+\frac12(1-r)^2+\frac12 r^2$        $\frac14 r^2+\frac12 r(1-r)$
     Recom.   $c_1$                                                     $c_2$
aa   Obs.     $n_3 = n_{02}+n_{01}$                                     $n_4 = n_{00}$
     Freq.    $\frac14 r^2+\frac12 r(1-r)$                              $\frac14(1-r)^2$
     Recom.   $c_2$                                                     0

where $c_1$ and $c_2$ are the expected numbers of recombinant gametes:
$c_1 = \dfrac{r^2+r(1-r)}{\frac14(1-r)^2+r(1-r)+\frac12(1-r)^2+\frac12 r^2}$
$c_2 = \dfrac{2(\frac14 r^2)+\frac12 r(1-r)}{\frac14 r^2+\frac12 r(1-r)}$

Estimate of r: $r = (c_1 n_1 + c_2 n_2 + c_2 n_3)/(2n)$

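17 Two Point Analysis in F2: Partial Informative Markers (dominant)
[Screenshot: input of the 2 x 2 table of counts n (rows A_, aa; columns B_, bb) and the starting value r0, with the resulting estimate of r.]
$c_1 = \dfrac{r^2+r(1-r)}{\frac14(1-r)^2+r(1-r)+\frac12(1-r)^2+\frac12 r^2}$
$c_2 = \dfrac{2(\frac14 r^2)+\frac12 r(1-r)}{\frac14 r^2+\frac12 r(1-r)}$
Estimate of r: $r = (c_1 n_1 + c_2 n_2 + c_2 n_3)/(2n)$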

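18 Two Point Analysis in F2: Partial Informative Markers (dominant)
Matlab program to estimate the recombination fraction r:

    function r = rEstF2Partial(n1,n2,n3,n4)
    % EM estimate of r for two dominant markers (classes A_B_, A_bb, aaB_, aabb)
    n = n1+n2+n3+n4;
    r = 0.2;      % starting value
    r1 = -1;      % previous iterate; -1 forces the first pass
    while abs(r1-r) > 1.e-8
        r1 = r;
        % E-step
        c1 = (r^2+r*(1-r))/((1-r)^2/4+r*(1-r)+(1-r)^2/2+r^2/2);
        c2 = (r^2/2+r*(1-r)/2)/(r^2/4+r*(1-r)/2);
        % M-step
        r = 1/(2*n)*(c1*n1+c2*n2+c2*n3);
    end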

19 Log-likelihood ratio test statistic: Partial Informative Markers (dominant)
Two alternative hypotheses: H0: r = 0.5 vs. H1: r ≠ 0.5

Likelihood value under H1:
$L_1(r \mid n_{ij}) = \dfrac{n!}{n_1!\cdots n_4!} \left[\tfrac34(1-r)^2+r(1-r)+\tfrac12 r^2\right]^{n_1} \left[\tfrac14 r^2+\tfrac12 r(1-r)\right]^{n_2+n_3} \left[\tfrac14(1-r)^2\right]^{n_4}$

Likelihood value under H0:
$L_0(r=0.5 \mid n_{ij}) = \dfrac{n!}{n_1!\cdots n_4!} \left[\tfrac34(1-0.5)^2+0.5(1-0.5)+\tfrac12\,0.5^2\right]^{n_1} \left[\tfrac14\,0.5^2+\tfrac12\,0.5(1-0.5)\right]^{n_2+n_3} \left[\tfrac14(1-0.5)^2\right]^{n_4}$

$\mathrm{LOD} = \log_{10}\left[L_1(r \mid n_{ij})/L_0(r=0.5 \mid n_{ij})\right] = 3.17 >$ critical LOD = 3

20 Two Point Analysis in F2: Partial Informative Markers (dominant)
Matlab program to calculate the log likelihood test score (LOD):

    function LOD = calcLOD_F2Partial(r,n1,n2,n3,n4)
    % log10 likelihood under H1 (the multinomial constant cancels in the ratio)
    LOD = (n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...
        +(n2+n3)*log10(r^2/4+r*(1-r)/2)...
        +(n4)*log10((1-r)^2/4);
    % log10 likelihood under H0
    r = 0.5;
    LOD0 = (n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...
        +(n2+n3)*log10(r^2/4+r*(1-r)/2)...
        +(n4)*log10((1-r)^2/4);
    LOD = LOD-LOD0;
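For two dominant markers only four phenotype classes are observed, so it is worth confirming that the EM update still recovers r. The Python sketch below pairs the update with the LOD score and runs both on idealized counts generated at r = 0.2. It is an illustration under stated assumptions: the function names are hypothetical, and the counts are expected values rather than real data, so the printed LOD is not the 3.17 of the slide.

```python
import math

def class_probs(r):
    """Expected frequencies of A_B_, A_bb (= aaB_), and aabb."""
    p1 = 0.75 * (1 - r)**2 + r * (1 - r) + 0.5 * r**2
    p2 = 0.25 * r**2 + 0.5 * r * (1 - r)
    p4 = 0.25 * (1 - r)**2
    return p1, p2, p4

def estimate_r_dominant(n1, n2, n3, n4, r0=0.3, tol=1e-8, max_iter=10000):
    """EM estimate of r for two dominant markers in an F2."""
    n = n1 + n2 + n3 + n4
    r = r0
    for _ in range(max_iter):
        p1, p2, _ = class_probs(r)
        # E-step: expected recombinant gametes per individual in each class
        c1 = (r**2 + r * (1 - r)) / p1
        c2 = (0.5 * r**2 + 0.5 * r * (1 - r)) / p2
        # M-step: aabb carries no recombinant gametes, so n4 only enters through n
        r_new = (c1 * n1 + c2 * n2 + c2 * n3) / (2 * n)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r

def lod_dominant(r, n1, n2, n3, n4):
    """LOD score of r against r = 0.5."""
    def log10_lik(x):
        p1, p2, p4 = class_probs(x)
        return n1 * math.log10(p1) + (n2 + n3) * math.log10(p2) + n4 * math.log10(p4)
    return log10_lik(r) - log10_lik(0.5)

# Idealized counts at r = 0.2 for n = 1000 (expected values, not real data)
p1, p2, p4 = class_probs(0.2)
counts = (1000 * p1, 1000 * p2, 1000 * p2, 1000 * p4)
r_hat = estimate_r_dominant(*counts)
print(f"r_hat = {r_hat:.6f}, LOD = {lod_dominant(r_hat, *counts):.2f}")
```

With dominant markers most of the recombination information is hidden inside the A_B_ class, so the iteration may need many more passes than in the codominant case; the generous `max_iter` reflects that.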

21 Three Point Analysis in Backcross: a rice data set

22 [Figure: rice linkage map for chromosomes 1 to 4 (chrom1, chrom2, chrom3, chrom4), showing marker names such as RG472, RG246 and K5 with the distances between adjacent markers.]

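23 Three Point Analysis in Backcross
Data summarized as:

A,B,C   Genotype   Obs.         A & B   B & C
111     abc        $n_{abc}$    0       0
112     abC        $n_{abC}$    0       1
121     aBc        $n_{aBc}$    1       1
122     aBC        $n_{aBC}$    1       0
211     Abc        $n_{Abc}$    1       0
212     AbC        $n_{AbC}$    1       1
221     ABc        $n_{ABc}$    0       1
222     ABC        $n_{ABC}$    0       0

The columns A & B and B & C are 1 when the genotype is recombinant between the two markers and 0 otherwise.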

24 Rice Data

A,B,C   Genotype   Obs.              A & B   B & C
111     abc        $n_{abc}$ = 31    0       0
112     abC        $n_{abC}$ = 10    0       1
121     aBc        $n_{aBc}$ = 1     1       1
122     aBC        $n_{aBC}$ = 11    1       0
211     Abc        $n_{Abc}$ = 5     1       0
212     AbC        $n_{AbC}$ = 2     1       1
221     ABc        $n_{ABc}$ = 2     0       1
222     ABC        $n_{ABC}$ = 38    0       0

Marker RG472 is denoted by A, RG246 by B, and K5 by C.
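The eight genotype classes collapse into four recombination classes, indexed by whether a recombination occurred between A and B and between B and C. The Python sketch below performs that tally. It reads each entry of the rice table as a count followed by the two 0/1 indicators, which is an interpretation of the slide layout; the dictionary and function names are hypothetical.

```python
# Rice backcross counts, keyed by genotype at markers A, B, C
# (upper case = one parental allele, lower case = the other)
rice_counts = {
    "abc": 31, "abC": 10, "aBc": 1, "aBC": 11,
    "Abc": 5, "AbC": 2, "ABc": 2, "ABC": 38,
}

def collapse_to_recombination_classes(counts):
    """Sum counts by (recombinant A-B, recombinant B-C) indicator pair."""
    classes = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for genotype, n in counts.items():
        a, b, c = (ch.isupper() for ch in genotype)
        rec_ab = int(a != b)   # alleles at A and B come from different parents
        rec_bc = int(b != c)   # alleles at B and C come from different parents
        classes[(rec_ab, rec_bc)] += n
    return classes

classes = collapse_to_recombination_classes(rice_counts)
print(classes)   # {(0, 0): 69, (0, 1): 12, (1, 0): 16, (1, 1): 3}
```

These four totals are the n00, n01, n10 and n11 used in the gene-order comparison.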

25 Multilocus likelihood: determination of a most likely gene order
Consider three markers A, B, C, with no particular order assumed. A triply heterozygous F1 ABC/abc is backcrossed to a pure parent abc/abc.

Genotype            ABC or abc               ABc or abC             Abc or aBC             AbC or aBc
Obs.                $n_{00}$ = 69            $n_{01}$ = 12          $n_{10}$ = 16          $n_{11}$ = 3
Frequency under
  Order A-B-C       $(1-r_{AB})(1-r_{BC})$   $(1-r_{AB})\,r_{BC}$   $r_{AB}(1-r_{BC})$     $r_{AB}\,r_{BC}$
  Order A-C-B       $(1-r_{AC})(1-r_{BC})$   $r_{AC}\,r_{BC}$       $r_{AC}(1-r_{BC})$     $(1-r_{AC})\,r_{BC}$
  Order B-A-C       $(1-r_{AB})(1-r_{AC})$   $(1-r_{AB})\,r_{AC}$   $r_{AB}\,r_{AC}$       $r_{AB}(1-r_{AC})$

$r_{AB}$ = the recombination fraction between A and B = $(n_{10}+n_{11})/n$ = 0.19
$r_{BC}$ = the recombination fraction between B and C = $(n_{01}+n_{11})/n$ = 0.15
$r_{AC}$ = the recombination fraction between A and C = $(n_{01}+n_{10})/n$ = 0.28

26 Which order is the most likely?
$L_{ABC} \propto (1-r_{AB})^{n_{00}+n_{01}} (1-r_{BC})^{n_{00}+n_{10}} (r_{AB})^{n_{10}+n_{11}} (r_{BC})^{n_{01}+n_{11}}$
$L_{ACB} \propto (1-r_{AC})^{n_{00}+n_{11}} (1-r_{BC})^{n_{00}+n_{10}} (r_{AC})^{n_{01}+n_{10}} (r_{BC})^{n_{01}+n_{11}}$
$L_{BAC} \propto (1-r_{AB})^{n_{00}+n_{01}} (1-r_{AC})^{n_{00}+n_{11}} (r_{AB})^{n_{10}+n_{11}} (r_{AC})^{n_{01}+n_{10}}$

Log($L_{ABC}$) = -90.8932
Log($L_{ACB}$) = -101.5662
Log($L_{BAC}$) = -107.9176
(natural logarithms)

According to the maximum likelihood principle, the linkage order that gives the maximum likelihood for a data set is the order best supported by the data.
The best linkage order: A (20 cM) B (15 cM) C
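The three log-likelihoods can be recomputed directly from the four class counts. The Python sketch below does so with natural logarithms and plugs in the pairwise estimates of the recombination fractions, as the slide formulas do; the helper name and the dictionary of orders are hypothetical conveniences.

```python
import math

# Collapsed backcross counts from the slide
n00, n01, n10, n11 = 69, 12, 16, 3
n = n00 + n01 + n10 + n11

# Number of recombinants for each pair of markers
rec_ab = n10 + n11
rec_bc = n01 + n11
rec_ac = n01 + n10

def interval_loglik(k, n):
    """Natural-log likelihood of k recombinants out of n at the estimate r = k/n."""
    r = k / n
    return k * math.log(r) + (n - k) * math.log(1 - r)

# Each order is scored by its two adjacent intervals
loglik = {
    "A-B-C": interval_loglik(rec_ab, n) + interval_loglik(rec_bc, n),
    "A-C-B": interval_loglik(rec_ac, n) + interval_loglik(rec_bc, n),
    "B-A-C": interval_loglik(rec_ab, n) + interval_loglik(rec_ac, n),
}

for order, value in loglik.items():
    print(f"{order}: {value:.4f}")
print("best order:", max(loglik, key=loglik.get))
```

Written this way, the comparison shows that the preferred order is the one whose two adjacent intervals have the fewest recombinants, which here leaves the A and C pair (28 recombinants) as the non-adjacent pair.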

27 Data and result

Genotype   ABC or abc       ABc or abC       Abc or aBC       AbC or aBc
Obs.       $n_{00}$ = 69    $n_{01}$ = 12    $n_{10}$ = 16    $n_{11}$ = 3

Result:
$r_{AB}$ = 0.19, $r_{BC}$ = 0.15, $r_{AC}$ = 0.28
$d_{AB} = \frac14 \ln[(1+2r_{AB})/(1-2r_{AB})]$ ≈ 0.20 Morgan = 20 cM
$d_{BC} = \frac14 \ln[(1+2r_{BC})/(1-2r_{BC})]$ ≈ 0.15 Morgan = 15 cM
Log($L_{ABC}$) = -90.8932
Log($L_{ACB}$) = -101.5662
Log($L_{BAC}$) = -107.9176
The best linkage order: A (20 cM) B (15 cM) C
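The distance formula on this slide is the Kosambi mapping function, which converts a recombination fraction into a map distance. A minimal Python sketch follows, with the function names chosen here for illustration; the result is scaled from Morgans to centimorgans.

```python
import math

def kosambi_cm(r):
    """Kosambi map distance in centimorgans for a recombination fraction r < 0.5."""
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))   # 100 * (1/4) * ln(...)

def kosambi_r(d_cm):
    """Inverse mapping: recombination fraction for a distance in centimorgans."""
    return 0.5 * math.tanh(2 * d_cm / 100.0)

d_ab = kosambi_cm(0.19)
d_bc = kosambi_cm(0.15)
print(f"d_AB = {d_ab:.1f} cM, d_BC = {d_bc:.1f} cM")
```

For these data the function gives about 20.0 cM for A to B and about 15.5 cM for B to C, which the slide reports as 20 cM and 15 cM.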

