Computational Issues on Statistical Genetics

Presentation transcript:

Computational Issues on Statistical Genetics
Research cycle: Research Questions, Review the Literature, Develop Methods, Data Collection, Analyze Data, Write Reports/Papers
Computational tasks along the cycle:
–Test the power and robustness by computer simulation
–Database construction (Excel, Access)
–Translate data to analyzable form
–Preliminary results (figures, tables)
–Program languages: efficient, feasible
–Graphics: Excel graphics, programmable graphics

Program Languages
Fortran, C, C++
Matrix language: MATLAB, S-Plus, R, SAS IML
Symbolic calculation: Mathematica, Maple, MATLAB
Interface programming: .NET, C#, Visual Basic
SAS, SPSS, BMDP
Database: Access, Excel, SQL, SAS, Oracle
MACRO
–Excel, Access, PowerPoint, Word
–Editor: WinEdt
–SAS Macro

Two Point Analysis in F2
Fully Informative Markers (codominant)

            BB          Bb                    bb
AA  Obs     n22         n21                   n20
    Freq    ¼(1-r)²     ½r(1-r)               ¼r²
    Recom.  0           1                     2
Aa  Obs     n12         n11                   n10
    Freq    ½r(1-r)     ½(1-r)² + ½r²         ½r(1-r)
    Recom.  1           2r²/[(1-r)² + r²]     1
aa  Obs     n02         n01                   n00
    Freq    ¼r²         ½r(1-r)               ¼(1-r)²
    Recom.  2           1                     0

Obs = observed count, Freq = expected genotype frequency, Recom. = expected number of recombinant gametes per individual.

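EM algorithm to estimate the recombination fraction r:
1. Choose a starting value $r^{(0)}$.
2. For $t = 0, 1, 2, \dots$, repeat the two steps below while $|r^{(t+1)} - r^{(t)}| > 10^{-8}$:
   E-step: calculate $\phi^{(t)} = \dfrac{(r^{(t)})^2}{(1-r^{(t)})^2 + (r^{(t)})^2}$, the probability that a double heterozygote AaBb carries two recombinant gametes. Its expected number of recombinant gametes is therefore $2\phi^{(t)}$.
   M-step: $r^{(t+1)} = \dfrac{1}{2n}\left[2(n_{20}+n_{02}) + (n_{21}+n_{12}+n_{10}+n_{01}) + 2\phi^{(t)} n_{11}\right]$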
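The E-step weight can be derived from the two ways an AaBb individual arises. The block below is a short worked derivation, not part of the original slides; the value r = 0.2 is only an illustrative starting value.

```latex
% AaBb arises from two parental gametes (AB/ab) or two recombinant gametes (Ab/aB):
%   P(AB/ab) = 2 \cdot \tfrac{1-r}{2} \cdot \tfrac{1-r}{2} = \tfrac12 (1-r)^2
%   P(Ab/aB) = 2 \cdot \tfrac{r}{2} \cdot \tfrac{r}{2}     = \tfrac12 r^2
\[
\phi = P(\text{Ab/aB} \mid \text{AaBb})
     = \frac{\tfrac12 r^2}{\tfrac12 (1-r)^2 + \tfrac12 r^2}
     = \frac{r^2}{(1-r)^2 + r^2}
\]
% Expected recombinant gametes in one AaBb individual: 2\phi + 0 \cdot (1-\phi) = 2\phi.
% Illustration with r = 0.2:
\[
\phi = \frac{0.04}{0.64 + 0.04} = \frac{0.04}{0.68} \approx 0.0588,
\qquad 2\phi \approx 0.118
\]
```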

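Two Point Analysis in F2
Fully Informative Markers (codominant)
[Figure: program input form. Input: the 3 × 3 table of counts n (rows AA, Aa, aa; columns BB, Bb, bb) and the starting value r0. Result: the estimate of r obtained from the updates $\phi^{(t)} = (r^{(t)})^2/[(1-r^{(t)})^2 + (r^{(t)})^2]$ and $r^{(t+1)} = \frac{1}{2n}[2(n_{20}+n_{02}) + (n_{21}+n_{12}+n_{10}+n_{01}) + 2\phi^{(t)} n_{11}]$.]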

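Two Point Analysis in F2
Fully Informative Markers (codominant)
MATLAB program to estimate the recombination fraction r:

    function r=rEstF2(n22,n21,n20,n12,n11,n10,n02,n01,n00)
    % EM estimate of r from the nine F2 genotype counts
    n=n22+n21+n20+n12+n11+n10+n02+n01+n00;
    r=0.2;     % starting value
    r1=-1;     % previous iterate (forces the first pass)
    while (abs(r1-r)>1.e-8)
        r1=r;
        % E-step: probability that AaBb carries two recombinant gametes
        phi=r^2/((1-r)^2+r^2);
        % M-step: expected recombinant gametes over 2n gametes
        r=1/(2*n)*(2*(n20+n02)+(n21+n12+n10+n01)+2*phi*n11);
    end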

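Log-likelihood ratio test statistic
Two alternative hypotheses: H0: r = 0.5 vs. H1: r ≠ 0.5

Likelihood value under H1:
$$L_1(r \mid n_{ij}) = \frac{n!}{n_{22}!\cdots n_{00}!}\left[\tfrac14(1-r)^2\right]^{n_{22}+n_{00}}\left[\tfrac14 r^2\right]^{n_{20}+n_{02}}\left[\tfrac12 r(1-r)\right]^{n_{21}+n_{12}+n_{10}+n_{01}}\left[\tfrac12(1-r)^2+\tfrac12 r^2\right]^{n_{11}}$$

Likelihood value under H0:
$$L_0(r=0.5 \mid n_{ij}) = \frac{n!}{n_{22}!\cdots n_{00}!}\left[\tfrac14(1-0.5)^2\right]^{n_{22}+n_{00}}\left[\tfrac14 (0.5)^2\right]^{n_{20}+n_{02}}\left[\tfrac12 (0.5)(1-0.5)\right]^{n_{21}+n_{12}+n_{10}+n_{01}}\left[\tfrac12(1-0.5)^2+\tfrac12 (0.5)^2\right]^{n_{11}}$$

$$\mathrm{LOD} = \log_{10}\frac{L_1(r \mid n_{ij})}{L_0(r=0.5 \mid n_{ij})} = (n_{22}+n_{00}) \cdot 2\left[\log_{10}(1-r) - \log_{10}(1-0.5)\right] + \dots = 6.08 > \text{critical LOD} = 3$$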

Two Point Analysis in F2
Fully Informative Markers (codominant)
MATLAB program to calculate the log-likelihood ratio test score (LOD):

    function LOD=calcLOD_F2(r,n22,n21,n20,n12,n11,n10,n02,n01,n00)
    %log likelihood under H1
    LOD=(n22+n00)*log10((1-r)^2/4)...
       +(n20+n02)*log10(r^2/4)...
       +(n21+n12+n10+n01)*log10(r*(1-r)/2)...
       +n11*log10((1-r)^2/2+r^2/2);
    %log likelihood under H0
    r=0.5;
    LOD0=(n22+n00)*log10((1-r)^2/4)...
        +(n20+n02)*log10(r^2/4)...
        +(n21+n12+n10+n01)*log10(r*(1-r)/2)...
        +n11*log10((1-r)^2/2+r^2/2);
    LOD=LOD-LOD0;
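As a cross-check of the two MATLAB routines for fully informative markers, here is a minimal Python sketch of the same EM update and LOD score. The function names, the `n[i][j]` indexing (i = number of A alleles, j = number of B alleles) and the example counts are assumptions made for illustration; they are not from the slides. Unlike the MATLAB version, empty cells are skipped so that a zero count never multiplies log10(0).

```python
import math


def cell_probs(r):
    """Expected F2 genotype frequencies, indexed [A alleles][B alleles]."""
    par = (1 - r) ** 2 / 4               # AABB, aabb
    rec = r ** 2 / 4                     # AAbb, aaBB
    one = r * (1 - r) / 2                # classes with one recombinant gamete
    mid = ((1 - r) ** 2 + r ** 2) / 2    # AaBb
    return [[par, one, rec],
            [one, mid, one],
            [rec, one, par]]


def estimate_r_f2_codominant(n, r0=0.2, tol=1e-8, max_iter=1000):
    """EM estimate of r; n[i][j] is the count with i A alleles and j B alleles."""
    total = sum(sum(row) for row in n)
    r = r0
    for _ in range(max_iter):
        # E-step: probability that AaBb carries two recombinant gametes
        phi = r ** 2 / ((1 - r) ** 2 + r ** 2)
        # M-step: expected recombinant gametes divided by 2n gametes
        r_new = (2 * (n[2][0] + n[0][2])
                 + (n[2][1] + n[1][2] + n[1][0] + n[0][1])
                 + 2 * phi * n[1][1]) / (2 * total)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r


def log10_lik(r, n):
    """log10 likelihood without the multinomial constant; skips empty cells."""
    p = cell_probs(r)
    return sum(n[i][j] * math.log10(p[i][j])
               for i in range(3) for j in range(3) if n[i][j] > 0)


def lod_f2_codominant(r, n):
    """LOD score of H1 (given r) against H0 (r = 0.5)."""
    return log10_lik(r, n) - log10_lik(0.5, n)


# Hypothetical counts: rows aa, Aa, AA; columns bb, Bb, BB.
counts = [[64, 32, 4], [32, 136, 32], [4, 32, 64]]
r_hat = estimate_r_f2_codominant(counts)
print(r_hat, lod_f2_codominant(r_hat, counts))
```

The multinomial constant n!/(n22!...n00!) is omitted because it cancels in the ratio L1/L0.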

Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)
Starting point: the same 3 × 3 table of observations, frequencies, and expected recombinants as for fully informative markers (rows AA, Aa, aa; columns BB, Bb, bb). Because marker B is dominant, BB and Bb cannot be distinguished, so these two columns are pooled into a single B_ column.

Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)

            B_                                                       bb
AA  Obs     n2_ = n22 + n21                                          n20
    Freq    ¼(1-r)² + ½r(1-r)                                        ¼r²
    Recom.  c1 = ½r(1-r) / [¼(1-r)² + ½r(1-r)]                       2
Aa  Obs     n1_ = n12 + n11                                          n10
    Freq    ½r(1-r) + ½(1-r)² + ½r²                                  ½r(1-r)
    Recom.  c2 = [½r(1-r) + r²] / [½r(1-r) + ½(1-r)² + ½r²]          1
aa  Obs     n0_ = n02 + n01                                          n00
    Freq    ¼r² + ½r(1-r)                                            ¼(1-r)²
    Recom.  c3 = [2·¼r² + ½r(1-r)] / [¼r² + ½r(1-r)]                 0

The bb column contributes 2·n20 + 1·n10 + 0·n00 recombinant gametes, so
Estimate of r = (c1·n2_ + c2·n1_ + c3·n0_ + 2·n20 + n10)/(2n)

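Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)
E-step:
$c_1 = \dfrac{\tfrac12 r(1-r)}{\tfrac14(1-r)^2 + \tfrac12 r(1-r)}$
$c_2 = \dfrac{\tfrac12 r(1-r) + r^2}{\tfrac12 r(1-r) + \tfrac12(1-r)^2 + \tfrac12 r^2}$
$c_3 = \dfrac{2\cdot\tfrac14 r^2 + \tfrac12 r(1-r)}{\tfrac14 r^2 + \tfrac12 r(1-r)}$
M-step (the Aa bb class, observed $n_{10}$ times, carries one recombinant gamete; aa bb carries none):
$r = \dfrac{c_1 n_{2\_} + c_2 n_{1\_} + c_3 n_{0\_} + 2 n_{20} + n_{10}}{2n}$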

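Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)
[Figure: program input form. Input: the 3 × 2 table of counts n (rows AA, Aa, aa; columns B_, bb) and the starting value r0. Result: the estimate of r.]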

Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)
MATLAB program to estimate the recombination fraction r:

    function r=rEstF2CoXdomin(n2_,n1_,n0_,n20,n10,n00)
    n=n2_+n1_+n0_+n20+n10+n00;
    r=0.2; r1=-1;
    while (abs(r1-r)>1.e-8)
        r1=r;
        % E-step: expected recombinant gametes per individual in each B_ class
        c1=(1/2*r*(1-r))/(1/4*(1-r)^2+1/2*r*(1-r));
        c2=(1/2*r*(1-r)+r^2)/(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2);
        c3=(2*1/4*r^2+1/2*r*(1-r))/(1/4*r^2+1/2*r*(1-r));
        % M-step: the bb column adds 2 (AA), 1 (Aa) and 0 (aa) recombinants
        r=(c1*n2_+c2*n1_+c3*n0_+2*n20+n10)/(2*n);
    end

Two Point Analysis in F2
Partial Informative Markers (codominant X dominant)
MATLAB program to calculate the log-likelihood ratio test score (LOD):

    function LOD=calcLOD_F2CoXdomin(r,n2_,n1_,n0_,n20,n10,n00)
    %log likelihood under H1 (natural log)
    LOD=log(1/4*(1-r)^2+1/2*r*(1-r))*n2_...
       +log(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2)*n1_...
       +log(1/4*r^2+1/2*r*(1-r))*n0_...
       +log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;
    %log likelihood under H0
    r=0.5;
    LOD0=log(1/4*(1-r)^2+1/2*r*(1-r))*n2_...
        +log(1/2*r*(1-r)+1/2*(1-r)^2+1/2*r^2)*n1_...
        +log(1/4*r^2+1/2*r*(1-r))*n0_...
        +log(r^2/4)*n20+log(r*(1-r)/2)*n10+log((1-r)^2/4)*n00;
    LOD=LOD-LOD0;
    LOD=LOD/log(10);   % convert from natural log to log10
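The codominant X dominant case can be sketched in Python in the same way. This is an illustrative sketch under stated assumptions: the function names and example counts are hypothetical, and the M-step credits one recombinant gamete to the Aa bb class (n10) and none to aa bb (n00), following the Recom. entries 2, 1, 0 of the bb column.

```python
import math


def class_probs(r):
    """Frequencies of (AA B_, Aa B_, aa B_, AA bb, Aa bb, aa bb)."""
    return ((1 - r) ** 2 / 4 + r * (1 - r) / 2,
            r * (1 - r) / 2 + (1 - r) ** 2 / 2 + r ** 2 / 2,
            r ** 2 / 4 + r * (1 - r) / 2,
            r ** 2 / 4,
            r * (1 - r) / 2,
            (1 - r) ** 2 / 4)


def estimate_r_codom_x_dom(counts, r0=0.2, tol=1e-8, max_iter=1000):
    """EM estimate of r; counts = (n2_, n1_, n0_, n20, n10, n00), 0 < r0 < 1."""
    n2_, n1_, n0_, n20, n10, n00 = counts
    n = sum(counts)
    r = r0
    for _ in range(max_iter):
        p2, p1, p0 = class_probs(r)[:3]
        # E-step: expected recombinant gametes per individual in each B_ class
        c1 = (r * (1 - r) / 2) / p2
        c2 = (r * (1 - r) / 2 + r ** 2) / p1
        c3 = (r ** 2 / 2 + r * (1 - r) / 2) / p0
        # M-step: bb classes contribute 2 (AA), 1 (Aa) and 0 (aa) recombinants
        r_new = (c1 * n2_ + c2 * n1_ + c3 * n0_ + 2 * n20 + n10) / (2 * n)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r


def lod_codom_x_dom(r, counts):
    """LOD score of H1 (given r) against H0 (r = 0.5); skips empty classes."""
    def log10_lik(rr):
        return sum(c * math.log10(p)
                   for c, p in zip(counts, class_probs(rr)) if c > 0)
    return log10_lik(r) - log10_lik(0.5)


# Hypothetical counts in the order (n2_, n1_, n0_, n20, n10, n00).
counts = (96, 168, 36, 4, 32, 64)
r_hat = estimate_r_codom_x_dom(counts)
print(r_hat, lod_codom_x_dom(r_hat, counts))
```

The example counts equal the expected class sizes for n = 400 and r = 0.2, which makes it easy to check that the update has its fixed point at the value used to generate them.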

Two Point Analysis in F2
Partial Informative Markers (dominant)
Starting point: the same 3 × 3 table of observations, frequencies, and expected recombinants as for fully informative markers (rows AA, Aa, aa; columns BB, Bb, bb). With both markers dominant, AA and Aa are pooled into A_ and BB and Bb into B_, leaving four observable classes.

Two Point Analysis in F2
Partial Informative Markers (dominant)

            B_                                        bb
A_  Obs     n1 = n22 + n21 + n12 + n11                n2 = n20 + n10
    Freq    ¼(1-r)² + r(1-r) + ½(1-r)² + ½r²          ¼r² + ½r(1-r)
    Recom.  c1                                        c2
aa  Obs     n3 = n02 + n01                            n4 = n00
    Freq    ¼r² + ½r(1-r)                             ¼(1-r)²
    Recom.  c2                                        0

where c1 and c2 are the expected numbers of recombinant gametes per individual:
c1 = [r² + r(1-r)] / [¼(1-r)² + r(1-r) + ½(1-r)² + ½r²]
c2 = [2(¼r²) + ½r(1-r)] / [¼r² + ½r(1-r)]

Estimate of r = (c1·n1 + c2·n2 + c2·n3)/(2n)

Two Point Analysis in F2
Partial Informative Markers (dominant)
[Figure: program input form. Input: the 2 × 2 table of counts n (rows A_, aa; columns B_, bb) and the starting value r0. Result: the estimate of r, using c1 = [r² + r(1-r)] / [¼(1-r)² + r(1-r) + ½(1-r)² + ½r²], c2 = [2(¼r²) + ½r(1-r)] / [¼r² + ½r(1-r)], and r = (c1·n1 + c2·n2 + c2·n3)/(2n).]

Two Point Analysis in F2
Partial Informative Markers (dominant)
MATLAB program to estimate the recombination fraction r:

    function r=rEstF2Partial(n1,n2,n3,n4)
    n=n1+n2+n3+n4;
    r=0.2; r1=-1;
    while (abs(r1-r)>1.e-8)
        r1=r;
        % E-step: expected recombinant gametes per individual
        c1=(r^2+r*(1-r))/((1-r)^2/4+r*(1-r)+(1-r)^2/2+r^2/2);
        c2=(r^2/2+r*(1-r)/2)/(r^2/4+r*(1-r)/2);
        % M-step (class aa bb, n4, carries no recombinant gametes)
        r=1/(2*n)*(c1*n1+c2*n2+c2*n3);
    end

Log-likelihood ratio test statistic
Partial Informative Markers (dominant)
Two alternative hypotheses: H0: r = 0.5 vs. H1: r ≠ 0.5

Likelihood value under H1:
$$L_1(r \mid n_{ij}) = \frac{n!}{n_1!\cdots n_4!}\left[\tfrac34(1-r)^2 + r(1-r) + \tfrac12 r^2\right]^{n_1}\left[\tfrac14 r^2 + \tfrac12 r(1-r)\right]^{n_2+n_3}\left[\tfrac14(1-r)^2\right]^{n_4}$$

Likelihood value under H0:
$$L_0(r=0.5 \mid n_{ij}) = \frac{n!}{n_1!\cdots n_4!}\left[\tfrac34(1-0.5)^2 + 0.5(1-0.5) + \tfrac12 (0.5)^2\right]^{n_1}\left[\tfrac14 (0.5)^2 + \tfrac12 (0.5)(1-0.5)\right]^{n_2+n_3}\left[\tfrac14(1-0.5)^2\right]^{n_4}$$

$$\mathrm{LOD} = \log_{10}\frac{L_1(r \mid n_{ij})}{L_0(r=0.5 \mid n_{ij})} = 3.17 > \text{critical LOD} = 3$$

Two Point Analysis in F2
Partial Informative Markers (dominant)
MATLAB program to calculate the log-likelihood ratio test score (LOD):

    function LOD=calcLOD_F2Partial(r,n1,n2,n3,n4)
    %log likelihood under H1
    LOD=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...
       +(n2+n3)*log10(r^2/4+r*(1-r)/2)...
       +(n4)*log10((1-r)^2/4);
    %log likelihood under H0
    r=0.5;
    LOD0=(n1)*log10((1-r)^2*3/4+r^2/2+r*(1-r))...
        +(n2+n3)*log10(r^2/4+r*(1-r)/2)...
        +(n4)*log10((1-r)^2/4);
    LOD=LOD-LOD0;
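For two dominant markers only four classes are observed, so more information is missing and the EM iteration typically needs more passes. The Python sketch below mirrors the two MATLAB routines for this case; the function names and the example counts are assumptions for illustration only.

```python
import math


def class_probs_dominant(r):
    """Frequencies of (A_ B_, A_ bb, aa B_, aa bb)."""
    p1 = 3 * (1 - r) ** 2 / 4 + r * (1 - r) + r ** 2 / 2
    p23 = r ** 2 / 4 + r * (1 - r) / 2
    p4 = (1 - r) ** 2 / 4
    return (p1, p23, p23, p4)


def estimate_r_dominant(n1, n2, n3, n4, r0=0.2, tol=1e-8, max_iter=10000):
    """EM estimate of r from the four observable classes, with 0 < r0 < 1."""
    n = n1 + n2 + n3 + n4
    r = r0
    for _ in range(max_iter):
        p1, p23, _, _ = class_probs_dominant(r)
        # E-step: expected recombinant gametes per individual
        c1 = (r ** 2 + r * (1 - r)) / p1
        c2 = (r ** 2 / 2 + r * (1 - r) / 2) / p23
        # M-step: class aa bb (n4) carries no recombinant gametes
        r_new = (c1 * n1 + c2 * n2 + c2 * n3) / (2 * n)
        if abs(r_new - r) <= tol:
            return r_new
        r = r_new
    return r


def lod_dominant(r, n1, n2, n3, n4):
    """LOD score of H1 (given r) against H0 (r = 0.5); skips empty classes."""
    counts = (n1, n2, n3, n4)

    def log10_lik(rr):
        return sum(c * math.log10(p)
                   for c, p in zip(counts, class_probs_dominant(rr)) if c > 0)
    return log10_lik(r) - log10_lik(0.5)


# Hypothetical counts (A_ B_, A_ bb, aa B_, aa bb).
r_hat = estimate_r_dominant(264, 36, 36, 64)
print(r_hat, lod_dominant(r_hat, 264, 36, 36, 64))
```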

Three Point Analysis in Backcross: a rice data set

[Figure: rice linkage map for four chromosomes (chrom1, chrom2, chrom3, chrom4). Marker labels as extracted, some truncated: RG472 RG K5 U10 RG532 W1 RG173 RZ276 Amy1B RG146 RG345 RG381 RZ19 RG690 RZ730 RZ801 RG810 RG RG437 RG544 RG171 RG157 RZ318 Pall RZ58 CDO686 Amy1A/C RG95 RG654 RG256 RZ213 RZ123 RG RG104 RG348 RZ329 RZ892 RG100 RG191 RZ678 RZ574 RZ284 RZ394 pRD10A RZ403 RG179 CDO337 RZ337A RZ448 RZ519 Pgi -1 CDO87 RG910 RG418A RG218 RZ262 RG190 RG908 RG91 RG449 RG788 RZ565 RZ675 RG163 RZ590 RG214 RG143 RG]

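Three Point Analysis in Backcross
The data are summarized by three-marker genotype class. The last two columns indicate recombination (1 = recombinant, 0 = non-recombinant) in the intervals A & B and B & C, for an F1 in coupling phase ABC/abc:

A,B,C   Obs.     A & B   B & C
abc     n_abc    0       0
abC     n_abC    0       1
aBc     n_aBc    1       1
aBC     n_aBC    1       0
Abc     n_Abc    1       0
AbC     n_AbC    1       1
ABc     n_ABc    0       1
ABC     n_ABC    0       0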

Rice Data
Marker RG472 is denoted by A, RG246 by B, and K5 by C.

A,B,C   Obs.          A & B   B & C
abc     n_abc =
abC     n_abC =
aBc     n_aBc =
aBC     n_aBC =
Abc     n_Abc =
AbC     n_AbC =
ABc     n_ABc =
ABC     n_ABC = 38    0       0

Multilocus likelihood: determination of the most likely gene order
Consider three markers A, B, C, with no particular order assumed. A triply heterozygous F1 ABC/abc is backcrossed to a pure parent abc/abc.

Genotype               ABC or abc          ABc or abC        Abc or aBC        AbC or aBc
Obs.                   n00 = 69            n01 = 12          n10 = 16          n11 = 3
Frequency under
  Order A-B-C          (1-rAB)(1-rBC)      (1-rAB)rBC        rAB(1-rBC)        rAB·rBC
  Order A-C-B          (1-rAC)(1-rBC)      rAC·rBC           rAC(1-rBC)        (1-rAC)rBC
  Order B-A-C          (1-rAB)(1-rAC)      (1-rAB)rAC        rAB·rAC           rAB(1-rAC)

With n = n00 + n01 + n10 + n11 = 100:
$r_{AB}$ = the recombination fraction between A and B = $(n_{10} + n_{11})/n = 0.19$
$r_{BC}$ = the recombination fraction between B and C = $(n_{01} + n_{11})/n = 0.15$
$r_{AC}$ = the recombination fraction between A and C = $(n_{01} + n_{10})/n = 0.28$

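Which order is the most likely?
$L_{ABC} \propto (1-r_{AB})^{n_{00}+n_{01}} (1-r_{BC})^{n_{00}+n_{10}} (r_{AB})^{n_{10}+n_{11}} (r_{BC})^{n_{01}+n_{11}}$
$L_{ACB} \propto (1-r_{AC})^{n_{00}+n_{11}} (1-r_{BC})^{n_{00}+n_{10}} (r_{AC})^{n_{01}+n_{10}} (r_{BC})^{n_{01}+n_{11}}$
$L_{BAC} \propto (1-r_{AB})^{n_{00}+n_{01}} (1-r_{AC})^{n_{00}+n_{11}} (r_{AB})^{n_{10}+n_{11}} (r_{AC})^{n_{01}+n_{10}}$
The three orders are compared through Log($L_{ABC}$), Log($L_{ACB}$), and Log($L_{BAC}$). According to the maximum likelihood principle, the linkage order that gives the maximum likelihood for a data set is the best linkage order supported by the data.
The best linkage order is A-B-C, with map distances of 20 cM between A and B and 15 cM between B and C.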
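The comparison of the three orders can be sketched in a few lines of Python. This is an illustrative sketch, not the original program: the function names are hypothetical, natural logarithms are used, and the proportionality constants are dropped because they are the same for every order. The counts are the ones given on the slide (69, 12, 16, 3).

```python
import math


def interval_term(r, n_rec, n):
    """Log-likelihood contribution of one interval with n_rec recombinants."""
    term = 0.0
    if n_rec > 0:
        term += n_rec * math.log(r)
    if n - n_rec > 0:
        term += (n - n_rec) * math.log(1 - r)
    return term


def order_log_likelihoods(n00, n01, n10, n11):
    """Log-likelihood (up to a constant) of each gene order in a backcross."""
    n = n00 + n01 + n10 + n11
    rec_ab, rec_bc, rec_ac = n10 + n11, n01 + n11, n01 + n10
    r_ab, r_bc, r_ac = rec_ab / n, rec_bc / n, rec_ac / n
    ab = interval_term(r_ab, rec_ab, n)
    bc = interval_term(r_bc, rec_bc, n)
    ac = interval_term(r_ac, rec_ac, n)
    # Each order is the product of its two adjacent intervals
    return {"A-B-C": ab + bc, "A-C-B": ac + bc, "B-A-C": ab + ac}


log_lik = order_log_likelihoods(69, 12, 16, 3)
best = max(log_lik, key=log_lik.get)
print(log_lik, best)
```

Because every order reuses two of the three pairwise terms, the order that leaves out the interval with the largest recombination fraction (here A and C) comes out on top.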

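DATA

Genotype   ABC or abc    ABc or abC    Abc or aBC    AbC or aBc
Obs.       n00 = 69      n01 = 12      n10 = 16      n11 = 3

Result:
$r_{AB} = 0.19$, $r_{BC} = 0.15$, $r_{AC} = 0.28$
Map distances from the Kosambi map function (in Morgans, multiplied by 100 to give cM):
$d_{AB} = \tfrac14 \ln\left[(1+2r_{AB})/(1-2r_{AB})\right] \times 100 \approx 20$ cM
$d_{BC} = \tfrac14 \ln\left[(1+2r_{BC})/(1-2r_{BC})\right] \times 100 \approx 15$ cM
Comparing Log($L_{ABC}$), Log($L_{ACB}$), and Log($L_{BAC}$), the best linkage order is A-B-C, with 20 cM between A and B and 15 cM between B and C.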
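The conversion from recombination fraction to map distance can be checked with a short Python sketch. The function name `kosambi_cm` is an assumption for illustration; the formula is the d = ¼ ln[(1+2r)/(1-2r)] expression from the slide, scaled by 100 to give centimorgans.

```python
import math


def kosambi_cm(r):
    """Map distance in cM from a recombination fraction r, 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    # 0.25 * ln(...) is the distance in Morgans; times 100 gives cM
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


for name, r in (("AB", 0.19), ("BC", 0.15)):
    print(name, round(kosambi_cm(r)), "cM")
```

The function rejects r = 0.5 because the distance is unbounded there, which corresponds to unlinked markers.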