QTL Mapping Using Mx
Michael C Neale
Virginia Institute for Psychiatric and Behavioral Genetics
Virginia Commonwealth University
Overview
- Alternative approach
- Linkage as Mixture
- Univariate/Multivariate
- One/more loci
- Practical considerations
- Power
  - Pihat vs covs
  - Larger Sibships
Schematic of Genome
[Figure: genome with Marker 1, Marker 2, Marker 3, Marker 4 and a QTL; distances d1, d2, d3, d4]
Genetic Heterogeneity
Sib pairs IBD at a locus, parents AB and CD

      AC  AD  BC  BD
AC     2   1   1   0
AD     1   2   0   1
BC     1   0   2   1
BD     0   1   1   2
Pi hat approach
1 Pick a putative QTL location
2 Compute p(IBD0), p(IBD1), p(IBD2) given marker data [Mapmaker/sibs]
3 Compute $\hat{\pi}$ = p(IBD2) + 0.5 p(IBD1)
4 Fit model
Repeat 1-4 as necessary for different locations
Elston & Stewart
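Step 3 can be sketched in a few lines of Python. This is a minimal illustration, not part of the Mx workflow: the function names and the example probabilities are hypothetical, and the covariance helper assumes the usual pi-hat model in which the sib covariance is the familial variance plus pi-hat times the QTL variance.

```python
def pi_hat(p_ibd0, p_ibd1, p_ibd2):
    """Estimated proportion of alleles shared IBD at the putative QTL."""
    total = p_ibd0 + p_ibd1 + p_ibd2
    if abs(total - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return p_ibd2 + 0.5 * p_ibd1


def expected_sib_covariance(familial_var, qtl_var, pihat):
    """Assumed pi-hat model: covariance = familial + pihat * QTL variance."""
    return familial_var + pihat * qtl_var


# Hypothetical IBD probabilities for one sib pair (not from the slides).
example = pi_hat(0.25, 0.50, 0.25)
print(example)  # 0.5
print(expected_sib_covariance(0.3, 0.2, example))
```

With pi-hat as a definition variable, each pair gets its own expected covariance, which is what step 4 fits.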
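Major QTL effects
DZ twins
[Path diagram: latent factors A1, C1, D1, E1 and Q1 for phenotype P1; A2, C2, D2, E2 and Q2 for phenotype P2; cross-twin correlations 0.5 (A), 1 (C), 0.25 (D) and $\hat{\pi}$ (Q)]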
Normal Theory Likelihood Function
For raw data in Mx
$\ln L_i = f_i \ln \left[ \sum_{j=1}^{m} w_j \, g(x_i, \mu_{ij}, \Sigma_{ij}) \right]$
$x_i$ - vector of observed scores on n subjects
$\mu_{ij}$ - vector of predicted means
$\Sigma_{ij}$ - matrix of predicted covariances - functions of parameters
General Likelihood Function
$\ln L_i = f_i \ln \left[ \sum_{j=1}^{m} w_{ij} \, g(x_i, \mu_{ij}, \Sigma_{ij}) \right]$
i = 1....n subjects (families)
Things that may differ over subjects
- Model for Means can differ
- Model for Covariances can differ
- Weights can differ
- Frequencies can differ
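Normal distribution $N(\mu_{ij}, \Sigma_{ij})$
Likelihood is height of the curve
[Figure: normal curve N with mean $\mu$ and covariance $\Sigma$; the likelihood is the height of the curve at $x_i$]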
Weighted mixture of models
Finite mixture distribution
$\ln L_i = f_i \ln \left[ \sum_{j=1}^{m} w_{ij} \, g(x_i, \mu_{ij}, \Sigma_{ij}) \right]$
j = 1....m models
$w_{ij}$ - weight for subject i, model j
e.g., Segregation analysis
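Mixture of Normal Distributions
Two normals, proportions w1 & w2, different means
But Likelihood Ratio not Chi-Squared - what is it?
[Figure: density g against $x_i$ for two normal curves with means $\mu_1$ and $\mu_2$; the heights l1 and l2 at $x_i$ are weighted as w1 x l1 and w2 x l2]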
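A univariate version of the mixture likelihood can be sketched as follows. This is a minimal illustration in plain Python, assuming known weights and standard deviations; the function names and the data are hypothetical and not taken from the slides.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the normal curve N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_loglik(xs, weights, means, sds):
    """Sum over observations of ln[ sum_j w_j * g(x_i; mean_j, sd_j) ]."""
    total = 0.0
    for x in xs:
        lik = sum(w * normal_pdf(x, m, s)
                  for w, m, s in zip(weights, means, sds))
        total += math.log(lik)
    return total


# Hypothetical scores and a two-component mixture with different means.
scores = [-1.2, 0.1, 0.4, 2.3, 2.9]
print(mixture_loglik(scores, weights=[0.6, 0.4],
                     means=[0.0, 2.5], sds=[1.0, 1.0]))
```

Each observation contributes w1 x l1 + w2 x l2 before the log is taken, which is the weighted sum shown on the slide.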
Weighted Likelihood Method
1 Pick a putative QTL location
2 Compute p(IBD0), p(IBD1), p(IBD2) given marker data; these are "WEIGHTS"
3 Compute likelihood of phenotype data under each of the 3 IBD conditions
4 Maximize the weighted likelihood of the 3
Repeat 1-4 as necessary for different locations
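Steps 2 and 3 can be sketched for a single sib pair and one phenotype. This is a hedged illustration rather than the Mx implementation: it assumes a common mean for both sibs, total variance F + Q + E, and sib covariances F, F + 0.5 Q and F + Q for IBD 0, 1 and 2. The function names and numbers are hypothetical.

```python
import math


def bivariate_normal_pdf(x1, x2, mu, var, cov):
    """Density of a sib pair with common mean, equal variances, given covariance."""
    det = var * var - cov * cov          # determinant of [[var, cov], [cov, var]]
    d1, d2 = x1 - mu, x2 - mu
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def pair_loglik(x1, x2, ibd_probs, mu, F, Q, E):
    """ln of the IBD-weighted likelihood for one sib pair (requires E > 0)."""
    var = F + Q + E
    covs = (F, F + 0.5 * Q, F + Q)       # IBD 0, IBD 1, IBD 2
    lik = sum(p * bivariate_normal_pdf(x1, x2, mu, var, c)
              for p, c in zip(ibd_probs, covs))
    return math.log(lik)


# Hypothetical phenotypes and IBD weights for one pair.
print(pair_loglik(1.1, 0.8, (0.1, 0.3, 0.6), mu=1.0, F=0.3, Q=0.2, E=0.5))
```

Step 4 would sum this quantity over pairs and maximize it over mu, F, Q and E with a numerical optimizer.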
Mixture method
Add them up: p(IBD0) x [IBD 0 model] + p(IBD1) x [IBD 1 model] + p(IBD2) x [IBD 2 model]
[Figure: three copies of the path diagram (A1, C1, D1, E1, Q1 -> P1; A2, C2, D2, E2, Q2 -> P2), one per IBD state]
Dataset structure
Rectangular format
Id sex age P1 P2 | IBD0 IBD1 IBD2 | IBD0 IBD1 IBD2
                 | Locus 1        | Locus
Missing data:
- Phenotypes: ML
- Markers: Listwise
Mx Script Mixture method
!QTL analysis via Mixture Distribution method
!Using marker1
!Using DZ twins only
!Analysis of LDL
!Dutch Adults
#define nvar 1 !different for multivariate
#define nsib 2 !number of siblings
#NGroups=2
Mx Script Mixture part 2
G1: Parameter Estimates
Calculation
Begin Matrices;
 X Lower nvar nvar Free !familial background
 Z Lower nvar nvar Free !unique environment
 L Full 1 1 Free !QTL effect
 M Full 1 nvar Free !means
 H Full 1 1
End Matrices;
Matrix H .5
Begin Algebra;
 F= X*X'; !familial variance
 E= Z*Z'; !unique environmental variance
 Q= L*L'; !variance due to QTL
 V= F+Q+E; !total variance
 T= F|Q|E; !parameters in one matrix for standardizing
 S= !standardized variance component estimates
End Algebra;
Labels Row S standest
Labels Col S f^2 q^2 e^2
Labels Row T unstandest
Labels Col T f^2 q^2 e^2
End
Mx Script
G2: Dizygotic twins
#include lipiddzmix.dat
Select ibd0m1 ibd1m1 ibd2m1 ldl1 ldl2;
Definition ibd0m1 ibd1m1 ibd2m1;
Begin Matrices = Group 1;
 K Full 3 1 !IBD probabilities (from Merlin)
 U Unit 3 2
End Matrices;
Specify K ibd0m1 ibd1m1 ibd2m1
Means
Covariance
 F+Q+E | F _
 F | F+Q+E _ ! IBD 0 Covariance matrix
 F+Q+E | F+H@Q _
 F+H@Q | F+Q+E _ ! IBD 1 Covariance matrix
 F+Q+E | F+Q _
 F+Q | F+Q+E; ! IBD 2 Covariance matrix
Weights K; ! IBD probabilities
Start 1 All
Start 2.8 M
Option NDecimals=3
Option Multiple Issat
End
Mx Script Mixture part 4
! Test significance of QTL effect
Drop L
End
Output Pihat Method
Summary of VL file data for group 1
Code Number Mean Variance
MATRIX F
This is a LOWER TRIANGULAR matrix of order 1 by
MATRIX Q
This is a FULL matrix of order 1 by
Output
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>>
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>>
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared = (1 df)
Output Pihat Method
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>>
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>>
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared = (1 df)
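The difference test on these output slides can be sketched as below. The -2lnL values are hypothetical placeholders, since the slide's numbers are not shown. The optional halving of the p-value is an assumption: it reflects the common treatment of a variance component that cannot be negative as a 50:50 mixture of chi-square(0) and chi-square(1).

```python
import math


def qtl_lrt(minus2ll_absent, minus2ll_present, halve_p=False):
    """Likelihood ratio test for dropping the QTL path (1 df)."""
    chi_sq = max(minus2ll_absent - minus2ll_present, 0.0)
    # Upper tail of chi-square with 1 df: P(Z^2 > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    if halve_p:
        # Assumption: boundary test, 50:50 mixture of chi-square(0) and (1).
        p_value *= 0.5
    lod = chi_sq / (2.0 * math.log(10.0))  # equivalent LOD score
    return chi_sq, p_value, lod


# Hypothetical fit statistics for the two models.
chi_sq, p_value, lod = qtl_lrt(4010.0, 4000.0)
print(chi_sq, p_value, lod)
```

The same function applies to either the mixture or the pi-hat output, since both compare a 4-parameter model with a 3-parameter model.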
Summary
- SEM - QTL direct relationship
- Mx graphical/script approaches
- Mixture vs Pihat
- Multivariate treatment
- Multilocus
- Missing Data
- Ascertainment
How much more power?
- Large sibships much more powerful (Dolan et al 1999)
- Pihat simple with large sibships
  - Solar, Genehunter etc
- Pihat shows substantial bias with missing data
Expected IBD Frequencies
Sibships of size 2

Type  Configuration  Frequency
1     2              4/16
2     1              8/16
3     0              4/16
Expected IBD Frequencies TypeConfigurationFrequency 12224/ / / / / / / / / /64 Sibships of size 3
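The expected frequencies can be checked by brute-force enumeration. The sketch below assumes fully informative parents AB x CD, as in the earlier IBD table, and treats a configuration as the ordered tuple of pairwise IBD values; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def ibd_configurations(n_sibs):
    """Map each pairwise-IBD configuration to its expected frequency."""
    # Each sib receives A or B from one parent and C or D from the other.
    genotypes = list(product("AB", "CD"))
    counts = Counter()
    for sibship in product(genotypes, repeat=n_sibs):
        config = tuple(
            int(s1[0] == s2[0]) + int(s1[1] == s2[1])
            for s1, s2 in combinations(sibship, 2)
        )
        counts[config] += 1
    total = 4 ** n_sibs
    return {cfg: Fraction(n, total) for cfg, n in counts.items()}


for size in (2, 3):
    freqs = ibd_configurations(size)
    print(size, len(freqs), "configurations")
    for cfg, freq in sorted(freqs.items(), reverse=True):
        print("  ", cfg, freq)
```

Calling `len(ibd_configurations(n))` for larger `n` gives the number of IBD combinations as a function of sibship size, at the cost of enumerating 4^n sibships.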
More power in large sibships
Dolan, Neale & Boomsma (2000)
[Figure legend: + Size 2, o Size 3, * Size 4]
Number of IBD Combinations
As a function of number of sibs in family

Sibship Size  Number of combinations
Mixture Approach for Pedigrees
Some ideas
- Iterate configurations within families
- Only use non-zero IBD probabilities
  - Set threshold?
- Improves with genotype data
- Allows moderated genotypes
Strategy 2
Families within combinations
- Limited # of IBD configurations
- Depends on max sibship size
- Usually Faster
  - Can do missing data
  - Cannot do moderator variables
Multivariate QTL
- Vectors of variables, Matrices of paths
- Three component mixture
[Path diagram: Q1, A1, C1, D1, E1 -> P1; Q2, A2, C2, D2, E2 -> P2; Q1-Q2 correlation $\hat{\pi}$]
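Two locus model
[Path diagram: R1, C1, A1, E1, Q1 -> P1; R2, C2, A2, E2, Q2 -> P2; the two pairs of QTL factors are correlated $\hat{\pi}_1$ and $\hat{\pi}_2$]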
Two locus model mixture
[Figure: 3 x 3 grid of path diagrams (R1, C1, A1, E1, Q1 -> P1; R2, C2, A2, E2, Q2 -> P2), one per joint IBD state, weighted by p(ibd0 R), p(ibd1 R), p(ibd2 R) and p(ibd0 Q), p(ibd1 Q), p(ibd2 Q)]
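The nine-component mixture can be sketched by crossing the IBD states at the two loci. This is an illustration under stated assumptions: the joint weight is taken as the product of the single-locus IBD probabilities (reasonable only if the loci are unlinked), and the sib covariance adds half of each QTL variance per allele shared IBD. The function name and the symbols F, Q, R for the background, first-locus and second-locus variances are hypothetical.

```python
from itertools import product


def two_locus_components(ibd_q, ibd_r, F, Q, R):
    """Return (weight, sib covariance) for each of the 9 joint IBD states."""
    components = []
    for (kq, pq), (kr, pr) in product(enumerate(ibd_q), enumerate(ibd_r)):
        weight = pq * pr                           # assumes unlinked loci
        cov = F + (kq / 2.0) * Q + (kr / 2.0) * R  # kq, kr = alleles IBD
        components.append((weight, cov))
    return components


# Hypothetical IBD probabilities and variance components.
for weight, cov in two_locus_components((0.25, 0.5, 0.25),
                                        (0.1, 0.3, 0.6),
                                        F=0.3, Q=0.2, R=0.1):
    print(round(weight, 4), round(cov, 3))
```

Each (weight, covariance) pair plays the role of one cell of the 3 x 3 grid; the likelihood for a pair is the weighted sum over the nine cells.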
Multivariate multilocus multipoint
- Eaves Neale & Maes 1996
- 10 minutes for 5 phenotypes
- Restart at previous solution
- Only fit null model (q=0) once
Not dead yet
- Latent variable qtls
- Multiple rater
- Comorbidity
- Repeated measures