Basic Concepts: Gene, Allele, Genotype, and Phenotype

A pair of chromosomes, one from the father and one from the mother, carries gene A, which has two alleles, A and a.

                     Phenotype
Subject  Genotype  Height  IQ
1        AA        185     100
2        AA        182     104
3        Aa        175     103
4        Aa        171     102
5        aa        155     101
6        aa        152     103
Genetic Mapping

A gene that affects a quantitative trait is called a quantitative trait locus (QTL). Bad news: it is very hard to detect such a gene directly.

- A QTL can be detected through the markers linked with it.
- A QTL detected this way is a hypothetical chromosomal segment whose DNA structure and organization are unknown.

Diagram (linkage map): Marker 1, QTL, Marker 2, Marker 3, ..., Marker k
QTL Mapping in Natural Populations

- Basic theory for QTL mapping is derived from linkage analysis in controlled crosses.
- There is a group of species in which it is not possible to make crosses.
- QTL mapping in such species should be based on existing populations.
Human Chromosomes

Male (Xy) x Female (XX)

                  Male gamete X    Male gamete y
Female gamete X   XX (daughter)    Xy (son)
Human Difference
How many genes control human body height?
Discontinuous Distribution due to a single dwarf gene
Continuous Distribution due to many genes?
Continuous Variation due to Polygenes

- Number of possible genotypes with k two-allele genes: $3^1 = 3$, $3^2 = 9$, ..., $3^{10} = 59{,}049$
- Environmental modifications
- Gene-environment interactions
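The genotype counts on this slide follow from each two-allele gene having three genotypes (for example AA, Aa, aa). A minimal sketch of that arithmetic, assuming independent biallelic genes; the function name `n_genotypes` is hypothetical:

```python
def n_genotypes(k):
    """Number of multilocus genotypes for k genes with two alleles each.

    Assumes three genotypes per gene (e.g. AA, Aa, aa), as on the slide.
    """
    return 3 ** k


counts = {k: n_genotypes(k) for k in (1, 2, 10)}
```
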
Powerful statistical methods are crucial for the identification of human height genes
Data Structure

         Marker (M)                                  Conditional prob. of QTL genotype
Subject  M1      M2      ...  Mm   Phenotype (y)     QQ(2)   Qq(1)   qq(0)
1        AA(2)   BB(2)   ...       y1                ω2|1    ω1|1    ω0|1
2        AA(2)   BB(2)   ...       y2                ω2|2    ω1|2    ω0|2
3        Aa(1)   Bb(1)   ...       y3                ω2|3    ω1|3    ω0|3
4                                  y4                ω2|4    ω1|4    ω0|4
5                                  y5                ω2|5    ω1|5    ω0|5
6        Aa(1)   bb(0)   ...       y6                ω2|6    ω1|6    ω0|6
7        aa(0)   Bb(1)   ...       y7                ω2|7    ω1|7    ω0|7
8        aa(0)   bb(0)   ...       y8                ω2|8    ω1|8    ω0|8

Here ωj|i is the conditional probability that subject i carries QTL genotype j.
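Linkage disequilibrium mapping – natural population

Association between marker and QTL

- Marker: Prob(M) = p, Prob(m) = 1 - p
- QTL: Prob(A) = q, Prob(a) = 1 - q

Four haplotypes:

$$\begin{aligned}
\mathrm{Prob}(MA) &= p_{11} = pq + D \\
\mathrm{Prob}(Ma) &= p_{10} = p(1-q) - D \\
\mathrm{Prob}(mA) &= p_{01} = (1-p)q - D \\
\mathrm{Prob}(ma) &= p_{00} = (1-p)(1-q) + D
\end{aligned}$$

so that

$$p = p_{11} + p_{10}, \qquad q = p_{11} + p_{01}, \qquad D = p_{11}p_{00} - p_{10}p_{01}$$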
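The four haplotype frequencies can be computed directly from p, q, and D. The following is a minimal sketch of the formulas on this slide; the function name `haplotype_freqs` and the example values p = 0.6, q = 0.3, D = 0.05 are hypothetical and chosen only for illustration.

```python
def haplotype_freqs(p, q, D):
    """Return (p11, p10, p01, p00) = Prob(MA), Prob(Ma), Prob(mA), Prob(ma)."""
    p11 = p * q + D
    p10 = p * (1 - q) - D
    p01 = (1 - p) * q - D
    p00 = (1 - p) * (1 - q) + D
    freqs = (p11, p10, p01, p00)
    # D is only admissible if every haplotype frequency stays non-negative.
    if any(f < 0 for f in freqs):
        raise ValueError("D is outside the range allowed by p and q")
    return freqs


# Hypothetical example values (not taken from the slides).
p11, p10, p01, p00 = haplotype_freqs(p=0.6, q=0.3, D=0.05)

# The marginal allele frequencies and D can be recovered from the haplotypes.
p_back = p11 + p10
q_back = p11 + p01
D_back = p11 * p00 - p10 * p01
```

Recovering p, q, and D from the haplotype frequencies is a quick consistency check on the parameterization.
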
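Joint and conditional (j|i) genotype prob. between marker and QTL

Joint probabilities:

Marker   AA           Aa                      aa           Obs
MM       p11^2        2 p11 p10               p10^2        n2
Mm       2 p11 p01    2(p11 p00 + p10 p01)    2 p10 p00    n1
mm       p01^2        2 p01 p00               p00^2        n0

Conditional probabilities (j|i): each joint probability divided by the frequency of its marker genotype.

Marker   AA                     Aa                                  aa                     Obs
MM       p11^2 / p^2            2 p11 p10 / p^2                     p10^2 / p^2            n2
Mm       2 p11 p01 / [2p(1-p)]  2(p11 p00 + p10 p01) / [2p(1-p)]    2 p10 p00 / [2p(1-p)]  n1
mm       p01^2 / (1-p)^2        2 p01 p00 / (1-p)^2                 p00^2 / (1-p)^2        n0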
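The conditional table can be sketched in code as the joint probabilities divided by the marker genotype frequencies. This is an illustrative sketch only; the function name `conditional_qtl_probs` and the example haplotype frequencies are hypothetical. Marker genotypes are coded 2 (MM), 1 (Mm), 0 (mm), and each returned row is ordered (AA, Aa, aa).

```python
def conditional_qtl_probs(p11, p10, p01, p00):
    """Return {marker code: (Prob(AA|marker), Prob(Aa|marker), Prob(aa|marker))}."""
    p = p11 + p10  # frequency of marker allele M
    joint = {
        2: (p11 ** 2, 2 * p11 * p10, p10 ** 2),                          # MM
        1: (2 * p11 * p01, 2 * (p11 * p00 + p10 * p01), 2 * p10 * p00),  # Mm
        0: (p01 ** 2, 2 * p01 * p00, p00 ** 2),                          # mm
    }
    marker_freq = {2: p ** 2, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}
    return {m: tuple(x / marker_freq[m] for x in joint[m]) for m in joint}


# Hypothetical haplotype frequencies, for illustration only.
cond = conditional_qtl_probs(p11=0.23, p10=0.37, p01=0.07, p00=0.33)
row_sums = {m: sum(probs) for m, probs in cond.items()}
```

Each row of the conditional table sums to one, because the joint probabilities in a row add up to the corresponding marker genotype frequency.
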
Mixture model-based likelihood with marker information

$$L(\Omega \mid y, M) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right]$$

                                            QTL genotype (prior prob.)
Sample  Height (cm, y)  Marker genotype M   AA      Aa      aa
1       184             MM (2)              ω2|1    ω1|1    ω0|1
2       185             MM (2)              ω2|2    ω1|2    ω0|2
3       180             Mm (1)              ω2|3    ω1|3    ω0|3
4       182             Mm (1)              ω2|4    ω1|4    ω0|4
5       167             Mm (1)              ω2|5    ω1|5    ω0|5
6       169             Mm (1)              ω2|6    ω1|6    ω0|6
7       165             mm (0)              ω2|7    ω1|7    ω0|7
8       166             mm (0)              ω2|8    ω1|8    ω0|8
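Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

$$\begin{aligned}
L(\Omega \mid y, M) &= \prod_{i=1}^{n} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right] \\
&= \prod_{i=1}^{n_2} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right] && \text{conditional on marker genotype 2 } (n_2) \\
&\quad \times \prod_{i=1}^{n_1} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right] && \text{conditional on marker genotype 1 } (n_1) \\
&\quad \times \prod_{i=1}^{n_0} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right] && \text{conditional on marker genotype 0 } (n_0)
\end{aligned}$$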
Normal distributions of phenotypic values for each QTL genotype group

$$\begin{aligned}
f_2(y_i) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_2)^2}{2\sigma^2}\right], & \mu_2 &= \mu + a \\
f_1(y_i) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_1)^2}{2\sigma^2}\right], & \mu_1 &= \mu + d \\
f_0(y_i) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y_i-\mu_0)^2}{2\sigma^2}\right], & \mu_0 &= \mu - a
\end{aligned}$$
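The genotype-specific normal densities plug into the mixture likelihood, with the conditional QTL genotype probabilities as mixing weights. The sketch below evaluates the log-likelihood under that model; the function names and all numerical values (overall mean, a, d, variance, mixing weights) are hypothetical and serve only to show the calculation.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def genotype_means(mu, a, d):
    """Return (mu2, mu1, mu0) = (mu + a, mu + d, mu - a)."""
    return (mu + a, mu + d, mu - a)


def log_likelihood(y, weights, means, sigma2):
    """Mixture log-likelihood.

    y       : phenotypes, one per subject
    weights : per-subject mixing weights ordered (genotype 2, 1, 0)
    means   : (mu2, mu1, mu0)
    """
    total = 0.0
    for yi, w in zip(y, weights):
        mix = sum(wj * normal_pdf(yi, mj, sigma2) for wj, mj in zip(w, means))
        total += math.log(mix)
    return total


# Hypothetical values for illustration.
means = genotype_means(mu=172.0, a=9.0, d=1.0)
y = [184.0, 175.0, 166.0]
weights = [(0.6, 0.3, 0.1), (0.25, 0.5, 0.25), (0.1, 0.3, 0.6)]
ll = log_likelihood(y, weights, means, sigma2=9.0)
```

Working on the log scale avoids multiplying many small densities together, which matters once n is large.
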
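Differentiating L with respect to each unknown parameter, setting the derivatives equal to zero, and solving the log-likelihood equations:

$$L(\Omega \mid y, M) = \prod_{i=1}^{n} \left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right]$$

$$\log L(\Omega \mid y, M) = \sum_{i=1}^{n} \log\left[\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)\right]$$

Define the posterior probabilities

$$\Pi_{2|i} = \frac{\omega_{2|i} f_2(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \tag{1}$$

$$\Pi_{1|i} = \frac{\omega_{1|i} f_1(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \tag{2}$$

$$\Pi_{0|i} = \frac{\omega_{0|i} f_0(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \tag{3}$$

The estimates of the genotypic means and the variance are then

$$\mu_2 = \frac{\sum_{i=1}^{n} \Pi_{2|i}\, y_i}{\sum_{i=1}^{n} \Pi_{2|i}} \tag{4}$$

$$\mu_1 = \frac{\sum_{i=1}^{n} \Pi_{1|i}\, y_i}{\sum_{i=1}^{n} \Pi_{1|i}} \tag{5}$$

$$\mu_0 = \frac{\sum_{i=1}^{n} \Pi_{0|i}\, y_i}{\sum_{i=1}^{n} \Pi_{0|i}} \tag{6}$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left[\Pi_{2|i}(y_i-\mu_2)^2 + \Pi_{1|i}(y_i-\mu_1)^2 + \Pi_{0|i}(y_i-\mu_0)^2\right] \tag{7}$$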
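Equations 1-7 amount to one posterior calculation followed by weighted means and a pooled variance. The sketch below illustrates a single such update, assuming each posterior weights its own genotype density (f2, f1, f0 respectively). The function names, the flat prior weights, and the starting values are hypothetical; the heights are the eight sample values from the earlier mixture-likelihood slide.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def posterior_probs(y, omega, means, sigma2):
    """Eqs. 1-3: per-subject posterior probabilities ordered (genotype 2, 1, 0)."""
    post = []
    for yi, w in zip(y, omega):
        num = [wj * normal_pdf(yi, mj, sigma2) for wj, mj in zip(w, means)]
        total = sum(num)
        post.append([x / total for x in num])
    return post


def update_means_variance(y, post):
    """Eqs. 4-7: posterior-weighted means and pooled residual variance."""
    n = len(y)
    means = [
        sum(P[j] * yi for P, yi in zip(post, y)) / sum(P[j] for P in post)
        for j in range(3)
    ]
    sigma2 = sum(
        P[j] * (yi - means[j]) ** 2 for P, yi in zip(post, y) for j in range(3)
    ) / n
    return means, sigma2


# Heights from the sample table; prior weights and starting values are hypothetical.
y = [184.0, 185.0, 180.0, 182.0, 167.0, 169.0, 165.0, 166.0]
omega = [(0.25, 0.5, 0.25)] * len(y)
post = posterior_probs(y, omega, means=(180.0, 172.0, 165.0), sigma2=16.0)
new_means, new_sigma2 = update_means_variance(y, post)
```
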
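Complete data

Prior prob.   QQ           Qq                      qq           Obs
MM            p11^2        2 p11 p10               p10^2        n2
Mm            2 p11 p01    2(p11 p00 + p10 p01)    2 p10 p00    n1
mm            p01^2        2 p01 p00               p00^2        n0

Counts        QQ     Qq     qq     Obs
MM            n22    n21    n20    n2
Mm            n12    n11    n10    n1
mm            n02    n01    n00    n0

$$\begin{aligned}
p_{11} &= \frac{2n_{22} + (n_{21} + n_{12}) + \phi\, n_{11}}{2n} \\
p_{10} &= \frac{2n_{20} + (n_{21} + n_{10}) + (1-\phi)\, n_{11}}{2n} \\
p_{01} &= \frac{2n_{02} + (n_{12} + n_{01}) + (1-\phi)\, n_{11}}{2n} \\
p_{00} &= \frac{2n_{00} + (n_{10} + n_{01}) + \phi\, n_{11}}{2n}
\end{aligned}$$

$$\phi = \frac{p_{11}p_{00}}{p_{11}p_{00} + p_{10}p_{01}}$$

Here φ is the probability that a double heterozygote (Mm, Qq) carries the haplotype pair MQ/mq rather than Mq/mQ.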
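Incomplete (observed) data

Posterior prob.   QQ      Qq      qq      Obs
MM                Π2|i    Π1|i    Π0|i    n2
Mm                Π2|i    Π1|i    Π0|i    n1
mm                Π2|i    Π1|i    Π0|i    n0

$$p_{11} = \frac{\sum_{i=1}^{n_2} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{2|i} + \phi\,\Pi_{1|i})}{2n} \tag{8}$$

$$p_{10} = \frac{\sum_{i=1}^{n_2} (2\Pi_{0|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} [\Pi_{0|i} + (1-\phi)\,\Pi_{1|i}]}{2n} \tag{9}$$

$$p_{01} = \frac{\sum_{i=1}^{n_0} (2\Pi_{2|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} [\Pi_{2|i} + (1-\phi)\,\Pi_{1|i}]}{2n} \tag{10}$$

$$p_{00} = \frac{\sum_{i=1}^{n_0} (2\Pi_{0|i} + \Pi_{1|i}) + \sum_{i=1}^{n_1} (\Pi_{0|i} + \phi\,\Pi_{1|i})}{2n} \tag{11}$$

The first sum in Eq. 11 runs over the n0 subjects with marker genotype mm, matching the complete-data count 2n00 + n01.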
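Equations 8-11 replace the unobserved genotype counts with sums of posterior probabilities. A minimal sketch of that update follows; the function name `update_haplotype_freqs` and the example posteriors are hypothetical. Marker genotypes are coded 2 (MM), 1 (Mm), 0 (mm), and p00 accumulates over the mm subjects, as the complete-data count 2n00 + n01 requires.

```python
def update_haplotype_freqs(marker, post, p11, p10, p01, p00):
    """Eqs. 8-11: update (p11, p10, p01, p00) from posterior probabilities.

    marker : list of marker genotype codes (2 = MM, 1 = Mm, 0 = mm)
    post   : list of (P2, P1, P0) posterior QTL genotype probabilities
    """
    # Probability that a double heterozygote carries haplotypes MQ and mq.
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    n = len(marker)
    s11 = s10 = s01 = s00 = 0.0
    for m, (P2, P1, P0) in zip(marker, post):
        if m == 2:    # MM subjects carry only M haplotypes
            s11 += 2 * P2 + P1
            s10 += 2 * P0 + P1
        elif m == 1:  # Mm subjects carry one M and one m haplotype
            s11 += P2 + phi * P1
            s10 += P0 + (1 - phi) * P1
            s01 += P2 + (1 - phi) * P1
            s00 += P0 + phi * P1
        else:         # mm subjects carry only m haplotypes
            s01 += 2 * P2 + P1
            s00 += 2 * P0 + P1
    return tuple(s / (2 * n) for s in (s11, s10, s01, s00))


# Hypothetical posteriors for eight subjects, for illustration only.
marker = [2, 2, 1, 1, 1, 1, 0, 0]
post = [(0.7, 0.2, 0.1)] * 2 + [(0.3, 0.5, 0.2)] * 4 + [(0.1, 0.2, 0.7)] * 2
new_haps = update_haplotype_freqs(marker, post, 0.3, 0.2, 0.2, 0.3)
```

Because every subject contributes exactly two haplotypes, the updated frequencies sum to one and p11 + p10 equals the observed frequency of marker allele M.
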
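EM algorithm

(1) Give initial values $\Omega^{(0)} = (\mu_2, \mu_1, \mu_0, \sigma^2, p_{11}, p_{10}, p_{01}, p_{00})^{(0)}$;
(2) Calculate $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ using Eqs. 1-3;
(3) Calculate $\Omega^{(1)}$ using $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ based on Eqs. 4-11;
(4) Repeat (2) and (3) until convergence.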
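The four steps can be sketched as a single loop for one marker. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, the starting values, and the convergence rule (change in log-likelihood below `tol`) are hypothetical choices, and the posterior for each genotype is assumed to use that genotype's own density. The data are the eight sample heights and marker genotypes from the mixture-likelihood slide.

```python
import math


def normal_pdf(y, mu, s2):
    """Normal density with mean mu and variance s2."""
    return math.exp(-(y - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)


def conditional_probs(p11, p10, p01, p00):
    """Prior QTL genotype probabilities (QQ, Qq, qq) for marker codes 2, 1, 0."""
    p = p11 + p10
    joint = {2: (p11 * p11, 2 * p11 * p10, p10 * p10),
             1: (2 * p11 * p01, 2 * (p11 * p00 + p10 * p01), 2 * p10 * p00),
             0: (p01 * p01, 2 * p01 * p00, p00 * p00)}
    denom = {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}
    return {m: [x / denom[m] for x in joint[m]] for m in joint}


def em_fit(y, marker, means, s2, haps, max_iter=200, tol=1e-8):
    """Run EM; returns (means, s2, haps, log-likelihood history)."""
    n = len(y)
    history = []
    for _ in range(max_iter):
        p11, p10, p01, p00 = haps
        omega = conditional_probs(p11, p10, p01, p00)
        # Step 2: posterior probabilities (Eqs. 1-3) and current log-likelihood.
        post, loglik = [], 0.0
        for yi, m in zip(y, marker):
            num = [w * normal_pdf(yi, mu, s2) for w, mu in zip(omega[m], means)]
            total = sum(num)
            loglik += math.log(total)
            post.append([x / total for x in num])
        history.append(loglik)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break  # assumed stopping rule: log-likelihood has stopped increasing
        # Step 3a: genotypic means and variance (Eqs. 4-7).
        means = [sum(P[j] * yi for P, yi in zip(post, y)) / sum(P[j] for P in post)
                 for j in range(3)]
        s2 = sum(P[j] * (yi - means[j]) ** 2
                 for P, yi in zip(post, y) for j in range(3)) / n
        # Step 3b: haplotype frequencies (Eqs. 8-11).
        phi = p11 * p00 / (p11 * p00 + p10 * p01)
        s = [0.0, 0.0, 0.0, 0.0]  # expected counts of MQ, Mq, mQ, mq
        for m, (P2, P1, P0) in zip(marker, post):
            if m == 2:
                s[0] += 2 * P2 + P1
                s[1] += 2 * P0 + P1
            elif m == 1:
                s[0] += P2 + phi * P1
                s[1] += P0 + (1 - phi) * P1
                s[2] += P2 + (1 - phi) * P1
                s[3] += P0 + phi * P1
            else:
                s[2] += 2 * P2 + P1
                s[3] += 2 * P0 + P1
        haps = tuple(x / (2 * n) for x in s)
    return means, s2, haps, history


heights = [184.0, 185.0, 180.0, 182.0, 167.0, 169.0, 165.0, 166.0]
markers = [2, 2, 1, 1, 1, 1, 0, 0]
# Step 1: hypothetical initial values.
means, s2, haps, history = em_fit(heights, markers, means=[180.0, 172.0, 165.0],
                                  s2=16.0, haps=(0.3, 0.2, 0.2, 0.3))
a_hat = (means[0] - means[2]) / 2  # additive effect, since mu2 - mu0 = 2a
```

With only eight subjects the estimates are not meaningful; the example only shows the flow of steps (1) to (4). In practice one would also guard against marker allele frequencies of 0 or 1, which make the conditional probabilities undefined.
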
Hypothesis Tests

Is there a significant QTL?

H0: μ2 = μ1 = μ0
H1: Not H0

LR1 = -2[ln L0 - ln L1]

The critical threshold is determined from permutation tests.
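A permutation threshold is obtained by shuffling phenotypes relative to marker genotypes, recomputing the test statistic each time, and taking an upper quantile of the resulting null distribution. The sketch below shows that procedure only. The statistic `marker_phenotype_stat` is a hypothetical stand-in (n times the squared marker-phenotype correlation) used to keep the example short; for the test on this slide, LR1 from the fitted mixture model would be passed as `stat_fn` instead. The function names, `n_perm`, `alpha`, and `seed` are likewise assumptions.

```python
import math
import random


def marker_phenotype_stat(y, marker):
    """Placeholder statistic (NOT LR1): n times the squared correlation."""
    n = len(y)
    mean_y, mean_m = sum(y) / n, sum(marker) / n
    sxy = sum((a - mean_y) * (b - mean_m) for a, b in zip(y, marker))
    sxx = sum((b - mean_m) ** 2 for b in marker)
    syy = sum((a - mean_y) ** 2 for a in y)
    return n * sxy * sxy / (sxx * syy)


def permutation_threshold(y, marker, stat_fn, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of stat_fn under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(y)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any marker-phenotype association
        null_stats.append(stat_fn(shuffled, marker))
    null_stats.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_stats[idx]


# Sample heights and marker genotypes from the mixture-likelihood slide.
heights = [184.0, 185.0, 180.0, 182.0, 167.0, 169.0, 165.0, 166.0]
markers = [2, 2, 1, 1, 1, 1, 0, 0]
observed = marker_phenotype_stat(heights, markers)
threshold = permutation_threshold(heights, markers, marker_phenotype_stat,
                                  n_perm=500, alpha=0.05, seed=1)
significant = observed > threshold
```

Fixing the seed makes the threshold reproducible; larger `n_perm` gives a more stable estimate of the quantile.
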
Hypothesis Tests

Can this QTL be detected by the marker?

H0: D = 0
H1: Not H0

LR2 = -2[ln L0 - ln L1]

The critical threshold is determined from the chi-square table (df = 1).
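For one degree of freedom, the chi-square tail probability has a closed form, P(X > x) = erfc(sqrt(x/2)), so the test can be sketched without a lookup table. The function names and the two log-likelihood values below are hypothetical, chosen only to show the calculation.

```python
import math


def likelihood_ratio(ln_l0, ln_l1):
    """LR = -2 [ln L0 - ln L1], from the maximized log-likelihoods under H0 and H1."""
    return -2.0 * (ln_l0 - ln_l1)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with df = 1."""
    # max() guards against a tiny negative LR caused by rounding.
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


# Hypothetical log-likelihoods, for illustration only.
lr2 = likelihood_ratio(ln_l0=-25.0, ln_l1=-22.0)
p_value = chi2_sf_df1(lr2)
reject_h0 = p_value < 0.05
```

The familiar 5% critical value of about 3.84 for df = 1 corresponds to a tail probability of 0.05 in this function.
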
A case study from human populations

- 105 black women and 538 white women;
- 10 SNPs genotyped within 5 candidate genes for human obesity;
- Two obesity traits: the amount of body fat (body mass index, BMI) and its distribution throughout the body (waist-to-hip circumference ratio, WHR).
Objective

Detect quantitative trait nucleotides (QTNs) predisposing to human obesity traits, BMI and WHR
Trait  SNP            Chrom.    Parameter  Black    White
BMI    ADRA1A         8p21      q          0.20
                                D          0.04
                                a          11.40
                                d          -2.63
                                LR         3.90*    NS
WHR    ADRB1          10q24     q          0.83
                                D          -0.07
                                a          -0.15
                                d          -0.24
                                LR         5.91*    NS
       ADRB2          5q32-33   q          0.16
                                D          0.07
                                a          0.16
                                d          -0.20
                                LR         5.88*    NS
       ADRB2-GNAS1    5/20      q          0.83     0.78
                                D          0.02     0.03
                                a          -0.18    -0.15
                                d          -0.10    -0.16
                                LR         8.42*    8.06*
Shape mapping meets LD mapping

Mapping Body Shape Genes through Shape Mapping
Ningtao Wang, Yaqun Wang, Zhong Wang, Han Hao and Rongling Wu*
Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA
J Biom Biostat 2012, 3:8