Download presentation
Presentation is loading. Please wait.
Published byJean Harriet Williams Modified over 9 years ago
1
Calculation of IBD probabilities David Evans University of Oxford Wellcome Trust Centre for Human Genetics
2
This Session … Identity by Descent (IBD) vs Identity by state (IBS) Why is IBD important? Calculating IBD probabilities Lander-Green Algorithm (MERLIN) Single locus probabilities Hidden Markov Model => Multipoint IBD Other ways of calculating IBD status Elston-Stewart Algorithm MCMC approaches MERLIN Practical Example IBD determination Information content mapping SNPs vs micro-satellite markers?
3
14 2413 31 Identical by Descent 21 2311 31 Identical by state only Two alleles are IBD if they are descended from the same ancestral allele Identity By Descent (IBD)
4
Consider a mating between mother AB x father CD: IBD 0 : 1 : 2 = 25% : 50% : 25% Sib 2 Sib1 ACADBCBD AC2110 AD1201 BC1021 BD0112 Example: IBD in Siblings
5
1/23/4 4/41/42/41/33/4 4/4 1/4 Affected relatives not only share disease alleles IBD, but also tend to share marker alleles close to the disease locus IBD more often than chance IBD sharing forms the basis of non- parametric linkage statistics Why is IBD Sharing Important?
6
Crossing over between homologous chromosomes
7
Cosegregation => Linkage A1A1 A2A2 Q1Q1 Q2Q2 A1A1 A2A2 Q1Q1 Q2Q2 Non-recombinant Parental genotypes (many, 1 – θ) A1A1 A2A2 Q1Q1 Q2Q2 Recombinant genotypes (few, θ ) Parental genotype Alleles close together on the same chromosome tend to stay together in meiosis; therefore they tend be co-transmitted.
8
Segregating Chromosomes MARKER DISEASE GENE
9
Marker Shared Among Affecteds Genotypes for a marker with alleles {1,2,3,4} 1/23/4 4/41/42/41/33/4 4/4 1/4
10
Linkage between QTL and marker Marker QTL IBD 0IBD 1IBD 2
11
NO Linkage between QTL and marker Marker
12
IBD can be trivial… 1 11 1 / 22 / 2 / 2 / IBD=0
13
Two Other Simple Cases… 1 11 1 / 2 / 2 / 11 / 112 / 2 / IBD=2 22 / 22 /
14
A little more complicated… 12 / IBD=1 (50% chance) 22 / 12 / 12 / IBD=2 (50% chance)
15
And even more complicated… 11 / IBD=? 11 /
16
Bayes Theorem for IBD Probabilities j=0,1,2 jIBDGPj P i GPi P GP i GPi P GP Gi Gi P )|()( )|()( )( )|()( )( ), P(IBD )|(
17
Sib 1Sib 2 P(observing genotypes | k alleles IBD) k=0k=1k=2 A1A1A1A1 A1A1A1A1 p14p14 p13p13 p12p12 A1A1A1A1 A1A2A1A2 2p 1 3 p 2 p12p2p12p2 0 A1A1A1A1 A2A2A2A2 p12p22p12p22 00 A1A2A1A2 A1A1A1A1 p12p2p12p2 0 A1A2A1A2 A1A2A1A2 4p 1 2 p 2 2 p1p2p1p2 2p 1 p 2 A1A2A1A2 A2A2A2A2 2p 1 p 2 3 p1p22p1p22 0 A2A2A2A2 A1A1A1A1 p12p22p12p22 00 A2A2A2A2 A1A2A1A2 p1p22p1p22 0 A2A2A2A2 A2A2A2A2 p24p24 p23p23 p22p22 P(Genotype | IBD State)
18
Worked Example 11 / 11 / )|2( )|1( )|0( )( )2|( )1|( )0|( 5.0 1 GIBDP G P G P GP GP GP GP p
19
Worked Example 11 / 11 /
20
For ANY PEDIGREE the inheritance pattern at any point in the genome can be completely described by a binary inheritance vector of length 2n: v(x) = (p 1, m 1, p 2, m 2, …,p n,m n ) whose coordinates describe the outcome of the paternal and maternal meioses giving rise to the n non-founders in the pedigree p i (m i ) is 0 if the grandpaternal allele transmitted p i (m i ) is 1 if the grandmaternal allele is transmitted ac / bd / ab / cd / v(x) = [0,0,1,1]
21
Inheritance Vector Inheritance vectorPriorPosterior ------------------------------------------------------------------- 00001/161/8 00011/161/8 00101/160 00111/160 01001/161/8 01011/161/8 01101/160 01111/160 10001/161/8 10011/161/8 10101/160 10111/160 11001/161/8 11011/161/8 11101/160 11111/160 In practice, it is not possible to determine the true inheritance vector at every point in the genome, rather we represent partial information as a probability distribution of the possible inheritance vectors ab p1p1 m2m2 p2p2 m1m1 bbac acab 1 2 3 5 4
22
Computer Representation At each marker location ℓ Define inheritance vector v ℓ Meiotic outcomes specified in index bit Likelihood for each gene flow pattern Conditional on observed genotypes at location ℓ 2 2n elements !!! 0000000100100011010001010110011110001001101010111100110111101111 LLLLLLLLLLLLLLLL
23
Abecasis et al (2002) Nat Genet 30:97-101
24
Multipoint IBD IBD status may not be able to be ascertained with certainty because e.g. the mating is not informative, parental information is not available IBD information at uninformative loci can be made more precise by examining nearby linked loci
25
ac / bd / 11 / 12 / ab / 11 / cd / 12 / Multipoint IBD IBD = 0 IBD = 0 or IBD =1?
26
Complexity of the Problem in Larger Pedigrees For each person 2n meioses in pedigree with n non-founders Each meiosis has 2 possible outcomes Therefore 2 2n possibilities for each locus For each genetic locus One location for each of m genetic markers Distinct, non-independent meiotic outcomes Up to 4 nm distinct outcomes!!!
27
0000 0001 0010 1111 234m = 10… … Marker Inheritance vector (2 2xn ) m = (2 2 x 2 ) 10 =~ 10 12 possible paths !!! Example: Sib-pair Genotyped at 10 Markers 1 P(G | 0000)(1 – θ) 4
28
0000 0001 0010 1111 234m = 10… … Marker Inheritance vector (L[0000] + L[0101] + L[1010] + L[1111] ) / L[ALL] P(IBD) = 2 at Marker Three 1 (2) (1) (2) IBD
29
0000 0001 0010 1111 234m = 10… … Marker Inheritance vector P(IBD) = 2 at arbitrary position on the chromosome 1 (L[0000] + L[0101] + L[1010] + L[1111] ) / L[ALL]
30
Lander-Green Algorithm The inheritance vector at a locus is conditionally independent of the inheritance vectors at all preceding loci given the inheritance vector at the immediately preceding locus (“Hidden Markov chain”) The conditional probability of an inheritance vector v i+1 at locus i+1, given the inheritance vector v i at locus i is θ i j (1-θ i ) 2n-j where θ is the recombination fraction and j is the number of changes in elements of the inheritance vector [0000] Conditional probability = (1 – θ) 3 θ Locus 2Locus 1 [0001] Example:
31
0000 0001 0010 1111 234m = 10… … Marker Inheritance vector M(2 2n ) 2 = 10 x 16 2 = 2560 calculations Lander-Green Algorithm 1
32
0000 0001 0010 1111 123m… … Total Likelihood = 1’Q 1 T 1 Q 2 T 2 …T m-1 Q m 1 P(G|[0000]) … 0 Q i = 0 P(G|[1111]) P(G|[0001]) 0 0 0 0 0 00 0 0 0 2 2n x 2 2n diagonal matrix of single locus probabilities at locus i (1-θ) 4 … θ4θ4 T i = (1-θ) 3 θ (1-θ) 4 … θ4θ4 … … … …(1-θ)θ 3 … (1-θ) 3 θ (1-θ)θ 3 2 2n x 2 2n matrix of transitional probabilities between locus i and locus i+1 ~m(2 2n ) 2 operations = 2560 for this case !!!
33
Further speedups… Trees summarize redundant information Portions of vector that are repeated Portions of vector that are constant or zero Speeding up convolution Use sparse-matrix by vector multiplication Use symmetries in divide and conquer algorithm (Idury & Elston, 1997)
34
Lander-Green Algorithm Summary Factorize likelihood by marker Complexity m·e n Strengths Large number of markers Relatively small pedigrees
35
Elston-Stewart Algorithm Factorize likelihood by individual Complexity n·e m Small number of markers Large pedigrees With little inbreeding VITESSE, FASTLINK etc
36
Other methods Number of MCMC methods proposed ~Linear on # markers ~Linear on # people Hard to guarantee convergence on very large datasets Many widely separated local minima E.g. SIMWALK
37
MERLIN-- Multipoint Engine for Rapid Likelihood Inference
38
Capabilities Linkage Analysis NPL and K&C LOD Variance Components Haplotypes Most likely Sampling All IBD and info content Error Detection Most SNP typing errors are Mendelian consistent Recombination No. of recombinants per family per interval can be controlled Simulation
39
MERLIN Website Reference FAQ Source Binaries Tutorial Linkage Haplotyping Simulation Error detection IBD calculation www.sph.umich.edu/csg/abecasis/Merlin
40
Input Files Pedigree File Relationships Genotype data Phenotype data Data File Describes contents of pedigree file Map File Records location of genetic markers
41
Example Pedigree File 1 1 0 0 1 1 x 3 3 x x 1 2 0 0 2 1 x 4 4 x x 1 3 0 0 1 1 x 1 2 x x 1 4 1 2 2 1 x 4 3 x x 1 5 3 4 2 2 1.234 1 3 2 2 1 6 3 4 1 2 4.321 2 4 2 2 Encodes family relationships, marker and phenotype information
42
Data File Field Codes CodeDescription MMarker Genotype AAffection Status. TQuantitative Trait. CCovariate. ZZygosity. S[n]Skip n columns.
43
Example Data File T some_trait_of_interest M some_marker M another_marker Provides information necessary to decode pedigree file
44
Example Map File CHROMOSOME MARKER POSITION 2 D2S160 160.0 2 D2S308 165.0 … Indicates location of individual markers, necessary to derive recombination fractions between them
45
Worked Example 11 / 11 / 9 4 )|2( 9 4 )|1( 9 1 )|0( 5.0 1 GIBDP G P G P p merlin –d example.dat –p example.ped –m example.map --ibd
46
Application: Information Content Mapping Information content: Provides a measure of how well a marker set approaches the goal of completely determining the inheritance outcome Based on concept of entropy E = -ΣP i log 2 P i where P i is probability of the ith outcome I E (x) = 1 – E(x)/E 0 Always lies between 0 and 1 Does not depend on test for linkage Scales linearly with power
47
Application: Information Content Mapping Simulations ABI (1 micro-satellite per 10cM) deCODE (1 microsatellite per 3cM) Illumina (1 SNP per 0.5cM) Affymetrix (1 SNP per 0.2 cM) Which panel performs best in terms of extracting marker information? merlin –d file.dat –p file.ped –m file.map --information
48
SNPs + parents microsat + parents SNP microsat 0.2 cM 3 cM 0.5 cM 10 cM Densities 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0102030405060708090100 Position (cM) Information Content SNPs vs Microsatellites
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.