Published by Darcy Francis. Modified over 8 years ago.
1
Association mapping: fundamental principles and a few methods
Thomas Mailund
Slides: www.daimi.au.dk/~mailund/association-mapping/
2
Outline
● Introduction
➔ Goals and setup
➔ Genetic variation in humans
● Marker/disease association
➔ Indirect association and linkage disequilibrium
➔ Background population genetics concepts
➔ “Global” and “local” genealogies
● Mapping methods
➔ Local genealogies (Blossoc)
➔ Clustering (HapCluster)
3
What are we looking for?
A spectrum of causes, from Environment to Genes:
● Gunshot wounds
● Car accidents
● Smoking-induced lung cancer
● Cardiovascular disease
● Obesity
● Diabetes 2
● Alzheimer
● Schizophrenia
● BRCA1 breast cancer
● Cystic fibrosis
● Hemophilia
4
Goal of association mapping
Identification of a susceptibility variant, replication in a different cohort/population, and understanding of genetic function at the cell level can lead to:
1. identification of druggable targets
2. development of drugs for prevention
3. better understanding of the cellular processes that are involved in disease treatments
Mapping function = P(Disease gene location | data)
[Diagram labels: Druggable target; Treatment; Understanding of cellular processes]
5
Case-control studies
[Figure: Disease/Responder group vs. Control/Non-responder group, individuals marked by Allele 1 or Allele 2 of Marker A; Marker A is associated with Phenotype]
Cautions
● Subgroup analysis and multiple testing
● Poorly defined phenotypes
● Poorly matched controls, population stratification
● Failure to replicate
● Optimistic interpretation of results
● Positive publication bias
● Measuring the wrong variation
6
Relative risk
● Relative risk (RR)
● RR is the risk of disease in the exposed group (carriers of the susceptibility allele or genotype) relative to the unexposed group (non-carriers)
● E.g. RR = 1.5 indicates that carriers of the A allele have 1.5 times the disease risk of non-carriers, i.e. they are 50% more likely to get the disease.
● Genotype relative risk (GRR)
● Relative risk assigned to genotypes AA, Aa, aa
● GRR(Aa) = P(diseased | Aa) / P(diseased | aa)
● GRR(AA) = P(diseased | AA) / P(diseased | aa)
● E.g. additive model:
● P(disease | aa) = b; P(disease | Aa) = b + e; P(disease | AA) = b + 2e
● GRR(Aa) = (b+e)/b; GRR(AA) = (b+2e)/b = 2 GRR(Aa) - 1
● If GRR(Aa) = 1.5, then GRR(AA) = 2
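The additive model above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `additive_grr` and the example values b = 0.05 and e = 0.025 are chosen here purely for illustration.

```python
def additive_grr(baseline: float, effect: float) -> tuple[float, float]:
    """Return (GRR(Aa), GRR(AA)) under the additive model.

    baseline = P(disease | aa) = b, effect = e, so that
    P(disease | Aa) = b + e and P(disease | AA) = b + 2e.
    """
    if baseline <= 0:
        raise ValueError("baseline risk must be positive")
    grr_het = (baseline + effect) / baseline
    grr_hom = (baseline + 2 * effect) / baseline
    return grr_het, grr_hom


# Illustrative (assumed) values: b = 0.05, e = 0.025.
grr_het, grr_hom = additive_grr(baseline=0.05, effect=0.025)
print(grr_het, grr_hom)
```

With these values the heterozygote risk is 1.5 times the baseline and the homozygote risk twice the baseline, in line with the identity GRR(AA) = 2 GRR(Aa) - 1.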
7
Relative risk: Examples
Relative risk for relatives of an affected individual (any genetic effect); examples from Lon Cardon:
● Huntington’s Disease >1000
● Cystic Fibrosis 400
● Autism 75
● Inflammatory Bowel Disease 60
● Multiple Sclerosis 20
● Juvenile Diabetes 15
● Schizophrenia 10
● Asthma 6
● Prostate Cancer 5
● Late Onset Diabetes 2-3
● Breast Cancer 2
● Genes with relative risk for schizophrenia
➔ Neuregulin (NRG1) GRR: 2
➔ Calcineurin (PPP3CC) GRR: 1.3
➔ Catechol-O-methyltransferase (COMT) GRR: 1.5
8
Genetic variation ● Very little variation in humans (compared to related species)
9
Genetic variation
● Each new cell contains ~3 new mutations
● Each new “child” carries ~20 new mutations
● On average the sequences of any two human genomes are 99.9% the same
➔ 0.1% of the genome ~ 3 million base pairs
➔ Maybe as many as 2.5 billion sites have variation in the entire population
● This genetic variation (plus environmental influences) is responsible for variation in human traits.
10
Types of variation Annu. Rev. Genom. Human Genet. 2006.7:407-442.
11
Single Nucleotide Polymorphisms ● The most common form of genetic polymorphism ● Common variants (MAF>5%) estimated to occur every 100-300 bp (10 – 30 million SNPs)
12
HapMap ● Phase I: 1 million SNPs in 90 individuals from Europe, Africa and Asia ● Phase II: 3 million SNPs ● SNP selection made easy ● 250K & 500K SNPs based on non-redundant Phase I commercially available ● Genome wide scans a reality
13
Setup: case/control sequences
Sequences of nucleotides at known polymorphic sites, from Cases (affected) and Controls (unaffected):
--A--------C--------A----G--------T---C---A----
--T--------G--------A----G--------C---C---A----
--A--------G--------G----G--------C---C---A----
--A--------C--------A----G--------T---C---A----
--T--------C--------A----G--------T---C---A----
--T--------C--------A----T--------T---A---A----
--A--------C--------A----G--------T---C---A----
--A--------C--------A----G--------T---C---G----
--T--------C--------A----T--------T---C---A----
--A--------C--------A----G--------T---C---A----
--A--------C--------G----T--------C---A---A----
--A--------C--------A----G--------C---C---G----
14
Actual setup: unphased sequences
Sequences of pairs of nucleotides at known polymorphic sites:
-A/T------C/G------A/A--G/G------T/C-C/C-A/A---
-A/T------C/G------G/A--G/G------T/C-C/C-A/A---
-T/T------C/C------A/A--G/T------T/T-C/A-A/A---
-A/A------C/C------A/A--G/G------T/T-C/C-A/A---
-A/T------C/C------A/A--G/T------T/T-C/C-G/A---
-A/A------C/C------G/A--G/T------T/C-C/A-A/A---
Phase inference software: PHASE, SNPHAP; but see also Morris et al. 2004
15
Association mapping --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------C---C---A---- --A--------G--------G----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------C--------A----G--------T---C---A---- --T--------C--------A----T--------T---A---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------A----G--------T---C---G---- --T--------C--------A----T--------T---C---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------G----T--------C---A---A---- --A--------C--------A----G--------C---C---G---- We are searching for an association between variant and disease status Significant difference in distributions?
16
Significant difference in distribution --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------C---C---A---- --A--------G--------G----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------C--------A----G--------T---C---A---- --T--------C--------A----T--------T---A---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------A----G--------T---C---G---- --T--------C--------A----T--------T---C---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------G----T--------C---A---A---- --A--------C--------A----G--------C---C---G---- Consider a single marker...
17
Significant difference in distribution
Contingency table:
             Allele G        Allele T
Affected     G, Affected     T, Affected
Unaffected   G, Unaffected   T, Unaffected
18
Significant difference in distribution
Conditional disease status:
             Allele G          Allele T
Affected     Affected | G      Affected | T
Unaffected   Unaffected | G    Unaffected | T
19
Significant difference in distribution
If the marker does not affect the disease:
P( Affected | G ) = P( Affected | T )
20
Significant difference in distribution
If the marker does affect the disease:
P( Affected | G ) > P( Affected | T )
21
Significant difference in distribution
Null hypothesis: the marker does not affect the disease status
P(G, Affected) = P(G)P(Affected)
P(T, Affected) = P(T)P(Affected)
P(G, Unaffected) = P(G)P(Unaffected)
P(T, Unaffected) = P(T)P(Unaffected)
[Table margins: P( Affected ), P( Unaffected ), P( G ), P( T )]
22
Significant difference in distribution
● The null hypothesis is tested with
➔ Fisher’s exact test (for small data sets)
➔ χ² test (large-sample approximation) when each cell has count > 5
➔ Allelic level: 2x2 matrix
➔ Genotype level: 2x3 matrix
➔ For two loci, there are 9 different two-locus genotypes, i.e. interactions can be tested in a 2x9 matrix
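As an illustration of the allelic-level 2x2 test, here is a minimal sketch using only the Python standard library. The function name `chi_square_2x2` and the allele counts are hypothetical, and the sketch assumes every row and column total is non-zero. Expected counts come from the independence null hypothesis (row total times column total, divided by the grand total), and the p-value uses the fact that for 1 degree of freedom the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table = [[a, b], [c, d]]; rows = disease status, columns = allele.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null: P(row) * P(column) * n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical allele counts: rows = (affected, unaffected), columns = (G, T).
stat, p = chi_square_2x2([[60, 40], [40, 60]])
print(stat, p)
```

For small counts (any cell at or below about 5) the large-sample approximation is unreliable and Fisher's exact test would be the usual choice instead.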
23
Relative risk and power Statistical power: The probability of rejecting the null hypothesis when it is in fact false Simulations by M. Schierup 1000 simulations with additive disease model P(A) = 0.1; 1000 cases and 1000 controls 5% significance level (0.005% with Bonferroni correction)
24
The Central Dogma: the common disease / common variant (CD/CV) hypothesis (Reich & Lander 2001)
● In a small population, allelic heterogeneity is small
● Less than 100,000 years ago the human population was very small
● Even though the human population today is large, the frequency spectrum of variants still reflects the recent small size/bottleneck
● Hence common diseases are caused by a few common variants (and a lot of rare, undetectable variants caused by recent mutations)
● If association studies locate many susceptibility variants, this supports the hypothesis
[Figure: population expansion < 100,000 years ago, from Past to Present, with a rare variant and a common variant]
25
Frequency and power Simulations by M. Schierup 1000 simulations with additive disease model 1000 cases and 1000 controls, P(disease | aa) = 0.05 5% significance level (0.005% with Bonferroni correction)
26
Example: Cystic fibrosis
χ²-test for different distributions, Kerem et al. (1989)
Control group: 92 SNP haplotypes
Case group: 94 SNP haplotypes
23 SNP markers
27
An indirect approach --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- ● Disease site unlikely to be among our markers ➔ Might be an unknown polymorphic site (and not necessarily a SNP) ➔ Just not part of the typed markers (maybe typed 500K out of 3 billion nucleotides!)
28
An indirect approach --A--------C--------A----G---X----T---C---A---- --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- ● The markers are not independent ➔ Knowing one marker is partial knowledge of others ➔ The non-independence is called LD: Linkage Disequilibrium
29
Genealogical view of LD Variations in Chromosomes Within a Population Common Ancestor Emergence of Variations Over Time time present Disease Mutation
30
Linkage disequilibrium
Variations in Chromosomes Within a Population
First pair of sites (alleles A and B):
$P(A) = P(B) = 0.42$, $P(AB) = 0.42$, $P(A)P(B) = 0.17$
$D(AB) = P(AB) - P(A)P(B) = 0.24$
Second pair of sites (alleles A and B):
$P(A) = 0.17$, $P(B) = 0.29$, $P(AB) = 0.17$, $P(A)P(B) = 0.05$
$D(AB) = P(AB) - P(A)P(B) = 0.12$
31
Measures of LD
$D(AB) = P(AB) - P(A)P(B) = D(ab) = -D(Ab) = -D(aB)$
● $r^2$: correlation coefficient measure, range [0,1], Hill & Robertson (1968)
$r^2(AB) = \dfrac{D(AB)^2}{P(A)\,P(a)\,P(B)\,P(b)}$
● $D'$: range constrained by allele frequencies, [0,1], Lewontin (1964)
$D'(AB) = \begin{cases} \dfrac{D(AB)}{\min(P(A)P(b),\; P(a)P(B))} & \text{if } D(AB) > 0 \\[2ex] \dfrac{-D(AB)}{\min(P(A)P(B),\; P(a)P(b))} & \text{otherwise} \end{cases}$
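The three measures on this slide can be computed directly from one haplotype frequency and two allele frequencies. This is a minimal sketch: the function name `ld_measures` is chosen here, and it assumes two biallelic sites that are both polymorphic (otherwise the denominator of r² is zero). The example frequencies are the rounded values from the second linkage disequilibrium example in these slides.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for allele A at one site and allele B at another.

    p_ab = P(AB) is the haplotype frequency; p_a = P(A), p_b = P(B).
    Assumes biallelic sites, so P(a) = 1 - P(A) and P(b) = 1 - P(B).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Rounded frequencies: P(A) = 0.17, P(B) = 0.29, P(AB) = 0.17.
d, d_prime, r2 = ld_measures(p_ab=0.17, p_a=0.17, p_b=0.29)
print(d, d_prime, r2)
```

Here D' comes out as 1 while r² is only about one half, which shows that the two measures can disagree when the allele frequencies at the two sites differ. The r² value differs slightly from a hand calculation because the sketch does not round D before squaring it.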
32
Linkage disequilibrium
Variations in Chromosomes Within a Population
$P(A) = P(B) = 0.42$, $P(AB) = 0.42$, $P(A)P(B) = 0.17$
$D(AB) = P(AB) - P(A)P(B) = 0.24$
$D'(AB) = D(AB) / \min\{P(A)(1-P(B)),\; (1-P(A))P(B)\} = 0.24 / \min\{0.42 \times 0.58,\; 0.58 \times 0.42\} \approx 1$
$r^2(AB) = D(AB)^2 / [P(A)(1-P(A))P(B)(1-P(B))] = 0.06 / (0.42 \times 0.58 \times 0.42 \times 0.58) \approx 1$
33
Linkage disequilibrium
Variations in Chromosomes Within a Population
$P(A) = 0.17$, $P(B) = 0.29$, $P(AB) = 0.17$, $P(A)P(B) = 0.05$
$D(AB) = P(AB) - P(A)P(B) = 0.12$
$D'(AB) = D(AB) / \min\{P(A)(1-P(B)),\; (1-P(A))P(B)\} = 0.12 / \min\{0.17 \times 0.71,\; 0.83 \times 0.29\} \approx 1$
$r^2(AB) = D(AB)^2 / [P(A)(1-P(A))P(B)(1-P(B))] = 0.01 / (0.17 \times 0.83 \times 0.29 \times 0.71) \approx 0.49$
34
Causes of LD
[Figure: timeline from “time t ago” to “now”]
Creates LD        Breaks down LD
Drift             Recombination
Selection         (Gene conversion)
Admixture
35
An indirect approach --T--------G--------A----G---X----C---C---A---- --A--------G--------G----G---X----C---C---A---- --A--------C--------A----G---X----T---C---A---- --T--------C--------A----G---X----T---C---A---- --T--------C--------A----T---X----T---A---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------A----G---X----T---C---G---- --T--------C--------A----T---X----T---C---A---- --A--------C--------A----G---X----T---C---A---- --A--------C--------G----T---X----C---A---A---- --A--------C--------A----G---X----C---C---G---- ● The markers are not independent ➔ Knowing one marker is partial knowledge of others ➔ This non-independence decreases with distance --A--------C--------A----G---X----T---C---A----
36
A short detour: population genetics
Diploid model of reproduction (without recombination): Parents, Gametes, Offspring
Chromosome reproduction (without recombination)
37
Wright-Fisher model ● Discrete, non-overlapping generations ● Constant population size ● Each individual in one generation is ➔ a random copy of an individual from the previous generation ➔ or a new mutation Mutation
38
Recombinations
39
Non-Ancestral Material Crossover point
40
Wright-Fisher with recombination ● Discrete, non-overlapping generations ● Constant population size ● Each individual in one generation is ➔ a random copy of an individual from the previous generation ➔ a new mutation ➔ a recombination of two individuals from the previous generation, at a random cross-over point Recombination
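One generation of this model can be sketched in a few lines. The sketch below rests on assumptions not stated on the slide: haplotypes are lists of 0/1 alleles, mutation flips the allele at one random site, and a single child may both recombine and mutate. The function name and all rates are illustrative only.

```python
import random


def wright_fisher_generation(population, mutation_rate, recombination_rate, rng):
    """Produce the next generation of haplotypes (lists of 0/1 alleles).

    Population size is constant; generations are discrete and non-overlapping.
    """
    n_sites = len(population[0])
    offspring = []
    for _ in range(len(population)):
        # Random copy of an individual from the previous generation.
        child = list(rng.choice(population))
        if n_sites > 1 and rng.random() < recombination_rate:
            # Recombine with a second parent at a random cross-over point.
            other = rng.choice(population)
            point = rng.randrange(1, n_sites)
            child = child[:point] + list(other[point:])
        if rng.random() < mutation_rate:
            # New mutation: flip the allele at one random site (an assumption).
            site = rng.randrange(n_sites)
            child[site] = 1 - child[site]
        offspring.append(child)
    return offspring


rng = random.Random(1)
population = [[0] * 10 for _ in range(20)]  # 20 haplotypes, 10 sites
for _ in range(50):
    population = wright_fisher_generation(population, 0.05, 0.1, rng)
print(population[0])
```

Running many generations forward like this is simple but slow; it is meant only to make the three rules on the slide concrete.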
41
Mutation + Recombination Mutation Point
42
Mutation + Recombination Mutation Point
43
Mutation + Recombination Mutation Point
44
Mutation + Recombination Mutation Point
45
Mutation + Recombination Mutation Point
46
Mutation + Recombination Mutation Point
47
Mutation + Recombination Mutation Point (+ )
48
Mutation + Recombination Mutation Point (+ )
49
Indirect association Mutation Point Complete association Less association Even less association
50
An indirect approach
χ²-test for different distributions
Highly associated because they are close to the disease-affecting site
51
An indirect approach
BRCA2 gene, prostate cancer in Iceland
Linkage disequilibrium measured by r² using Haploview 3.12
These associations are NOT independent, i.e. they probably mark the same variant
52
Extent of relatedness
● “Nearby” is ~ 0.1–0.01 cM
➔ ~ 100–10 Kbp
➔ ~ 1/30,000 – 1/300,000 of the genome
● Closer spacing needed for accuracy
➔ ~ 500,000–1,000,000 markers for whole genome
➔ ~ 10–100 markers for typical gene
53
LD as a function of distance
From: Clark et al. 2003, AJHG 73:285-300. Empirical results from HapMap data
[Figure axes: LD (r²) against recombination rate]
From: Hein et al. 2005. Simulation results
54
Extent of relatedness
● Population dependent
➔ Founding age
➔ Isolation (inbreeding)
Populations, ordered from LD that extends over longer distances (“low” density marker map, low resolution) to LD that extends over shorter distances (“high” density marker map, high resolution):
● Isolated, recently founded: Quebec, Cajun Acadiana, Utah, Amish, Iceland, Faroe Islands
● Isolated, relatively old: Kainuu (Finland), North Karelia (Finland), Sardinia, Ashkenazi Jews
● Non-isolated, relatively old (bottlenecks): European, Asian
● Africa
55
Variation in recombination rate Sperm analysis Population genetic data Myers et al. 2005 McVean et al. 2005
56
Tagging SNPs
● Close markers are in linkage disequilibrium, i.e. one marker carries information on nearby variation
● LD between SNPs is often so high that typing the whole set provides no more information than typing a few tagSNPs
● tagSNPs: a minimal number of informative markers that can be used to identify the common haplotypes in each block
57
But notice!
Alzheimer and ApoE: Closeness to the disease marker does not guarantee significance!
[Figure: association against distance from APOE locus (Kbp), showing the responsible marker and 6 markers with low association]
58
Multi-marker approaches... --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------C---C---A---- --A--------G--------G----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------C--------A----G--------T---C---A---- --T--------C--------A----T--------T---A---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------A----G--------T---C---G---- --T--------C--------A----T--------T---C---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------G----T--------C---A---A---- --A--------C--------A----G--------T---C---A---- --T--------G--------A----G--------C---C---A---- --A--------G--------G----G--------C---C---A---- --A--------C--------A----G--------T---C---A---- --T--------C--------A----G--------T---C---A---- --T--------C--------A----T--------T---A---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------A----G--------T---C---G---- --T--------C--------A----T--------T---C---A---- --A--------C--------A----G--------T---C---A---- --A--------C--------G----T--------C---A---A---- Single marker approach: Multi marker approach:
59
Using the (local) genealogy of the locus (Templeton et al 1987)
● Tree at disease site, with leaves labelled H (healthy) or D (diseased):
➔ “Perfect” setup: HHHHHHHH DDDDD
➔ Incomplete penetrance: HHHHHHHH DDDHD
➔ Other disease causes: HDHHHDHH DDDHD
60
Using the (local) genealogy of the locus ● At the disease site: ➔ A significant clustering of diseased/healthy HDHHHDHH DDDHD Templeton et al 1987
61
Using the (local) genealogy of the locus ● Local genealogies ➔ Each site a different genealogy ➔ Nearby genealogies only slightly different --T--------G--------A----G---X----C----C-----A-- --A--------G--------G----G---X----C----C-----A-- --A--------C--------A----G---X----T----C-----A-- --T--------C--------A----G---X----T----C-----A-- --T--------C--------A----T---X----T----A-----A-- --A--------C--------A----G---X----T----C-----A-- AAATT T CCGG CC AAAGA A GGGG GT TTCCT T CCCC CA AAAAA A A nearby tree an imperfect local tree
62
Detour: Genealogies... MRCA of the sampled sequences A coalescent event for two sampled sequences 1 123 2 3
63
Detour: Genealogies... MRCA of the sampled sequences A coalescent event for two sampled sequences 1 123 2 3 A recombination event
64
Ancestral Recombination Graph (Hudson 1990, Griffiths & Marjoram 1996) Sampled sequences MRCA
65
Ancestral Recombination Graph (Hudson 1990, Griffiths & Marjoram 1996) Recombinations Coalescence
66
Ancestral Recombination Graph (Hudson 1990, Griffiths & Marjoram 1996) Non-ancestral material Non-ancestral material
67
Ancestral Recombination Graph Mutations 1234 (Hudson 1990, Griffiths & Marjoram 1996)
68
Ancestral Recombination Graph 1234 (Hudson 1990, Griffiths & Marjoram 1996) The ARG is a complete genealogy for the sampled sequences
69
“Local” genealogies For each “point” on the chromosome, the ARG determines a (local) tree:
70
“Local” genealogies For each “point” on the chromosome, the ARG determines a (local) tree:
71
“Local” genealogies For each “point” on the chromosome, the ARG determines a (local) tree:
72
“Local” genealogies For each “point” on the chromosome, the ARG determines a (local) tree:
73
“Local” genealogies ● Different topologies ● Different branch lengths ● Different inheritance
74
“Local” genealogies Type 1: No change Type 2: Change in branch lengths Type 3: Change in topology From Hein et al. 2005
75
“Local” genealogies
Tree measure:
$M_{AB} = \dfrac{\sum_{i,j} I_{\{i=j\}}\, bl(i)\, bl(j)}{tbl(A)\, tbl(B)}$
$S_{AB} = M_{AB} / M_{AA}$
[Figure: tree measure against recombination rate, from Hein et al. 2005]
76
Using the (local) genealogy of the locus --T--------G--------A----G---X----C----C-----A-- --A--------G--------G----G---X----C----C-----A-- --A--------C--------A----G---X----T----C-----A-- --T--------C--------A----G---X----T----C-----A-- --T--------C--------A----T---X----T----A-----A-- --A--------C--------A----G---X----T----C-----A-- AAATT T CCGG CC AAAGA A GGGG GT TTCCT T CCCC CA AAAAA A Tree at disease site resembles neighbours
77
Using the (local) genealogy of the locus ● Near the disease site: ➔ A significant clustering of diseased/healthy HDHHHDHH DDDHD Templeton et al 1987 Zöllner&Pritchard 2004
78
Using the (local) genealogy of the locus ● Approach: ➔ Infer trees over regions ➔ Score the regions wrt their clustering HDHHHDHH DDDHD Templeton et al 1987 Zöllner&Pritchard 2004
79
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 ● In the infinite sites model: ➔ Each mutation occurs only once ➔ Each mutation splits the sample in two ➔ A consistent tree can efficiently be inferred for a region without recombinations
80
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Use the four-gamete test to find regions that can be explained by a tree
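The four-gamete test itself is short enough to sketch. Under the infinite sites model, two biallelic sites that show all four allele combinations (gametes) cannot be explained by a single tree without recombination. The code below is an illustration of that test only: the function names are chosen here, sites are assumed biallelic, and the greedy left-to-right region growing is an assumption for the example, not necessarily how Blossoc delimits its regions.

```python
def four_gamete_compatible(haplotypes, i, j):
    """True if sites i and j show at most three of the four possible gametes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def compatible_region(haplotypes, start):
    """Grow a region rightwards from `start` while all site pairs pass the test.

    Returns the half-open interval (start, end) of site indices.
    """
    n_sites = len(haplotypes[0])
    end = start + 1
    while end < n_sites and all(
        four_gamete_compatible(haplotypes, k, end) for k in range(start, end)
    ):
        end += 1
    return start, end


# Hypothetical haplotypes over four biallelic sites (alleles coded 0/1).
haps = ["0001", "0100", "1100", "1111"]
print(compatible_region(haps, 0))
```

In this toy data the first three sites are pairwise compatible, while the first and fourth sites together show all four gametes, so the region starting at site 0 stops before the fourth site.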
81
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Build a tree for each such region
82
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Build a tree for each such region
83
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Build a tree for each such region
84
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Build a tree for each such region
85
BLOck aSSOCiation (BLOSSOC) Mailund et al. 2006 Score the tree, and assign the score to the region
86
Scoring trees... Red=cases Green=controls Are the case chromosomes significantly overrepresented in some clusters?
87
Cystic Fibrosis example
88
Simulated Example (CoaSim)
89
Augmented HapMap data
90
Implementation... Homepage: www.daimi.au.dk/~mailund/Blossoc Command line and graphical user interface...
91
Statistical model based approaches...
[Diagram labels: Statistical framework; Molecular biology; Prior knowledge; Genetics]
Some model explaining the sequences and status:
--A--------C--------A----G--------T---C---A----
--T--------G--------A----G--------C---C---A----
--A--------G--------G----G--------C---C---A----
--A--------C--------A----G--------T---C---A----
--T--------C--------A----G--------T---C---A----
--T--------C--------A----T--------T---A---A----
--A--------C--------A----G--------T---C---A----
--A--------C--------A----G--------T---C---G----
--T--------C--------A----T--------T---C---A----
--A--------C--------A----G--------T---C---A----
--A--------C--------G----T--------C---A---A----
--A--------C--------A----G--------C---C---G----
92
Statistical model based approaches...
A model gives a probability distribution on the data: $P(D \mid \theta)$
● $D$: data, e.g. sequences and disease status
● $\theta$: parameters, e.g. penetrance, disease locus, and genealogy
93
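Statistical model based approaches...
A model gives a probability distribution on the data: $P(D \mid \theta)$
It also gives us likelihood approaches: $\mathrm{lhd}(\theta) = P(D \mid \theta)$, MLE: $\arg\max_\theta \mathrm{lhd}(\theta)$
and Bayesian approaches: $P(\theta \mid D) \propto P(D \mid \theta)\,P(\theta) = \mathrm{lhd}(\theta)\,P(\theta)$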
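To make the likelihood and Bayesian views concrete, here is a toy sketch with a single parameter, called `theta` in the code (a name chosen here): the risk of being affected in some group. The data (30 affected out of 100), the binomial model, the grid, and the flat prior are all assumptions made for illustration and are far simpler than the models used for mapping.

```python
def binomial_lhd(theta, affected, total):
    """Likelihood of the data given risk theta (constant binomial factor dropped)."""
    return theta ** affected * (1 - theta) ** (total - affected)


# Evaluate the likelihood on a grid of candidate parameter values.
grid = [i / 100 for i in range(1, 100)]
lhd = [binomial_lhd(t, affected=30, total=100) for t in grid]

# Likelihood approach: the MLE is the grid point with the highest likelihood.
mle = grid[lhd.index(max(lhd))]

# Bayesian approach: posterior is proportional to likelihood times prior.
prior = [1.0] * len(grid)  # flat prior over the grid (an assumption)
unnormalised = [l * p for l, p in zip(lhd, prior)]
total_mass = sum(unnormalised)
posterior = [u / total_mass for u in unnormalised]
print(mle)
```

A grid works for one parameter but not for a parameter space that includes a disease locus and a genealogy, which is why sampling methods are used instead.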
94
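MCMC (Metropolis) approach
1. Compute the likelihood in the current point $\theta$: $\mathrm{lhd}(\theta) = L$
2. Suggest a new point, $\theta'$
3. Compute the likelihood in this point: $\mathrm{lhd}(\theta') = L'$
4. If $L \le L'$, go to point $\theta'$
5. If $L > L'$, go to point $\theta'$ with probability $L'/L$
$\mathrm{lhd}(x) = \int \mathrm{lhd}(x, \theta)\, d\theta$, where $\theta$ denotes all parameters except $x$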
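The five steps above translate almost line by line into code. This is a minimal sketch under two assumptions that the slide does not spell out: the proposal is symmetric (needed for the plain Metropolis acceptance rule), and likelihoods are handled on the log scale to avoid numerical underflow. The toy target, the function names, and all tuning values are illustrative.

```python
import math
import random


def metropolis(log_lhd, start, propose, n_steps, rng):
    """Sample points with probability proportional to exp(log_lhd(point))."""
    current, current_ll = start, log_lhd(start)    # step 1
    samples = []
    for _ in range(n_steps):
        candidate = propose(current, rng)           # step 2
        candidate_ll = log_lhd(candidate)           # step 3
        # Steps 4 and 5: always accept if L' >= L, else with probability L'/L.
        if candidate_ll >= current_ll or rng.random() < math.exp(candidate_ll - current_ll):
            current, current_ll = candidate, candidate_ll
        samples.append(current)
    return samples


# Toy target: a likelihood shaped like a normal density centred at 3.
rng = random.Random(0)
samples = metropolis(
    log_lhd=lambda x: -0.5 * (x - 3.0) ** 2,
    start=0.0,
    propose=lambda x, r: x + r.uniform(-1.0, 1.0),  # symmetric random walk
    n_steps=20000,
    rng=rng,
)
kept = samples[2000:]  # discard an initial burn-in period
mean = sum(kept) / len(kept)
print(mean)
```

With several parameters, keeping only one coordinate of each sample gives the marginal for that coordinate, which is the integration over the remaining parameters mentioned on the slide.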
95
MCMC (Metropolis) approach
[Figure: numbered sample points in parameter space]
Projection on one axis is equivalent to integration over the remaining parameters
The resulting samples approximate the likelihood lhd
96
The HapCluster model Waldron et al. 2006 ---A----G----X---C---C---A---- ---A----T--------C---G---A---- ---A----G--------C---G---A---- ---T----G--------C---C---G---- ---A----T--------C---G---A---- ---T----G--------C---G---A---- Unrelated “wildtypes” (Locally) related “mutants”
97
The HapCluster model Waldron et al. 2006 ---A----G----X---C---C---A---- ---A----T--------C---G---A---- ---A----G--------C---G---A---- ---T----G--------C---C---G---- (Locally) related “mutants” ● “Mutants” defined by local sequence similarity to “ancestral” sequence ● Implicitly assuming star-genealogy
98
The HapCluster model Waldron et al. 2006 ● Given “ancestral” sequence and a distance measure: ➔ Defines cluster around the ancestral sequence ➔ Sequences above a given similarity threshold considered “mutants” ➔ Sequences below considered “wild types”
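The clustering step can be illustrated with a deliberately simple distance measure. The similarity function below (fraction of marker positions that match the ancestral sequence) is a hypothetical stand-in, not the measure used by Waldron et al. 2006, and the ancestral sequence and threshold are arbitrary. The haplotypes are the marker alleles from the HapCluster model slide.

```python
def similarity(haplotype, ancestral):
    """Hypothetical similarity: fraction of marker positions with matching alleles."""
    matches = sum(a == b for a, b in zip(haplotype, ancestral))
    return matches / len(ancestral)


def split_mutants(haplotypes, ancestral, threshold):
    """Split haplotypes into ("mutants", "wild types") by similarity to `ancestral`."""
    mutants = [h for h in haplotypes if similarity(h, ancestral) >= threshold]
    wild_types = [h for h in haplotypes if similarity(h, ancestral) < threshold]
    return mutants, wild_types


# Marker alleles of six haplotypes; ancestral sequence and threshold are assumed.
haplotypes = ["AGCCA", "ATCGA", "AGCGA", "TGCCG", "ATCGA", "TGCGA"]
mutants, wild_types = split_mutants(haplotypes, ancestral="AGCCA", threshold=0.8)
print(mutants, wild_types)
```

In the full model the ancestral sequence and the cluster size are not fixed in advance but are among the quantities explored by the sampler.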
99
The HapCluster model Waldron et al. 2006 ● Each individual has one of the genotypes: ➔ “mutant” & “mutant” ➔ “mutant” & “wild type” ➔ “wild type” & “wild type” ● Each has a different risk ( MM, MW, WW ) of being affected ● Likelihood:
100
The HapCluster model Waldron et al. 2006 ● Risks considered nuisance parameters and integrated out
101
HapCluster MCMC approach Point: trait-locus, ancestral haplotype, other (nuisance) parameters Change functions: move trait-locus, change cluster size, change ancestral haplotype... Likelihood function: product of Beta functions Waldron et al. 2006
102
Example: Simulated dataset
104
Implementation... Homepage: www.daimi.au.dk/~mailund/HapCluster Command line version only...
105
Summary
● Introduction
➔ Goals and setup
➔ Genetic variation in humans
● Marker/disease association
➔ Indirect association and linkage disequilibrium
➔ Background population genetics concepts
➔ “Global” and “local” genealogies
● Mapping methods
➔ Local genealogies (Blossoc)
➔ Clustering (HapCluster)