Introduction to linkage analysis

Presentation transcript:

Introduction to linkage analysis Harald H.H. Göring Course “Study Design and Data Analysis for Genetic Studies”, Universidad del Zulia, Maracaibo, Venezuela, 9-10 April 2005

“Marker” loci
There are many different types of polymorphisms, e.g.:
single nucleotide polymorphism (SNP):
AAACATAGACCGGTT
AAACATAGCCCGGTT
microsatellite/variable number of tandem repeat (VNTR):
AAACATAGCACACA----CCGGTT
AAACATAGCACACACACCGGTT
insertion/deletion (indel):
AAACATAGACCACCGGTT
AAACATAG--------CCGGTT
restriction fragment length polymorphism (RFLP)
…

Tracing chromosomal inheritance using “marker” locus genotypes 1/2 3/4 1/5 4/5 5/5 1/4

Tracing chromosomal inheritance (fully informative situation)

Linkage analysis: locus with known genotypes 1/2 3/3 2/4 1/1 1/3 2/3 Where do the observed genotypes “fit”?

Linkage analysis In linkage analysis, one evaluates statistically whether or not the alleles at 2 loci co-segregate during meiosis more often than expected by chance. If the evidence of increased co-segregation is convincing, one generally concludes that the 2 loci are “linked”, i.e. are located on the same chromosome (“syntenic loci”). The degree of co-segregation provides an estimate of the proximity of the 2 loci, with near complete co-segregation for very tightly linked loci.

Let’s step back… to Mendel

One of Mendel’s pea crosses
F1: Mendel’s law of uniformity
F2: Mendel’s law of independent assortment
observed: 315 108 101 32
~ ratio: 9 : 3 : 3 : 1
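As a rough check of how well the observed F2 counts agree with the 9 : 3 : 3 : 1 expectation, one could compute a Pearson chi-square statistic. This is only a sketch: the goodness-of-fit test is not part of the slide, and the variable names are illustrative.

```python
# Observed F2 phenotype counts from the slide and the 9:3:3:1 expectation.
observed = [315, 108, 101, 32]
ratio = [9, 3, 3, 1]

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

# Pearson chi-square statistic (4 classes, hence 3 degrees of freedom).
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("expected counts:", expected)
print("chi-square:", round(chi_square, 3))
```

A statistic this small relative to 3 degrees of freedom indicates that the observed counts are very close to the expected ratio.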

P1 1 1 2 2 Mendel’s law of uniformity 1 2 1 2 F1 Mendel’s law of segregation 1 1 1 2 2 2 F2 25% 50% 25% (in expectation)

P1 a a b b Mendel’s law of uniformity a b a b F1 Mendel’s law of segregation a a a b b b F2 25% 50% 25% (in expectation)

P1: 1 1 a a and 2 2 b b
Mendel’s law of uniformity
F1: 1 2 a b and 1 2 a b
Mendel’s law of independent assortment
F2 (in expectation):
1 1 a a 6.25%
1 1 a b 12.5%
1 1 b b 6.25%
1 2 a a 12.5%
1 2 a b 25%
1 2 b b 12.5%
2 2 a a 6.25%
2 2 a b 12.5%
2 2 b b 6.25%
Assume we did this experiment and observed the following: 25% 50% non-independent assortment
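The expected F2 proportions under independent assortment can be derived by multiplying the single-locus segregation proportions. The following is a minimal sketch; the function name and the genotype labels are illustrative and not taken from the slides.

```python
from itertools import product

def f2_genotype_freqs(alleles):
    """Expected F2 genotype frequencies at one locus when both F1 parents
    are heterozygous and transmit either allele with probability 0.5."""
    freqs = {}
    for first, second in product(alleles, repeat=2):
        genotype = "/".join(sorted((first, second)))
        freqs[genotype] = freqs.get(genotype, 0.0) + 0.25
    return freqs

locus1 = f2_genotype_freqs("12")   # 1/1, 1/2, 2/2
locus2 = f2_genotype_freqs("ab")   # a/a, a/b, b/b

# Independent assortment: joint frequency = product of single-locus frequencies.
joint = {(g1, g2): p1 * p2
         for g1, p1 in locus1.items()
         for g2, p2 in locus2.items()}

for genotype, freq in joint.items():
    print(genotype, f"{100 * freq:.2f}%")
```

Observing only three genotype classes in proportions 25% : 50% : 25% instead of these nine classes is what signals non-independent assortment.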

Co-segregation (due to linkage) P1 generation (diploid) 1 1 2 2 a a b b gametes (haploid) 1 2 a b F1 generation (diploid) Mendel’s law of uniformity 1 2 1 2 a b a b gametes (haploid) 1 2 1 2 a b a b F2 generation (diploid) Mendel’s law of segregation 1 1 1 2 2 2 a a a b b b 25 % 50 % 25 %

Recombination Recombination between 2 loci is said to have occurred if an individual received, from one parent, alleles (at these 2 loci) that originated in 2 different grandparents.

Who is a recombinant? 1/1 a/a 2/2 b/b 1 2 a b 3/3 c/c 1 3 a c 2 3 b c

Possible explanations for recombination
[Figure: grandparents 1/1 a/a and 2/2 b/b; parent 1/2 a/b producing gametes 1 a (N), 1 b (R), 2 a (R), 2 b (N)]
I: different chromosomes
II: homologous recombination during meiosis
III: genotyping error

Recombination fraction The recombination fraction between 2 loci is defined as the proportion of meioses resulting in a recombinant gamete. For loci on different chromosomes (or for loci far apart on the same, large chromosome), the recombination fraction is 0.5. Such loci are said to be unlinked. For loci close together on the same chromosome, the recombination fraction is < 0.5. Such loci are said to be linked. The closer the loci, the smaller the recombination fraction (down to a minimum of 0).

Estimation of recombination fraction 1/1 a/a 2/2 b/b 1 2 a b 3/3 c/c 1 3 a c 2 3 b c N N N R N N N N R N
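With phase known, the recombination fraction can be estimated by simple counting. A minimal sketch, using the N/R scoring of the 10 offspring read off the slide (the string encoding is an illustrative choice):

```python
# Scoring of the 10 offspring on the slide: N = non-recombinant, R = recombinant.
meioses = "NNNRNNNNRN"

n = len(meioses)              # number of informative meioses
k = meioses.count("R")        # number of recombinants
theta_hat = k / n             # estimated recombination fraction

print(f"{k} recombinants in {n} meioses, estimate = {theta_hat}")
```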

Missing phase information: Who is a recombinant?? 1 2 a b phase 1: 1 2 b a phase 2: 1/2 3/3 a/b c/c 1 3 a c 2 3 b c 1 3 a c 1 3 b c 1 3 a c 1 3 a c 2 3 b c 2 3 b c 2 3 a c 2 3 b c N R R N

Missing phase and genotype information: Who is a recombinant?? 1/2 ?/? 3/3 a/b c/c 1 3 a c 2 3 b c 1 3 a c 1 3 b c 1 3 a c 1 3 a c 2 3 b c 2 3 b c 2 3 a c 2 3 b c

Missing phase and genotype information: Who is a recombinant??? ?/? ?/? a/b c/c 1 3 a c 2 3 b c 1 3 a c 1 3 b c 1 3 a c 1 3 a c 2 3 b c 2 3 b c 2 3 a c 2 3 b c

Likelihood The likelihood of a hypothesis (e.g. specific parameter value(s)) on a given dataset, L(hypothesis|data), is defined to be proportional to the probability of the data given the hypothesis, P(data|hypothesis): L(hypothesis|data) = constant * P(data|hypothesis) Because of the proportionality constant, a likelihood by itself has no interpretation. The likelihood ratio (LR) of 2 hypotheses is meaningful if the 2 hypotheses are nested (i.e., one hypothesis is contained within the other): LR = L(H1|data) / L(H0|data) = P(data|H1) / P(data|H0), in which the proportionality constant cancels. Under certain conditions, maximum likelihood estimates are asymptotically unbiased and asymptotically efficient. Likelihood theory describes how to interpret a likelihood ratio.

Evaluating the evidence of linkage: lod score The lod (logarithm of odds) score is defined as the logarithm (to the base 10) of the likelihood ratio of 2 hypotheses on a given dataset: lod = log10 [L(H1|data) / L(H0|data)] In linkage analysis, typically the different hypotheses refer to different values of the recombination fraction: Z(θ) = log10 [L(θ|data) / L(θ = 0.5|data)]

Who is a recombinant? 1/1 a/a 2/2 b/b 1 2 a b 3/3 c/c 1 3 a c 2 3 b c

Example lod score calculation
θ Z(θ)
0.1 0.644
0.2 0.837
0.3 0.725
0.4 0.439
0.5 0
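These lod scores can be reproduced under the assumption that the data are the 2 recombinants among 10 phase-known meioses scored on the earlier estimation slide. The sketch below uses an illustrative function name; the binomial coefficient is omitted because it cancels in the ratio.

```python
import math

def lod_known_phase(theta, n, k):
    """log10 of L(theta) / L(0.5) for k recombinants in n phase-known meioses."""
    likelihood = theta ** k * (1 - theta) ** (n - k)
    if likelihood == 0:
        return float("-inf")      # hypothesis is incompatible with the data
    return math.log10(likelihood / 0.5 ** n)

n, k = 10, 2                      # assumed counts, taken from the earlier slide
for theta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_known_phase(theta, n, k), 3))
```

The curve peaks at 0.2, the counting estimate k/n, and agrees with the tabulated values to within rounding in the third decimal.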

Missing phase information: Who is a recombinant?? 1 2 a b phase 1: 1 2 b a phase 2: 1/2 3/3 a/b c/c 1 3 a c 2 3 b c 1 3 a c 1 3 b c 1 3 a c 1 3 a c 2 3 b c 2 3 b c 2 3 a c 2 3 b c N R R N

Example lod score calculation (missing phase information)
P(data|θ) = P(phase 1) P(data|phase 1, θ) + P(phase 2) P(data|phase 2, θ)
θ Z(θ)
0.1 0.343
0.2 0.536
0.3 0.427
0.4 0.175
0.5
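The phase-averaged likelihood can be evaluated numerically. The sketch below assumes both phases are equally likely a priori (0.5 each) and that the data are 10 meioses with 2 recombinants under phase 1, which equals 8 recombinants under phase 2. The function name is illustrative.

```python
import math

def lod_unknown_phase(theta, n, k):
    """lod score when parental phase is unknown; k recombinants under phase 1
    correspond to n - k recombinants under phase 2."""
    phase1 = theta ** k * (1 - theta) ** (n - k)
    phase2 = theta ** (n - k) * (1 - theta) ** k
    likelihood = 0.5 * phase1 + 0.5 * phase2   # equal prior phase probabilities
    if likelihood == 0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** n)

for theta in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_unknown_phase(theta, 10, 2), 3))
```

Compared with the phase-known case, each lod score is lower by about log10(2) ≈ 0.3 for small recombination fractions, which is the cost of not knowing the phase.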

Missing phase and genotype information: Who is a recombinant??? ?/? ?/? a/b c/c 1 3 a c 2 3 b c 1 3 a c 1 3 b c 1 3 a c 1 3 a c 2 3 b c 2 3 b c 2 3 a c 2 3 b c

Example lod score calculation (missing phase and genotype information)
Assuming 3 equally frequent alleles, i.e. P(1) = P(2) = P(3) = 0.333:
θ Z(θ)
0 -0.304
0.1 0.204
0.2 0.346
0.3 0.264
0.4 0.096
0.5 0
Assuming P(1) = 0.495, P(2) = 0.495, P(3) = 0.010:
θ Z(θ)
0 -0.378
0.1 0.183
0.2 0.332
0.3 0.253
0.4 0.091
0.5 0

[Figure: lod score curves for three cases: known phase, known genotypes; unknown phase, known genotypes; unknown phase, unknown genotypes]

Interpretation of lod score The traditional threshold for declaring evidence of linkage statistically significant is a lod score of 3, or a likelihood ratio of 1000:1, meaning the likelihood of linkage on the data is 1000 times higher than the likelihood of no linkage on the data. Asymptotically, a lod score of 3 has a point-wise significance level (p-value) of 0.0001. In other words, the probability of obtaining a lod score of at least this magnitude by chance at a single position is 0.0001. Because many linkage tests are conducted as part of a genome-wide linkage scan, a lod score of 3 corresponds to a genome-wide significance level of ~0.05.

P-value The p-value is defined as the probability of obtaining an outcome at least as extreme as observed by chance (i.e. when the null hypothesis is true). Example: Testing whether a coin is fair H0: P(head) = 0.5 H1: P(head) ≠ 0.5 (2-sided alternative hypothesis). You observe 1 head out of 10 coin tosses. The p-value then is the probability of observing exactly 1 head in 10 trials (observed outcome), or 0 heads in 10 trials (more extreme outcome), or 9 (equally extreme outcome) or 10 (more extreme outcome) heads in 10 trials.
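The coin example can be worked through by summing binomial probabilities over the outcomes listed above. A minimal sketch (the helper name is illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 10
# Outcomes at least as extreme as 1 head under the 2-sided alternative.
extreme_outcomes = (0, 1, 9, 10)
p_value = sum(binom_pmf(k, n, 0.5) for k in extreme_outcomes)

print(p_value)   # 22/1024, about 0.0215
```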

P-value The p-value is defined as the probability of obtaining an outcome at least as extreme as observed by chance (i.e. when the null hypothesis is true). Example: Testing whether 2 loci are linked H0: P(recombination) = 0.5 H1: P(recombination) ≤ 0.5 (1-sided alternative hypothesis). You observe 0 recombinant and 10 non-recombinant in 10 informative meioses. The p-value then is the probability of observing exactly 0 recombinants in 10 trials (observed outcome; there is no more extreme outcome).

Lod score Example: Testing whether 2 loci are linked H0: P(recombination) = 0.5 H1: P(recombination) ≤ 0.5 (1-sided alternative hypothesis). You observe 0 recombinant and 10 non-recombinant in 10 informative meioses. The p-value then is the probability of observing exactly 0 recombinants in 10 trials (observed outcome; there is no more extreme outcome). In the ideal case, 10 fully informative meioses may suffice to obtain significant evidence of linkage.
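For this ideal case, both the exact p-value and the maximum lod score follow from a line of arithmetic. The sketch below assumes the lod score is evaluated at a recombination fraction of 0, where the likelihood of 0 recombinants in 10 meioses is 1.

```python
import math

n = 10                                   # fully informative meioses
# P(0 recombinants | recombination fraction = 0.5): no more extreme outcome exists.
p_value = 0.5 ** n
# Likelihood at recombination fraction 0 is 1; under no linkage it is 0.5 ** n.
max_lod = math.log10(1.0 / 0.5 ** n)

print(p_value)             # 1/1024
print(round(max_lod, 3))   # n * log10(2), just above 3
```

Each fully informative, non-recombinant meiosis therefore contributes log10(2) ≈ 0.3 to the lod score, which is why 10 such meioses just reach the traditional threshold of 3.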

Lod score and significance level
lod score   (point-wise) p-value
0.588       0.05
1.175       0.01
2.000       ~0.001
3.000       0.0001
4.000       ~0.00001
5.000       ~0.000001
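The correspondence in this table can be approximated with standard asymptotic likelihood-ratio theory. The sketch below assumes that 2 ln(10) × lod is distributed, under the null hypothesis, as a 50:50 mixture of a point mass at 0 and a chi-square with 1 degree of freedom (a consequence of the 1-sided alternative). That assumption and the function name are not stated on the slide.

```python
import math

def pointwise_p_value(lod):
    """Asymptotic 1-sided p-value for a lod score (assumed 50:50 mixture null)."""
    chi_square = 2 * math.log(10) * lod        # likelihood-ratio test statistic
    z = math.sqrt(chi_square)
    # Upper tail of a standard normal, equal to half the chi-square(1) tail.
    return 0.5 * math.erfc(z / math.sqrt(2))

for lod in (0.588, 1.175, 2.0, 3.0, 4.0, 5.0):
    print(lod, f"{pointwise_p_value(lod):.2g}")
```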

Linkage analysis reduces multiple testing problem Linkage analysis is so useful because it greatly reduces the multiple testing problem: ~3,000,000,000 bp of DNA are interrogated in ~500 independent linkage tests for human data. This is possible because a meiotic recombination event occurs on average only once every 100,000,000 bp. No specification of prior hypotheses is therefore necessary, as all possible hypotheses can be screened.
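The link between the point-wise p-value of 0.0001 and a genome-wide level of about 0.05 can be illustrated with the ~500 independent tests quoted above. The sketch applies two textbook corrections (Bonferroni and the independence-based Šidák formula); neither is named on the slide, so treat the choice as an assumption.

```python
pointwise_p = 0.0001          # point-wise p-value for a lod score of 3
independent_tests = 500       # approximate number of independent linkage tests

# Bonferroni upper bound on the genome-wide false positive rate.
bonferroni = min(1.0, pointwise_p * independent_tests)
# Probability of at least one false positive among independent tests.
sidak = 1 - (1 - pointwise_p) ** independent_tests

# Genome length in Morgans implied by one recombination per 100,000,000 bp.
genome_morgans = 3_000_000_000 / 100_000_000

print(round(bonferroni, 4), round(sidak, 4), genome_morgans)
```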

Linkage analysis: trait locus with unknown genotypes ?/? Where do the observed genotypes “fit”?

Statistical gene mapping with trait phenotypes observed marker genotypes observed trait phenotypes correlation to be detected etiology ? genetic distance (linkage, allelic association) unobserved trait locus genotypes

Many different types of linkage methods
penetrance model-based linkage analysis (“classical” linkage analysis)
penetrance model-free linkage analysis (“model-free” or “non-parametric” linkage analysis)
  affected sib-pair linkage analysis
  affected relative-pair linkage analysis
  regression-based linkage analysis
  variance components-based linkage analysis
  …

Variation with each linkage method
2-point analysis vs. multiple 2-point analysis vs. multi-point analysis
exact calculation vs. approximation (e.g., MCMC)
qualitative trait vs. quantitative traits
rare “simple mendelian” diseases vs. common “complex multifactorial” diseases
…

Penetrance-model-based linkage analysis

Segregation analysis In segregation analysis, one attempts to characterize the mode of inheritance of a trait by statistically examining the segregation pattern of the trait through a sample of related individuals. In a way, heritability analysis is a form of segregation analysis. In heritability analysis, however, the focus is not on characterizing the segregation pattern per se, but on quantifying inheritance under a given mode of inheritance (generally additivity/co-dominance).

Relationship between genotypes and phenotypes (penetrances) at the ABO blood group locus
penetrance: P(phenotype given genotype)
            Phenotype (blood group)
Genotype    A    B    AB   O
A/A         1    0    0    0
A/B         0    0    1    0
A/O         1    0    0    0
B/B         0    1    0    0
B/O         0    1    0    0
O/O         0    0    0    1

Probability model correlating trait phenotypes and trait locus genotypes: penetrances
penetrance: P(phenotype given genotype)
Ex.: fully-penetrant dominant disease without “phenocopies”
              Phenotype
Genotype      unaffected   affected
+/+           1            0
D/+ or +/D    0            1
D/D           0            1
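A penetrance table like this one can be combined with genotype frequencies to infer trait locus genotypes from a phenotype. The sketch below does this for a single unrelated individual using Bayes' rule. The disease allele frequency of 0.01, the Hardy-Weinberg prior, and all names are illustrative assumptions, and information from relatives is ignored.

```python
# Penetrances for the fully penetrant dominant model without phenocopies.
DOMINANT = {
    "+/+": {"unaffected": 1.0, "affected": 0.0},
    "D/+": {"unaffected": 0.0, "affected": 1.0},
    "D/D": {"unaffected": 0.0, "affected": 1.0},
}

def genotype_posterior(phenotype, disease_allele_freq, penetrance):
    """P(genotype | phenotype) for one unrelated individual (assumes HWE)."""
    q = disease_allele_freq
    prior = {"+/+": (1 - q) ** 2, "D/+": 2 * q * (1 - q), "D/D": q ** 2}
    joint = {g: prior[g] * penetrance[g][phenotype] for g in prior}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}

posterior = genotype_posterior("affected", 0.01, DOMINANT)
for genotype, prob in posterior.items():
    print(genotype, round(prob, 4))
```

Under these assumptions an affected individual is almost certainly D/+, which is why the heterozygous genotype is typically assigned to affected individuals for a rare dominant disease.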

Statistical gene mapping with trait phenotypes: “simple” dominant inheritance model observed marker genotypes observed trait phenotypes correlation to be detected affected not affected D/+ +/+ = genetic distance (linkage, allelic association) unobserved trait locus genotypes

Linkage analysis: trait locus (genotypes based on assumed dominant inheritance model) +/+ D/+ Where do the observed genotypes “fit”?

Example of multipoint lod score curve: Pseudoxanthoma elasticum From: Le Saux et al (1999) Pseudoxanthoma elasticum maps to an 820 kb region of the p13.1 region of chromosome 16. Genomics 62:1-10

Genetic heterogeneity
locus homogeneity, allelic homogeneity
locus homogeneity, allelic heterogeneity
locus heterogeneity, allelic homogeneity (at each locus)
locus heterogeneity, allelic heterogeneity (at each locus)

Pros and cons of penetrance-model-based linkage analysis
+ potentially very powerful (under suitable penetrance model)
+ statistically well-behaved
- requires specification of penetrance model; not powerful at all under unsuitable penetrance model

Effects of model misspecification uninformative informative dominant inheritance: +/+ D/+ 1/2 3/4 P(aff.|DD or D+) = 1 P(aff.|++) = 0 D/+ +/+ D/+ 1/3 1/4 2/3 informative uninformative recessive inheritance: D/+ D/D 1/2 3/4 P(aff.|DD) = 1 P(aff.|++ or D+) = 0 D/D D/+ D/D 1/3 1/4 2/3

Pros and cons of penetrance-model-based linkage analysis
+ potentially very powerful (under suitable penetrance model)
+ statistically well-behaved
- requires specification of penetrance model; not powerful at all under unsuitable penetrance model
- modeling flexibility limited
- computationally intensive

“Mendelian” vs. “complex” traits
“simple mendelian” disease
  genotypes of a single locus cause disease
  often little genetic (locus) heterogeneity (sometimes even little allelic heterogeneity); little interaction between genotypes at different genes
  often hardly any environmental effects
  often low prevalence
  often early onset
  often clear mode of inheritance
  “good” pedigrees for gene mapping can often be found
  often straightforward to map
“complex multifactorial” disease
  genotypes of a single locus merely increase risk of disease
  genotypes of many different genes (and various environmental factors) jointly and often interactively determine the disease status
  important environmental factors
  often high prevalence
  often late onset
  no clear mode of inheritance
  not easy to find “good” pedigrees for gene mapping
  difficult to map

A quantitative trait is not necessarily complex observed marker genotypes correlation to be detected observed trait phenotypes genetic distance (linkage, allelic association) unobserved trait locus genotypes etiology given ascertainment

Fundamental problem in complex trait gene mapping observed trait phenotypes correlation to be detected etiology given ascertainment observed marker genotypes unobserved trait locus genotypes genetic distance (linkage, allelic association)

Etiological complexity genotype 1 genotype 2 other genotypes genotype 1 genotype 2 other genotypes genotype 1 genotype 2 other genotypes gene 2 gene 1 gene 3 other env. factor(s) trait phenotype other gene(s) environm. factor 1 environm. factor 3 environm. factor 2

How to improve power to detect correlations between trait phenotypes and trait locus genotypes? observed trait phenotypes etiology unobserved trait locus genotypes

How to simplify the etiological architecture?
choose tractable trait
  Are there sub-phenotypes within trait?
    age of onset
    severity
    combination of symptoms (syndrome)
  “endophenotype” or “biomarker” vs. disease
    quantitative vs. qualitative (discrete): Dichotomizing quantitative phenotypes leads to loss of information.
    simple/cheap measurement vs. uncertain/expensive diagnosis
    not as clinically relevant, but with simpler etiology
given trait, choose appropriate study design/ascertainment protocol
  study population
    genetic heterogeneity
    environmental heterogeneity
  “random” ascertainment vs. ascertainment based on phenotype of interest
    single or multiple probands
    concordant or discordant probands
    pedigrees with apparent “mendelian” inheritance?
    inbred pedigrees?
  data structures
    singletons, small pedigrees, large pedigrees
account for/stratify by known genetic and environmental risk factors

Affected sib-pair linkage analysis

Identity-by-state (IBS) vs. identity-by-descent (IBD) 1 2 3 4 1 3 1 4 1 2 1 3 1 1 2 3 1 3 1 2 1 2 IBD (also IBS) IBS (not IBD) ? ? (both or neither IBD) If IBD then necessarily IBS (assuming absence of mutation event). If IBS then not necessarily IBD (unless a locus is 100% informative, i.e. has an infinite number of alleles, each with infinitesimally small allele frequency).

Probabilistic inference of IBD 1 2 3 4 1 3 1 4 1 2 1 3 1 1 2 3 1 3 1 2 1 2 IBD 1 0 0.5 1 1 2 1.5 1 0.5 0 0.25 0.5 NIBD p

Rationale of affected sib-pair linkage analysis A pair of sibs affected with the same disorder is expected to share the alleles at the trait locus/loci, and also alleles at linked loci, more often (> 50%) than a random pair of sibs (50%).

Basic concept of affected sib pair linkage analysis 1/2 3/4 1/3 1/4 IBD? IBD NIBD 1/2 3/4 1/3 1/4 IBD? IBD NIBD

Affected sib pair linkage analysis (mean test) NIBD IBD counts in example ped. 1 total counts in dataset 1/2 3/4 IBD? Conditional on the fact that both sibs are affected, test if: 1/3 1/4 IBD NIBD

Affected sib pair linkage analysis (mean test) NIBD IBD probability counts in ex. 1 total counts IBD? IBD NIBD 1/2 3/4 1/3 1/4
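One simple way to carry out such a test is to count, over all informative parental transmissions, how often the affected sibs share the parental allele IBD versus not, and to ask whether the IBD proportion exceeds 0.5. The sketch below uses an exact 1-sided binomial test. The dataset totals are hypothetical, and this is one possible form of the test rather than necessarily the statistic shown on the slide.

```python
from math import comb

def mean_test_p_value(ibd, nibd):
    """Exact 1-sided binomial p-value for H0: P(IBD) = 0.5 vs. H1: P(IBD) > 0.5."""
    n = ibd + nibd
    return sum(comb(n, k) for k in range(ibd, n + 1)) / 2 ** n

# Example pedigree from the slide: parents 1/2 and 3/4, affected sibs 1/3 and 1/4.
# Paternal allele 1 is shared IBD; maternal alleles 3 and 4 are not.
example_ibd, example_nibd = 1, 1

# Hypothetical dataset totals (not from the slides).
total_ibd, total_nibd = 60, 40

print(mean_test_p_value(example_ibd, example_nibd))
print(round(mean_test_p_value(total_ibd, total_nibd), 4))
```

A single pedigree carries very little information on its own; evidence accumulates only when counts are summed over many affected sib pairs.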

Penetrance-model based linkage analysis on affected sib pair 1/2 3/4 1/3 1/4 Trait locus genotypes are inferred probabilistically conditional on observed phenotypes according to an assumed inheritance model (number of alleles, allele frequencies and genotypic penetrances). ?/?

Penetrance-model-based linkage analysis on affected sib pair 1/2 3/4 1/3 1/4 D/+ D/D assuming a rare recessive trait w/o “phenocopies” Conditional on the fact that both affected sibs inherited the D allele from each parent, test if:

Penetrance-based linkage analysis on affected sib pair (assuming a rare, recessive trait w/o “phenocopies”) and because 1/2 3/4 1/3 1/4 D/+ D/D

Relationship of affected sib-pair linkage analysis and penetrance-model-based linkage analysis For an affected sib-pair of unaffected parents, affected sib-pair linkage analysis and penetrance-model-based linkage analysis assuming a rare recessive trait w/o “phenocopies” are identical.

Penetrance-based linkage analysis on affected sib pair 1/2 3/4 1/3 1/4 D/D D/+ Assuming a rare, recessive trait w/o “phenocopies”, the father is no longer informative. Penetrance-based linkage analysis is then no longer equivalent to affected sib pair linkage analysis.

“pseudo-marker” genotypes “Pseudo-marker” analog of affected sib pair linkage analysis (mean test) 1/2 3/4 1/3 1/4 D/+ D/D 1/2 3/4 1/3 1/4 D/+ D/D “pseudo-marker” genotypes

Take home message regarding relationship of penetrance-model-based and “model-free” approaches to gene mapping: The perceived differences between penetrance-model based and many popular “model-free” methods are more related to the underlying study design than the statistical methodology. A deterministic “pseudo-marker” genotype assignment algorithm can be used to mimic popular “model-free approaches”, allowing joint analysis of different data structures for linkage and/or LD in a framework identical to penetrance-based analysis. These “pseudo-marker” statistics are generally better behaved and more powerful than their conventional “model-free” analogs.

Regression-based methods for linkage analysis of quantitative traits The basic rationale behind this approach (in its various forms) is that pairs of individuals (of a given relationship) with similar phenotypes are expected to be more similar to each other genetically at/near loci influencing the trait of interest than pairs of relatives (of the same relationship) who have dissimilar phenotypes. The degree of phenotypic similarity therefore should be reflected in the proportion of alleles that individuals share IBD at/near trait loci.

Haseman-Elston sib pair linkage test for quantitative traits
[Figure: scatter plot of the squared phenotypic difference between 2 sibs (D², vertical axis) against IBD sharing (0, 0.5, 1; horizontal axis)]
Statistical inference: Is the regression slope < 0?
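The regression behind this test can be sketched with ordinary least squares. The sib-pair values below are made up purely for illustration, and the helper name is an assumption; a real analysis would also need a significance test for the slope.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: proportion of alleles shared IBD at a marker, and the
# squared phenotypic difference for each of 8 sib pairs.
ibd_sharing = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
squared_diff = [4.0, 3.0, 2.5, 1.5, 2.0, 3.0, 0.5, 1.0]

slope = ols_slope(ibd_sharing, squared_diff)
print(slope)   # negative: pairs sharing more alleles IBD are phenotypically more alike
```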

Variance components-based linkage analysis

Rationale of variance components-based linkage analysis The pattern of phenotypic similarity among pedigree members should be reflected by the pattern of IBD sharing among them at chromosomal loci influencing the trait of interest.

Variance components approach: multivariate normal distribution (MVN) In variance components analysis, the phenotype is generally assumed to follow a multivariate normal distribution, specified by the phenotype vector, the mean phenotype vector, and the n×n covariance matrix, where n is the no. of individuals (in a pedigree).

Modeling the resemblance among relatives: heritability analysis vs. linkage analysis
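For orientation, a commonly used formulation of the variance components model is sketched below. All symbols are assumptions rather than notation taken from the slides: y is the phenotype vector of the n pedigree members, μ the mean phenotype vector, Ω the n×n covariance matrix, Φ the kinship matrix, Π̂ the matrix of estimated IBD sharing at the tested position, I the identity matrix, and σ²_q, σ²_g, σ²_e the variances due to the linked locus, residual additive genetic effects, and the environment.

```latex
% Multivariate normal log-likelihood of one pedigree
\ln L(\mu, \Omega \mid y) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Omega|
    - \frac{1}{2}(y-\mu)^{\top}\Omega^{-1}(y-\mu)

% Heritability analysis: covariance from overall relatedness only
\Omega = 2\Phi\,\sigma_g^2 + I\,\sigma_e^2

% Linkage analysis: additional component driven by IBD sharing at the locus
\Omega = \hat{\Pi}\,\sigma_q^2 + 2\Phi\,\sigma_g^2 + I\,\sigma_e^2
```

In this formulation, linkage is tested by comparing the maximized likelihood of the model with σ²_q estimated against the model with σ²_q fixed at 0.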

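Matrix of estimated allele sharing among relatives
[Figure: pedigree with parents P (genotype 1/2) and M (genotype 3/3) and sibs S1, S2, S3 (each genotype 1/3); two matrices with rows and columns P, M, S1, S2, S3, the first with entries 1 and 0.5, the second with entries 1, 0.5 and 0.75]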

Variance components-based lod score

Sample size requirements to detect linkage to a QTL with a lod score of ≥ 3 and 80% power

Pros and cons of variance-components-based linkage analysis
+ no need to specify inheritance model
+ robust to allelic heterogeneity at a locus
+ modeling flexibility
+ computationally feasible even on large pedigrees
- generally assumes additive inheritance model
- modeling restrictions
- not always well-behaved statistically (depending on phenotypic distribution and ascertainment)
- generally less powerful than penetrance-model-based linkage analysis under suitable model

Choice of covariates Covariates ought to be included in the likelihood model if they are known to influence the phenotype of interest and if their own genetic regulation does not overlap the genetic regulation of the target phenotype. Typical examples include sex and age. In the analysis of height, information on nutrition during childhood should probably be included during analysis. However, known growth hormone levels probably should not be.

Choice of covariates

Choice of covariates

Choice of covariates: special case of treatment/medication

Before treatment/medication of affected individuals unaffected affected

apparent effect of covariate After (partially effective) treatment / medication of affected individuals apparent effect of covariate unaffected affected

Choice of covariates: special case of treatment/medication If medication is ineffective/partially effective, including treatment as a covariate is worse than ignoring it in the analysis. If medication is very effective, such that the phenotypic mean of individuals after treatment is equal to the phenotypic mean of the population as a whole, then including medication as a covariate has no effect. If medication is extremely effective, such that the phenotypic mean of individuals after treatment is “better” than the phenotypic mean of the population as a whole, then including medication as a covariate is better than ignoring it, but still far from satisfying. Either censor individuals or, better, infer or integrate over their phenotypes before treatment, based on information on efficacy etc.

Two-point vs. multi-point linkage analysis In linkage analysis, one always examines whether or not the alleles at 2 loci tend to co-segregate during meiosis. In “two-point” linkage analysis, chromosomal inheritance is inferred from the observed trait phenotypes on the one hand (locus 1) and from a single (genotyped) marker locus on the other hand (locus 2). In “multi-point” linkage analysis, chromosomal inheritance is inferred from the observed trait phenotypes on the one hand (locus 1) and from multiple (genotyped) marker loci on the other hand (locus 2).

Pros and cons of multi-point linkage analysis
+ Genotypes at multiple markers contain at least as much and generally more information to infer chromosomal inheritance than genotypes at a single marker, resulting in greater power to detect linkage.
+ The number of independent tests in genome-wide linkage analysis is somewhat reduced in multi-point linkage analysis vs. two-point linkage analysis.
- Multi-point linkage analysis requires knowledge of the genetic marker map (marker order and inter-marker recombination fractions). If this information is incorrect, power can be reduced and/or the false positive rate can be increased.
- Multi-point linkage analysis is more susceptible to genotyping errors.
- Multi-point linkage analysis typically assumes linkage equilibrium between markers. If this does not hold, power can be reduced and/or the false positive rate can be increased.
- Multi-point linkage analysis is computationally more demanding than two-point linkage analysis.

Genetic map vs. physical map
recombination fractions between adjacent markers: θ12 θ23 θ34
genetic map: x1 x2 x3 x4 (cM)
physical map: y1 y2 y3 y4 (Mb)

Genetic map distance vs. recombination fraction Def. of recombination fraction: probability that recombination takes place between 2 chromosomal positions during meiosis. Recombination fractions are not additive, i.e., for 3 loci and recombination fractions θ12 and θ23, θ13 ≠ θ12 + θ23. Def. of genetic map distance (Morgan, M): distance in which 1 recombination event is expected to take place or, equivalently, average distance between recombination events. 1 centi-Morgan (cM) is equal to 1/100 Morgan. Genetic map distances are additive, i.e. for 3 loci and map distances x12 cM and x23 cM, x13 = x12 + x23 cM. Neither recombination fractions nor genetic map distances are easily converted into physical map distances.
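The contrast between additive map distances and non-additive recombination fractions can be illustrated with a map function. The sketch below uses Haldane's map function, which assumes no crossover interference. The slides do not specify a map function, so this choice and the example values are assumptions.

```python
import math

def haldane_theta(morgans):
    """Recombination fraction for a map distance in Morgans (Haldane)."""
    return 0.5 * (1 - math.exp(-2 * morgans))

def haldane_distance(theta):
    """Map distance in Morgans for a recombination fraction (Haldane)."""
    return -0.5 * math.log(1 - 2 * theta)

theta12 = theta23 = 0.1            # hypothetical adjacent-interval values
x12 = haldane_distance(theta12)
x23 = haldane_distance(theta23)
x13 = x12 + x23                    # map distances add
theta13 = haldane_theta(x13)       # recombination fractions do not

print(round(100 * x13, 2), "cM")
print(round(theta13, 4), "vs. naive sum", theta12 + theta23)
```

Under this map function the recombination fraction across both intervals is 0.18 rather than 0.2, because double recombinants between locus 1 and locus 3 appear as non-recombinants.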

Why a genome-wide linkage scan may fail The sample size is too small. The marker genotypes are not sufficiently informative (low heterozygosity and/or large gaps in marker map). There is no major gene. The chosen analytical approach is unsuitable. Bad luck!

A fairytale of 2 traits

Heritability estimates
trait A: 45-82%
trait B: 63-92%

Quantitative trait A (sample 1) large, randomly ascertained pedigrees no. of phenotyped individuals: 268 trait heritability estimate: 0.55

Quantitative trait B (sample 1) large, randomly ascertained pedigrees no. of phenotyped individuals: 324 trait heritability estimate: 0.88

Quantitative trait A (sample 1)

Quantitative trait A (samples 1--2)

Quantitative trait A (samples 1--3)

Quantitative trait A (samples 1--3 + combined)

Quantitative trait B (sample 1)

Quantitative trait B (samples 1--2)

Quantitative trait B (samples 1--3)

Quantitative trait B (samples 1--4)

Quantitative trait B (samples 1--5)

Quantitative trait B (samples 1--6)

Quantitative trait B (samples 1--7)

Quantitative trait B (samples 1--8)

Quantitative trait B (samples 1--9)

quantitative trait A: lipoprotein A (concentration in serum) quantitative trait B: height (in adults)

Heritability of adult height (additive heritability, adjusted for sex and age)
study       sample size   heritability estimate
TOPS        2199          0.78
FLS         705           0.83
GAIT        324           0.88
SAFHS       903           0.76
SAFDS       737           0.92
SHFS AZ     643           0.80
SHFS DK     675           0.81
SHFS OK     647           0.79
Jiri        616           0.63
total       7449

Polygenic or oligogenic ?

Height (9 samples)