Introduction to Linkage Analysis Pak Sham Twin Workshop 2003.

Human Genome: 22 pairs of autosomes plus the sex chromosomes (X, Y). About $3 \times 10^9$ base pairs (2 metres long). About 2% coding sequence; the rest is regulatory or “junk”. About 30,000 to 40,000 genes. Much commonality with other species.

Genetic Variation: Chromosomal abnormalities: duplication (e.g. Down’s syndrome), deletion (e.g. velo-cardio-facial syndrome). Major deleterious mutations: usually rare (e.g. Huntington’s disease). Polymorphisms: single nucleotide polymorphisms (SNPs) and variable-length repeats (e.g. microsatellites); some are functional (“normal variation”), most are non-functional (neutral markers).

Genetic Mapping of Disease: Levels of genetic analysis and their applications. (1) Estimate heritability (family, twin and adoption studies): prediction of genetic risk. (2) Find chromosomal locations (linkage): more accurate prediction of genetic risk. (3) Identify risk variants (association): even more accurate prediction of genetic risk; prediction of prognosis and treatment response. (4) Understand mechanisms (cell biology, etc.): development of new drug targets.

Strategies of Gene Mapping: Functional: uses knowledge of the disease to identify candidate genes, finds variants in the candidate genes, and looks for association between the variants and disease. Positional: a systematic screen of the whole genome that uses a set of about 400 evenly spaced markers and looks for markers which co-segregate with disease.

Co-segregation: [Pedigree figure: family members with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4, A3A2.] Marker allele A1 co-segregates with the dominant disease.

Linkage and Co-segregation: [Figure: a parent and its gametes.] Alleles on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.

Crossing over between homologous chromosomes

Map Distance: Map distance between two loci (in Morgans) = expected number of crossovers per meiosis (1 Morgan = 100 centiMorgans). Note: map distances are additive. There is heterogeneity in recombination frequencies. Total map length: about 33 Morgans (1 cM $\approx 10^6$ base pairs).

Recombination: [Figure: parental genotypes carrying alleles A1, Q1 and A2, Q2, with the gametes they produce.] Non-recombinant gametes have total probability $1 - \theta$; recombinant gametes have total probability $\theta$.

Recombination Fraction: The recombination fraction ($\theta$) between two loci = the proportion of gametes that are recombinant with respect to the two loci.

Recombination & Map Distance: Haldane map function: $\theta = \tfrac{1}{2}\,(1 - e^{-2m})$, where $m$ is the map distance in Morgans.
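
The relationship between map distance and recombination fraction can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the standard form of Haldane's map function (no crossover interference) is assumed.

```python
import math

def haldane_theta(m):
    """Recombination fraction for a map distance m in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(theta):
    """Inverse map function: distance in Morgans for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Theta is close to the map distance for short intervals and
# approaches 1/2 for long ones.
for cm in (1, 10, 50, 100):
    print(cm, "cM ->", round(haldane_theta(cm / 100.0), 4))
```

Map distances add along the chromosome, but recombination fractions do not: under this map function, $\theta_{AC} = \theta_{AB} + \theta_{BC} - 2\theta_{AB}\theta_{BC}$.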

Double Backcross: Fully Informative Gametes. [Figure: grandparents AABB and aabb; parents AaBb and aabb; offspring AaBb and aabb are non-recombinant, offspring Aabb and aaBb are recombinant.]

Linkage Analysis: Fully Informative Gametes. Count data: recombinant gametes $R$, non-recombinant gametes $N$. Parameter: recombination fraction $\theta$. Likelihood: $L(\theta) = \theta^R (1 - \theta)^N$. Estimation: $\hat{\theta} = R/(R + N)$. Chi-square test of $\theta = 1/2$.
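
A minimal sketch of the estimation step for phase-known counts. The counts are hypothetical, and the Pearson form of the chi-square (comparing R and N with the 1:1 ratio expected under no linkage) is an assumption, since the slide's own formula did not survive.

```python
import math

def theta_mle(r, n):
    """MLE of theta from r recombinant and n non-recombinant gametes.

    Capped at 1/2, the largest value a recombination fraction can take.
    """
    return min(r / (r + n), 0.5)

def log_likelihood(theta, r, n):
    """Natural log of L(theta) = theta^r (1 - theta)^n, for 0 < theta < 1."""
    return r * math.log(theta) + n * math.log(1.0 - theta)

def chi_square(r, n):
    """Pearson chi-square (1 df) against the 1:1 ratio expected at theta = 1/2."""
    return (n - r) ** 2 / (n + r)

r, n = 2, 8  # hypothetical counts
print(theta_mle(r, n), chi_square(r, n))
```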

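Phase Unknown Meioses: [Figure: parents AaBb and aabb, with the phase of the AaBb parent unknown; offspring AaBb, aabb, Aabb, aaBb. Either AaBb and aabb are non-recombinant and Aabb and aaBb are recombinant, or the reverse.]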

Mixture Distribution Likelihood: The probability of the observed data $X$ depends on the status of a discrete variable $G$: $P(X \mid G)$. The status of $G$ is not observed, but its probability distribution $P(G)$ is available. The likelihood of the observed data $X$ is then $P(X) = \sum_G P(X \mid G)\,P(G)$.

Linkage Analysis: Phase-unknown Meioses. Count data: either recombinant gametes $X$ and non-recombinant gametes $Y$, or recombinant gametes $Y$ and non-recombinant gametes $X$. Likelihood: $L(\theta) = \theta^X (1 - \theta)^Y + \theta^Y (1 - \theta)^X$. This is an example of incomplete data: a mixture distribution likelihood function.
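
The phase-unknown likelihood can be explored numerically. The sketch below assumes each phase has prior probability 1/2 (a constant factor that does not affect the maximum) and uses a simple grid search over $0 \le \theta \le 1/2$; function names and counts are hypothetical.

```python
def phase_unknown_likelihood(theta, x, y):
    """Mixture over the two possible phases, each with prior probability 1/2."""
    phase_1 = theta ** x * (1.0 - theta) ** y  # x recombinants, y non-recombinants
    phase_2 = theta ** y * (1.0 - theta) ** x  # the reverse assignment
    return 0.5 * (phase_1 + phase_2)

def maximise_theta(x, y, steps=500):
    """Grid search for the theta in [0, 0.5] that maximises the likelihood."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: phase_unknown_likelihood(t, x, y))

# Five offspring all of one type: the likelihood is maximised at theta = 0.
print(maximise_theta(0, 5))
```

The likelihood is symmetric in the two counts, so swapping them changes nothing; that symmetry is what "phase unknown" means in practice.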

Parental Genotypes Unknown: The likelihood will be a function of allele frequencies (population parameters) and $\theta$ (transmission parameter). [Figure: parents with unknown genotypes; offspring AaBb, aabb, Aabb, aaBb.]

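Complex Phenotypes: Penetrance parameters link genotype to phenotype. Genotype AA: disease with probability $f_2$, normal with probability $1 - f_2$. Genotype Aa: disease with probability $f_1$, normal with probability $1 - f_1$. Genotype aa: disease with probability $f_0$, normal with probability $1 - f_0$. Each phenotype is compatible with multiple genotypes.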
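
Because each phenotype is compatible with several genotypes, the probability of a phenotype is itself a mixture over genotypes. A rough sketch, assuming Hardy-Weinberg genotype frequencies and hypothetical values for the allele frequency and penetrances (none of these numbers come from the slides):

```python
def genotype_frequencies(p):
    """Hardy-Weinberg frequencies for a disease allele A with frequency p (assumed)."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

def prob_affected(p, f):
    """P(disease) = sum over genotypes of P(genotype) * penetrance."""
    freqs = genotype_frequencies(p)
    return sum(freqs[g] * f[g] for g in freqs)

def posterior_genotype(p, f, affected=True):
    """P(genotype | phenotype) by Bayes' rule."""
    prior = genotype_frequencies(p)
    joint = {g: prior[g] * (f[g] if affected else 1.0 - f[g]) for g in prior}
    total = sum(joint.values())
    return {g: joint[g] / total for g in joint}

# Hypothetical near-dominant model: f2 = f1 = 0.9, f0 = 0.01, p = 0.01.
penetrance = {"AA": 0.9, "Aa": 0.9, "aa": 0.01}
print(prob_affected(0.01, penetrance))
print(posterior_genotype(0.01, penetrance, affected=True))
```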

General Pedigree Likelihood: The likelihood is a sum of products (a mixture distribution likelihood). Number of terms = $(m_1 m_2 \cdots m_k)^{2n}$, where $m_j$ is the number of alleles at locus $j$ and $n$ is the number of pedigree members.

Elston-Stewart Algorithm: Reduces computation by peeling. [Figure: nuclear families 1 and 2 linked through individual X.] Step 1: condition the likelihood of family 1 on the genotype of X. Step 2: compute the joint likelihood of families 2 and 1.

Lod Score: Morton (1955). $\text{lod}(\theta) = \log_{10}[L(\theta)/L(\theta = 1/2)]$. Lod > 3: conclude linkage. Worked odds: prior odds of linkage 1:50, likelihood ratio 1000, posterior odds 20:1. Lod < -2: exclude linkage.
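
The odds arithmetic behind the lod > 3 threshold can be checked directly. The sketch assumes the usual definition of the lod score as the base-10 log of the likelihood ratio against $\theta = 1/2$ and uses the phase-known likelihood $\theta^R (1 - \theta)^N$; function names and counts are hypothetical.

```python
import math

def lod(theta, r, n):
    """log10 of L(theta) / L(1/2) for r recombinants and n non-recombinants."""
    likelihood_ratio = theta ** r * (1.0 - theta) ** n / 0.5 ** (r + n)
    return math.log10(likelihood_ratio)

def posterior_odds(prior_odds, lod_score):
    """Posterior odds of linkage = prior odds x likelihood ratio (10^lod)."""
    return prior_odds * 10.0 ** lod_score

# Ten phase-known meioses with no recombinants give a lod of about 3 at theta = 0.
print(round(lod(0.0, 0, 10), 2))
# Prior odds 1:50 combined with a lod of 3 give posterior odds of about 20:1.
print(posterior_odds(1.0 / 50.0, 3.0))
```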

Lod Score Curves: [Figure: lod plotted against $\theta$ from 0 to 0.5.] Lod score curves are additive over pedigrees.

Lods, chi-squares & p-values: In large samples, $2 \log_e(10) \times \text{max lod} \sim \chi^2_1$. In small samples, $P \le 10^{-\text{max lod}}$.
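
A short sketch of both conversions. The chi-square tail probability for 1 degree of freedom is computed with `math.erfc`; the function names are hypothetical, and the plain $\chi^2_1$ tail is used as the slide states it (in practice the p-value is often halved because $\theta$ is restricted to at most 1/2, which is an assumption not shown here).

```python
import math

def lod_to_chi_square(max_lod):
    """Large-sample statistic: 2 * ln(10) * max lod."""
    return 2.0 * math.log(10.0) * max_lod

def chi_square_1df_p(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def small_sample_bound(max_lod):
    """Small-sample upper bound on the p-value: 10^(-max lod)."""
    return 10.0 ** (-max_lod)

max_lod = 3.0
print(lod_to_chi_square(max_lod))                    # about 13.8
print(chi_square_1df_p(lod_to_chi_square(max_lod)))  # asymptotic p-value
print(small_sample_bound(max_lod))                   # bound of 0.001
```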

Problems with Parametric Linkage: Requires the parameters of the disease model (allele frequency, penetrances) to be specified; these are generally unknown for a complex trait. The disease model assumes that a single locus is the only source of familial resemblance; this is generally unrealistic.

Linkage Analysis: Admixture Test (C.A.B. Smith). Model: probability of linkage in a family = $\alpha$. Likelihood: $L(\alpha, \theta) = \alpha L(\theta) + (1 - \alpha) L(\theta = 1/2)$. Note: another example of a mixture likelihood.
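
A sketch of the admixture likelihood over several families. Two assumptions are made here that the slide does not state: families are independent, so their contributions multiply, and each family contributes phase-known counts with likelihood $\theta^R (1 - \theta)^N$. The counts are hypothetical.

```python
def family_likelihood(theta, r, n):
    """Phase-known likelihood for one family: theta^r (1 - theta)^n."""
    return theta ** r * (1.0 - theta) ** n

def admixture_likelihood(alpha, theta, families):
    """Product over families of alpha * L(theta) + (1 - alpha) * L(1/2).

    families: list of (recombinants, non_recombinants) tuples.
    alpha: probability that a given family is linked.
    """
    total = 1.0
    for r, n in families:
        linked = family_likelihood(theta, r, n)
        unlinked = family_likelihood(0.5, r, n)
        total *= alpha * linked + (1.0 - alpha) * unlinked
    return total

# Hypothetical data: two families that look linked, one that does not.
families = [(0, 6), (1, 7), (4, 4)]
print(admixture_likelihood(0.7, 0.05, families))
print(admixture_likelihood(1.0, 0.05, families))  # homogeneity: all families linked
```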

Linkage Analysis: MOD. Maximise the lod score over several disease models, e.g. dominant, recessive, additive. Correct for the multiple ($k$) models: adjusted lod = lod $- \log_{10}(k)$.
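
The correction is a one-line calculation; the lod values below are hypothetical.

```python
import math

def adjusted_mod(lods):
    """Maximum lod over k disease models, penalised by log10(k)."""
    return max(lods) - math.log10(len(lods))

# Hypothetical maximum lods under dominant, recessive and additive models.
print(round(adjusted_mod([3.0, 1.2, 2.4]), 3))
```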

Allele Sharing (Non-parametric) Methods: Penrose (1935): sib-pair linkage. [Table: IBD sharing for concordant affected, concordant normal and discordant sib pairs, for a rare disease.] Therefore the affected sib pair (ASP) design is efficient. Test $H_0$: proportion of alleles IBD = 1/2 against $H_A$: proportion of alleles IBD > 1/2.
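
One simple way to test these hypotheses is a one-sided z-test on the mean proportion of alleles shared IBD. The sketch assumes IBD is known with certainty for every pair, so that under $H_0$ each pair's proportion has mean 1/2 and variance 1/8 (IBD 0, 1, 2 with probabilities 1/4, 1/2, 1/4). The function name and counts are hypothetical, and this is not necessarily the test used in the workshop.

```python
import math

def asp_mean_test(n0, n1, n2):
    """One-sided mean test for excess IBD sharing in affected sib pairs.

    n0, n1, n2: numbers of pairs sharing 0, 1 and 2 alleles IBD.
    Returns (mean proportion IBD, z statistic, one-sided p-value).
    """
    n = n0 + n1 + n2
    mean_pi = (n1 + 2 * n2) / (2.0 * n)
    z = (mean_pi - 0.5) / math.sqrt(1.0 / (8.0 * n))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return mean_pi, z, p

print(asp_mean_test(10, 50, 40))  # hypothetical counts with excess sharing
```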

Correlation Between IBD of Two Loci: For sib pairs, $\mathrm{Corr}(\pi_A, \pi_B) = (1 - 2\theta_{AB})^2$. This implies attenuation of the linkage signal with increasing genetic distance from the disease locus.
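
To see how quickly the signal decays, the correlation can be tabulated against map distance. The sketch assumes Haldane's map function to convert distance to $\theta$, under which the correlation reduces to $e^{-4m}$ for a distance of $m$ Morgans.

```python
import math

def haldane_theta(m):
    """Recombination fraction for map distance m in Morgans (Haldane, assumed)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def ibd_correlation(theta):
    """Sib-pair correlation in IBD sharing between two loci."""
    return (1.0 - 2.0 * theta) ** 2

for cm in (0, 5, 10, 20, 50):
    theta = haldane_theta(cm / 100.0)
    print(cm, "cM: correlation", round(ibd_correlation(theta), 3))
```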

Joint Distribution of Pedigree IBD: The IBD values of relative pairs are not independent; e.g. if IBD(1,2) = 2 and IBD(1,3) = 2, then IBD(2,3) = 2. The inheritance vector gives the joint IBD distribution. Each element indicates whether the paternally inherited allele (1) or the maternally inherited allele (0) is transmitted, giving a vector of 2N elements (N = number of non-founders).
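
For a sibship, pairwise IBD can be read straight off the inheritance vector. The sketch assumes a hypothetical layout in which each non-founder contributes two consecutive elements (the father's meiosis, then the mother's) and all non-founders are full sibs; it then checks the dependence noted above by enumerating every vector for three sibs.

```python
from itertools import product

def sib_ibd(vector, i, j):
    """Number of alleles sibs i and j share IBD (0, 1 or 2).

    vector holds two elements per sib: (father's meiosis, mother's meiosis),
    each 1 if the parent's paternally inherited allele was transmitted, else 0.
    """
    shared_paternal = vector[2 * i] == vector[2 * j]
    shared_maternal = vector[2 * i + 1] == vector[2 * j + 1]
    return int(shared_paternal) + int(shared_maternal)

# If sibs 1 and 2 share both alleles, and sibs 1 and 3 do too,
# then sibs 2 and 3 must share both alleles as well.
for v in product((0, 1), repeat=6):
    if sib_ibd(v, 0, 1) == 2 and sib_ibd(v, 0, 2) == 2:
        assert sib_ibd(v, 1, 2) == 2

example = (1, 1, 1, 1, 0, 1)  # hypothetical vector for three sibs
print(sib_ibd(example, 0, 1), sib_ibd(example, 0, 2), sib_ibd(example, 1, 2))
```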

Inheritance Vector: An Example. Ordered genotype notation: 1st allele = paternally inherited, 2nd allele = maternally inherited. [Pedigree figure: genotypes shown are 1/2, 3/4, 1/3, 1/4, 2/3, 2/4.] Inheritance vector = (1, 1, 1, 0, 1, 0).

Pedigree Allele-sharing Methods: APM (Affected Pedigree Members): uses IBS; very sensitive to allele frequency mis-specification; less powerful than IBD-based methods. NPL (Non-Parametric Linkage; Genehunter): conservative at positions between markers. LRT (“delta parameter”; Genehunter+, Allegro). All these methods consider affected members only.

Variance Components Linkage: Models the trait values of pedigree members jointly. Assumes multivariate normality conditional on IBD. Covariance between relative pairs = $V r + V_Q[\pi - E(\pi)]$, where $V$ = trait variance, $r$ = correlation (depends on relationship), $V_Q$ = QTL additive variance, $\pi$ = proportion of alleles IBD, and $E(\pi)$ = expected proportion IBD.
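
The covariance formula translates directly into code. The parameter values below are hypothetical, and the example takes a sib pair, for which the expected proportion IBD is 1/2.

```python
def expected_covariance(v, r, v_q, pi, expected_pi):
    """Trait covariance for a relative pair: V*r + V_Q * (pi - E(pi))."""
    return v * r + v_q * (pi - expected_pi)

def sib_covariance_matrix(v, r, v_q, pi):
    """2 x 2 trait covariance matrix for a sib pair with proportion IBD pi."""
    cov = expected_covariance(v, r, v_q, pi, 0.5)
    return [[v, cov], [cov, v]]

# Hypothetical values: V = 1, sib correlation r = 0.4, QTL variance V_Q = 0.3.
for pi in (0.0, 0.5, 1.0):
    print(pi, sib_covariance_matrix(1.0, 0.4, 0.3, pi))
```

Pairs sharing more alleles IBD at the test position are expected to be more alike; when $V_Q = 0$ the covariance no longer depends on IBD.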

Path Diagram for Sib-Pair QTL model P T1 QS N P T2 QSN 1 [0 / 0.5 / 1] nqsnsq

Incomplete Marker Information: IBD sharing cannot always be deduced from marker genotypes with certainty. Obtain the probabilities of the IBD values ($Z_0$, $Z_1$, $Z_2$), then use either a finite mixture likelihood or a pi-hat likelihood.
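
The two options can be contrasted in a short sketch. Pi-hat is taken here as the expected proportion of alleles IBD, $Z_1/2 + Z_2$, and the mixture likelihood weights the likelihood under each IBD state by its probability; the function names and numbers are hypothetical.

```python
def pi_hat(z0, z1, z2):
    """Expected proportion of alleles IBD given the IBD probabilities."""
    return 0.5 * z1 + z2

def mixture_likelihood(z, conditional_likelihoods):
    """Finite mixture: sum over IBD = 0, 1, 2 of Z_k * L(data | IBD = k)."""
    return sum(zk * lk for zk, lk in zip(z, conditional_likelihoods))

z = (0.1, 0.3, 0.6)  # hypothetical IBD probabilities from marker data
print(pi_hat(*z))
# Hypothetical trait likelihoods for the pair under IBD = 0, 1, 2.
print(mixture_likelihood(z, (0.02, 0.05, 0.09)))
```

The pi-hat approach plugs the single number into the covariance model, while the mixture approach keeps all three IBD states and weights them.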

Pi-hat Model: [Path diagram: phenotypes P_T1 and P_T2, each influenced by latent factors Q, S and N through paths q, s and n.]

Parametric / Allele Sharing: [Diagram relating trait data, marker data and IBD sharing under the parametric and allele-sharing approaches.]
