Genetic principles for linkage and association analyses Manuel Ferreira & Pak Sham Boulder, 2009.

Slides:



Advertisements
Similar presentations
Introduction to QTL mapping
Advertisements

Mendel’s Laws.
The next generation Chapters 9, 10, 17 in the course textbook, especially pages , ,
Qualitative and Quantitative traits
Chapter 6: Quantitative traits, breeding value and heritability Quantitative traits Phenotypic and genotypic values Breeding value Dominance deviation.
Genetics notes For makeup. A gene is a piece of DNA that directs a cell to make a certain protein. –Homozygous describes two alleles that are the same.
1 Statistical Considerations for Population-Based Studies in Cancer I Special Topic: Statistical analyses of twin and family data Kim-Anh Do, Ph.D. Associate.
Practical H:\ferreira\biometric\sgene.exe. Practical Aim Visualize graphically how allele frequencies, genetic effects, dominance, etc, influence trait.
Intro to Quantitative Genetics HGEN502, 2011 Hermine H. Maes.
Chapter 11 Mendel & The Gene Idea.
Gregory Johann Mendel Austrian monk Experimented with pea plants He thought that ‘heritable factors’ (genes) retained their individuality generation.
Outline: 1) Basics 2) Means and Values (Ch 7): summary 3) Variance (Ch 8): summary 4) Resemblance between relatives 5) Homework (8.3)
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Linkage analysis: basic principles Manuel Ferreira & Pak Sham Boulder Advanced Course 2005.
Association Mapping David Evans. Outline Definitions / Terminology What is (genetic) association? How do we test for association? When to use association.
Quantitative Genetics Theoretical justification Estimation of heritability –Family studies –Response to selection –Inbred strain comparisons Quantitative.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Genetics.
31 January, 2 February, 2005 Chapter 6 Genetic Recombination in Eukaryotes Linkage and genetic diversity.
Genetic Theory Manuel AR Ferreira Egmond, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Quantitative Genetics
Introduction to Linkage Analysis March Stages of Genetic Mapping Are there genes influencing this trait? Epidemiological studies Where are those.
Introduction to Linkage
Unit Animal Science. Problem Area Animal Genetics and Biotechnology.
CSE 291: Advanced Topics in Computational Biology Vineet Bafna/Pavel Pevzner
Biometrical genetics Pak C Sham The University of Hong Kong Manuel AR Ferreira Queensland Institute for Medical Research 23 rd International Workshop on.
Reminder - Means, Variances and Covariances. Covariance Algebra.
Biometrical Genetics Pak Sham & Shaun Purcell Twin Workshop, March 2002.
Linkage Analysis in Merlin
 Genetic material found in the nucleus of eukaryotic cells; in the cytoplasm of prokaryotes (no nucleus)  A library of genetic information (genes) located.
Stern - Introductory Plant Biology: 9th Ed. - All Rights Reserved - McGraw Hill Companies Genetics Chapter 13 Copyright © McGraw-Hill Companies Permission.
Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Unit 4 Vocabulary Review. Nucleic Acids Organic molecules that serve as the blueprint for proteins and, through the action of proteins, for all cellular.
Broad-Sense Heritability Index
A gene is composed of strings of bases (A,G, C, T) held together by a sugar phosphate backbone. Reminder - nucleotides are the building blocks.
Genetic Theory Manuel AR Ferreira Boulder, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Introduction to Linkage Analysis Pak Sham Twin Workshop 2003.
Gene Hunting: Linkage and Association
PBG 650 Advanced Plant Breeding
Values & means (Falconer & Mackay: chapter 7) Sanja Franic VU University Amsterdam 2011.
Quantitative Genetics
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
ABC for the AEA Basic biological concepts for genetic epidemiology Martin Kennedy Department of Pathology Christchurch School of Medicine.
Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Genetic Theory Pak Sham SGDP, IoP, London, UK. Theory Model Data Inference Experiment Formulation Interpretation.
Epistasis / Multi-locus Modelling Shaun Purcell, Pak Sham SGDP, IoP, London, UK.
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
BSAA CD UNIT C Animal Science. Problem Area 1 Animal Genetics and Biotechnology.
Introduction to Genetic Theory
Biometrical Genetics Shaun Purcell Twin Workshop, March 2004.
 Structural genes: genes that contain the information to make a protein.  Regulatory genes: guide the expression of structural genes, without coding.
Biometrical genetics Manuel AR Ferreira Boulder, 2008 Massachusetts General Hospital Harvard Medical School Boston.
Genetics: Inheritance. Meiosis: Summary  Diploid Cells (2n): Cells with two sets of chromosomes, (aka “homologous chromosomes”)  One set of chromosomes.
Genetic Theory Manuel AR Ferreira Boulder, 2007 Massachusetts General Hospital Harvard Medical School Boston.
Recombination (Crossing Over)
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
A Brief History What is molecular biology?
Biometrical model and introduction to genetic analysis
Pak Sham & Shaun Purcell Twin Workshop, March 2002
Chapter 7 Multifactorial Traits
Exercise: Effect of the IL6R gene on IL-6R concentration
Genetics: Inheritance
Unit 5: Heredity Review Lessons 1, 3, 4 & 5.
Introduction to Biometrical Genetics {in the classical twin design}
Chapter 11: Introduction to Genetics Mendel and Meiosis
Presentation transcript:

Genetic principles for linkage and association analyses Manuel Ferreira & Pak Sham Boulder, 2009

Gene mapping LOCALIZE and then IDENTIFY a locus that regulates a trait Linkage analysis Association analysis

If a locus regulates a trait, Trait Variance and Covariance between individuals will be influenced by this locus. Linkage: Association: If a locus regulates a trait, Trait Mean in the population will also be influenced by this locus.

Revisit common genetic parameters - such as allele frequencies, genetic effects, dominance, variance components, etc Use these parameters to construct a biometric genetic model Model that expresses the: (1)Mean (2)Variance (3)Covariance between individuals for a quantitative phenotype as a function of the genetic parameters of a given locus. See how the biometric model provides a useful framework for linkage and association methods.

Outline 1. Genetic concepts 2. Very basic statistical concepts 3. Biometrical model 4. Introduction to linkage analysis

1. Genetic concepts

B. Population level C. Transmission level D. Phenotype level G G G G G G G G G G G G G G G G G G G G G G GG PP Allele and genotype frequencies Mendelian segregation Genetic relatedness Biometrical model Additive and dominance components A. DNA level DNA structure, organization recombination

A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups Attached to carbon atom 1’ of each sugar is a nitrogenous base: A, C, G or T Two DNA molecules are held together in anti-parallel fashion by hydrogen bonds between bases [Watson-Crick rules] Antiparallel double helix Only one strand is read during gene transcription Nucleotide: 1 phosphate group + 1 sugar + 1 base A. DNA level A gene is a segment of DNA which is transcribed to give a protein or RNA product

DNA polymorphisms A B Microsatellites >100,000 Many alleles, eg. (CA) n repeats, very informative SNPs 14,708,752 (build 129, 03 Mar ‘09) Most with 2 alleles (up to 4), not very informative Copy Number Variants >5,000 Many alleles, just recently automated

Haploid gametes ♁ ♂ ♂ ♁ G 1 phase chr1 S phase Diploid zygote 1 cell M phase Diploid zygote >1 cell ♁ ♂ ♁ A - B - A - B - A - B - A - B - A - B - A - B - ♂ ♁ A - B - A - B - ♂ ♁ - A - B - A - B - A - B - A - B DNA organization Mitosis (22 + 1)

Diploid gamete precursor cell (♂)(♁) (♂) (♁) Haploid gamete precursors Hap. gametes NR R R ♁ A - B - - A - B A - B - - A - B A - B - - A - B A - B - - A - B ♂ ♁ A - B - A - B - - A - B - A - B DNA recombination Meiosis 2 (22 + 1) chr1

Diploid gamete precursor (♂)(♁) (♂) (♁) Haploid gamete precursors Hap. gametes NR ♁ A - B - - A - B A - B - - A - B A - B - - A - B A - B - - A - B ♂ ♁ A - B - A - B - - A - B - A - B DNA recombination between linked loci Meiosis 2 (22 + 1)

B. Population level 1. Allele frequencies A single locus, with two alleles - Biallelic - Single nucleotide polymorphism, SNP Alleles A and a - Frequency of A is p - Frequency of a is q = 1 – p A a A genotype is the combination of the two alleles A A A A A A A A A A A A A A a a a a a a a a a a a A a Aa

B. Population level 2. Genotype frequencies (Random mating) A (p)a (q) A (p) a (q) Allele 1 Allele 2 AA (p 2 ) aA (qp) Aa (pq) aa (q 2 ) Hardy-Weinberg Equilibrium frequencies P (AA) = p 2 P (Aa) = 2pq P (aa) = q 2 p 2 + 2pq + q 2 = 1

Segregation (Meiosis) Mendel’s law of segregation A3 (½)A3 (½)A4 (½)A4 (½) A1 (½)A1 (½) A2 (½)A2 (½) Mother (A 3 A 4 ) A1A3 (¼)A1A3 (¼) A2A3 (¼)A2A3 (¼) A1A4 (¼)A1A4 (¼) A2A4 (¼)A2A4 (¼) Gametes Father (A 1 A 2 ) C. Transmission level

D. Phenotype level 1. Classical Mendelian traits Dominant trait - AA, Aa1 - aa0 Recessive trait - AA1 - aa, Aa0 Cystic fibrosis 3 bp deletion exon 10 CFTR gene Huntington’s disease (CAG)n repeat, huntingtin gene

2. Quantitative traits AA Aa aa D. Phenotype level e.g. cholesterol levels

P(X)P(X) X AA Aa aa AA Aaaa D. Phenotype level m

m d +a P(X)P(X) X AA Aa aa – a AA Aaaa Biometric Model Genotypic effect D. Phenotype level

2. Very basic statistical concepts

Mean, variance, covariance 1. Mean (X)

2. Variance (X) Mean, variance, covariance

3. Covariance (X,Y) Mean, variance, covariance

3. Biometrical model

Biometrical model for single biallelic QTL Biallelic locus - Genotypes: AA, Aa, aa - Genotype frequencies: p 2, 2pq, q 2 Alleles at this locus are transmitted from P-O according to Mendel’s law of segregation Genotypes for this locus influence the expression of a quantitative trait X (i.e. locus is a QTL) Biometrical genetic model that estimates the contribution of this QTL towards the (1) Mean, (2) Variance and (3) Covariance between individuals for this quantitative trait X

Biometrical model for single biallelic QTL 1. Contribution of the QTL to the Mean (X) aa Aa AA Genotypes Frequencies, f(x) Effect, x p2p2 2pqq2q2 a d-a = a(p 2 ) + d(2pq) – a(q 2 )Mean (X)= a(p-q) + 2pqd e.g. cholesterol levels in the population

Biometrical model for single biallelic QTL 2. Contribution of the QTL to the Variance (X) aa Aa AA Genotypes Frequencies, f(x) Effect, x p2p2 2pqq2q2 a d-a = (a-m) 2 p 2 + (d-m) 2 2pq + (-a-m) 2 q 2 Var (X) = V QTL Heritability of X at this locus = V QTL / V Total

Biometrical model for single biallelic QTL = (a-m) 2 p 2 + (d-m) 2 2pq + (-a-m) 2 q 2 Var (X) = 2pq[a+(q-p)d] 2 + (2pqd) 2 = V A QTL + V D QTL Additive effects: the main effects of individual alleles Dominance effects: represent the interaction between alleles m = a(p-q) + 2pqd

Biometrical model for single biallelic QTL m = 0 d+a – a A aAaAa d = 0 (no dominance) +a+a+a+a Additive model

Biometrical model for single biallelic QTL m = 0 d+a – a A aAaAa d > 0 (dominance) +a+d+a-d Dominant model

Biometrical model for single biallelic QTL m = 0 d+a – a A aAaAa d < 0 (dominance) +a-d+a+d Recessive model

aa Aa AA aa Aa AA log (x) No departure from additivity Significant departure from additivity Statistical definition of dominance is scale dependent

Biometrical model for single biallelic QTL Var (X) = Regression Variance + Residual Variance -a 0 a aa Aa AAaa Aa AAaa Aa AA aa Aa AA AdditiveDominant Recessive Genotypic mean = Additive Variance + Dominance Variance = V A QTL + V D QTL

Practical H:\manuel\biometric\sgene.exe

Practical Aim Visualize graphically how allele frequencies, genetic effects, dominance, etc, influence trait mean and variance Ex1 a=0, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary a from 0 to 1. Ex2 a=1, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary d from -1 to 1. Ex3 a=1, d=0, p=0.4, Residual Variance = 0.04, Scale = 2. Vary p from 0 to 1. Look at scatter-plot, histogram and variance components.

Some conclusions 1.Additive genetic variance depends on allele frequency p & additive genetic value a as well as dominance deviation d 2.Additive genetic variance typically greater than dominance variance

Biometrical model for single biallelic QTL 3. Contribution of the QTL to the Covariance (X,Y) 2. Contribution of the QTL to the Variance (X) 1. Contribution of the QTL to the Mean (X)

Biometrical model for single biallelic QTL AA Aa aa AA Aaaa (a-m)(a-m) (d-m)(d-m) (-a-m) (a-m)(a-m) (d-m)(d-m) (a-m)2(a-m)2 (a-m)(a-m) (d-m)(d-m) (a-m)(a-m) (d-m)2(d-m)2 (d-m)(d-m) (-a-m) 2 3. Contribution of the QTL to the Cov (X,Y)

Biometrical model for single biallelic QTL AA Aa aa AA Aaaa (a-m)(a-m) (d-m)(d-m) (-a-m) (a-m)(a-m) (d-m)(d-m) (a-m)2(a-m)2 (a-m)(a-m) (d-m)(d-m) (a-m)(a-m) (d-m)2(d-m)2 (d-m)(d-m) (-a-m) 2 p2p pq 0 q2q2 3A. Contribution of the QTL to the Cov (X,Y) – MZ twins = (a-m) 2 p 2 + (d-m) 2 2pq + (-a-m) 2 q 2 Cov(X,Y) = V A QTL + V D QTL = 2pq[a+(q-p)d] 2 + (2pqd) 2

Biometrical model for single biallelic QTL AA Aa aa AA Aaaa (a-m)(a-m) (d-m)(d-m) (-a-m) (a-m)(a-m) (d-m)(d-m) (a-m)2(a-m)2 (a-m)(a-m) (d-m)(d-m) (a-m)(a-m) (d-m)2(d-m)2 (d-m)(d-m) (-a-m) 2 p3p3 p2qp2q 0 pq pq 2 q3q3 3B. Contribution of the QTL to the Cov (X,Y) – Parent-Offspring

e.g. given an AA father, an AA offspring can come from either AA x AA or AA x Aa parental mating types AA x AA will occur p 2 × p 2 = p 4 and have AA offspring Prob()=1 AA x Aa will occur p 2 × 2pq = 2p 3 q and have AA offspring Prob()=0.5 and have Aa offspring Prob()=0.5 Therefore, P(AA father & AA offspring) = p 4 + p 3 q = p 3 (p+q) = p 3

Biometrical model for single biallelic QTL AA Aa aa AA Aaaa (a-m)(a-m) (d-m)(d-m) (-a-m) (a-m)(a-m) (d-m)(d-m) (a-m)2(a-m)2 (a-m)(a-m) (d-m)(d-m) (a-m)(a-m) (d-m)2(d-m)2 (d-m)(d-m) (-a-m) 2 p3p3 p2qp2q 0 pq pq 2 q3q3 = (a-m) 2 p 3 + … + (-a-m) 2 q 3 Cov (X,Y) = ½V A QTL = pq[a+(q-p)d] 2 3B. Contribution of the QTL to the Cov (X,Y) – Parent-Offspring

Biometrical model for single biallelic QTL AA Aa aa AA Aaaa (a-m)(a-m) (d-m)(d-m) (-a-m) (a-m)(a-m) (d-m)(d-m) (a-m)2(a-m)2 (a-m)(a-m) (d-m)(d-m) (a-m)(a-m) (d-m)2(d-m)2 (d-m)(d-m) (-a-m) 2 p4p4 2p 3 q p2q2p2q2 4p 2 q 2 2pq 3 q4q4 = (a-m) 2 p 4 + … + (-a-m) 2 q 4 Cov (X,Y) = 0 3C. Contribution of the QTL to the Cov (X,Y) – Unrelated individuals

Biometrical model for single biallelic QTL Cov (X,Y) 3D. Contribution of the QTL to the Cov (X,Y) – DZ twins and full sibs ¼ genome ¼ (2 alleles) + ½ (1 allele) + ¼ (0 alleles) MZ twins P-O Unrelateds ¼ genome # identical alleles inherited from parents 0 1 (mother) 1 (father) 2 = ¼ Cov(MZ) + ½ Cov(P-O) + ¼ Cov(Unrel) = ¼(V A QTL +V D QTL ) + ½ (½ V A QTL ) + ¼ (0) = ½ V A QTL + ¼V D QTL

Summary so far…

Biometrical model predicts contribution of a QTL to the mean, variance and covariances of a trait Var (X) = V A QTL + V D QTL Cov (MZ) = V A QTL + V D QTL Cov (DZ) = ½V A QTL + ¼V D QTL On average! 0 or 10, 1/2 or 1 IBD estimation / Linkage Mean (X) = a(p-q) + 2pqd Association analysis Linkage analysis For a given locus, do two sibs have 0, 1 or 2 alleles in common?

4. Introduction to Linkage analysis

For a heritable trait... localize region of the genome where a QTL that regulates the trait is likely to be harboured identify a QTL that regulates the trait Linkage: Association: Family-specific phenomenon: Affected individuals in a family share the same ancestral predisposing DNA segment at a given QTL Population-specific phenomenon: Affected individuals in a population share the same predisposing DNA segment at a given QTL Can only detect very large effects Can detect weaker effects

No Linkage No Association Linkage No Association Linkage Association Families Cases Controls

Non-parametric linkage approach If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have inherited from a common ancestor the same predisposing allele at a marker near the trait locus, and vice-versa. Interest: correlation between phenotypic similarity and genetic similarity at a locus Linkage tests co-segregation between a marker and a trait

Phenotypic similarity between relatives Squared trait differences Squared trait sums Trait cross-product Trait variance-covariance matrix Affection concordance T2T2 T1T1

Genotypic similarity between relatives IBS Alleles shared Identical By State “look the same”, may have the same DNA sequence but they are not necessarily derived from a known common ancestor IBD Alleles shared Identical By Descent are a copy of the same ancestor allele M1M1 Q1Q1 M2M2 Q2Q2 M3M3 Q3Q3 M3M3 Q4Q4 M1M1 Q1Q1 M3M3 Q3Q3 M1M1 Q1Q1 M3M3 Q4Q4 M1M1 Q1Q1 M2M2 Q2Q2 M3M3 Q3Q3 M3M3 Q4Q4 IBS IBD 2 1 Inheritance vector (M)

Genotypic similarity between relatives - M1M1 Q1Q1 M3M3 Q3Q3 M2M2 Q2Q2 M3M3 Q4Q4 Number of alleles shared IBD 0 M1M1 Q1Q1 M3M3 Q3Q3 M1M1 Q1Q1 M3M3 Q4Q4 1 M1M1 Q1Q1 M3M3 Q3Q3 M1M1 Q1Q1 M3M3 Q3Q3 2 Proportion of alleles shared IBD Inheritance vector (M)

Genotypic similarity between relatives - 2 2n ABC D

Var (X) = V A QTL + V D QTL Cov (MZ) = V A QTL + V D QTL Cov (DZ) = ½V A QTL + ¼V D QTL On average! Cov (DZ) = V A QTL + V D QTL For a given locus Cov (DZ) = V A QTL

Statistics that incorporate both phenotypic and genotypic similarities to test V QTL Regression-based methods Variance components methods Mx, MERLIN, SOLAR, GENEHUNTER Haseman-Elston, MERLIN-regress (X 1 -X 2 ) 2 = -2 * V A QTL

Should we still use linkage analysis? Given dense SNP data Rare genetic variant (not covered by the genotyping platform)... or allelic heterogeneity (multiple disease variants in the same gene) *AND* strong effect on phenotype... Linkage analysis can complement association and provide an additional approach to localise a disease locus (with no loss).