Introduction to some basic concepts in quantitative genetics. Course “Study Design and Data Analysis for Genetic Studies”, Universidad del Zulia, Maracaibo


Introduction to some basic concepts in quantitative genetics. Course “Study Design and Data Analysis for Genetic Studies”, Universidad del Zulia, Maracaibo, Venezuela, 6 April 2005. Harald H.H. Göring

“Nature vs. nurture”
[Figure: a trait shaped by genes and environment, placed on a scale from 100% genetic / 0% environmental contribution (“mendelian” traits) through “complex” traits to 0% genetic / 100% environmental contribution (infections, accidental injuries).]

“Marker” loci
There are many different types of polymorphisms, e.g.:
– single nucleotide polymorphism (SNP): AAACATAGACCGGTT vs. AAACATAGCCCGGTT
– microsatellite/variable number of tandem repeat (VNTR): AAACATAGCACACA----CCGGTT vs. AAACATAGCACACACACCGGTT
– insertion/deletion (indel): AAACATAGACCACCGGTT vs. AAACATAG----CCGGTT
– restriction fragment length polymorphism (RFLP)
– …

Genetic variation in numbers
There are ~$6 \times 10^{9}$ humans on earth, and thus ~$12 \times 10^{9}$ copies of each autosomal chromosome. Assuming a mutation rate of ~$1 \times 10^{-8}$ per nucleotide per generation, every single nucleotide will be mutated (~$12 \times 10^{9}$) × (~$1 \times 10^{-8}$) = ~120 times in each new generation of earthlings. Thus, every nucleotide will be polymorphic in Homo sapiens, except for those where variation is incompatible with life.
Any 2 chromosomes differ from each other every ~1,000 bp. The 2 chromosomal sets inherited from the mother and the father (each with a length of $3 \times 10^{9}$ bp) therefore differ from each other at ~$3 \times 10^{9}$ / ~1,000 = ~$3 \times 10^{6}$, or ~3 million, locations.
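The arithmetic on this slide can be checked in a few lines. This is only a sketch: the variable names are illustrative, and the mutation rate is assumed to be per nucleotide per generation.

```python
# Back-of-the-envelope numbers from the slide (all values are rough approximations).
humans = 6e9                       # ~6 x 10^9 people
copies = 2 * humans                # two copies of each autosome per person
mutation_rate = 1e-8               # assumed unit: per nucleotide per generation

# Expected number of new mutations hitting one nucleotide position per generation.
new_mutations_per_site = copies * mutation_rate

genome_length = 3e9                # bp in one chromosomal set
bp_per_difference = 1000           # two chromosomes differ about every 1,000 bp
differences = genome_length / bp_per_difference

print(f"new mutations per site per generation: ~{new_mutations_per_site:.0f}")
print(f"differences between maternal and paternal sets: ~{differences:.0e}")
```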

Definitions of some important terms
– locus: a position in the DNA sequence, defined relative to others; in different contexts, this might mean a specific polymorphism or a very large region of DNA sequence in which a gene might be located
– gene: the sum total of the DNA sequence in a given region related to transcription of a given RNA, including introns, exons, and regulatory regions
– polymorphism: the existence of 2 or more variants of some locus
– allele: one of the variant forms of either a gene or a polymorphism
– neutral allele: any allele which has no effect on reproductive fitness; a neutral allele could affect a phenotype, as long as the phenotype itself has no effect on fitness
– silent allele: any allele which has no effect on the phenotype under study; a silent allele can affect other phenotype(s) and reproductive fitness
– disease-predisposing allele: any allele which increases susceptibility to a given disease; this should not be called a mutation
– mutation: the process by which the DNA sequence is altered, resulting in a different allele

Genetics vs. epidemiology: aggregate effects
The sharing of environmental factors among related (as well as unrelated) individuals is hard to quantify as an aggregate. In contrast, the sharing of genetic factors among related (as well as unrelated) individuals is easy to quantify, because inheritance of genetic material follows very simple rules. Aggregate sharing of genetic material can therefore be predicted fairly accurately without measurements, e.g.:
– a parent and his/her child share exactly 50% of their genetic material (autosomal DNA)
– siblings share on average 50% of their genetic material
– a grandparent and his/her grandchild (or half-sibs or avuncular individuals) share on average 25% of their genetic material
Genome as aggregate “exposure”: While it is not clear whether an individual has been “exposed” to good or bad factors, “co-exposure” among relatives is predictable.

Use of genetic similarity of relatives
The genetic similarity of relatives, a result of inheritance of copies of the same DNA from a common ancestor, is the basis for
– heritability analysis
– segregation analysis
– linkage analysis
– linkage disequilibrium analysis
– relationship inference: between close relatives (e.g., identification of human remains, paternity disputes); between distant groups of individuals from the same species (e.g., analysis of migration pattern); between different species (e.g., analysis of phylogenetic trees)
– identification of conserved DNA sequences through sequence alignment
– …

Relatives are not i.i.d.
Unlike many random variables in many areas of statistics, the phenotypes and genotypes of related individuals are not independent and identically distributed (i.i.d.). Many standard statistical tests therefore cannot and/or should not be applied in the analysis of relatives. Most analyses on related individuals use likelihood-based statistical approaches, due to the modeling flexibility of this very general statistical framework.

“Mendelian” vs. “complex” traits
“simple mendelian” disease:
– genotypes of a single locus cause disease
– often little genetic (locus) heterogeneity (sometimes even little allelic heterogeneity); little interaction between genotypes at different genes
– often hardly any environmental effects
– often low prevalence
– often early onset
– often clear mode of inheritance
– “good” pedigrees for gene mapping can often be found
– often straightforward to map
“complex multifactorial” disease:
– genotypes of a single locus merely increase risk of disease
– genotypes of many different genes (and various environmental factors) jointly and often interactively determine the disease status
– important environmental factors
– often high prevalence
– often late onset
– no clear mode of inheritance
– not easy to find “good” pedigrees for gene mapping
– difficult to map

Genetic heterogeneity
[Figure: four panels, each drawn along a time axis: locus homogeneity, allelic homogeneity; locus homogeneity, allelic heterogeneity; locus heterogeneity, allelic homogeneity (at each locus); locus heterogeneity, allelic heterogeneity (at each locus).]

Study design
Different traits call for different study designs and different analytical methods.

How to simplify the etiological architecture?
Choose tractable trait:
– Are there sub-phenotypes within trait? (age of onset; severity; combination of symptoms (syndrome))
– “endophenotype” or “biomarker” vs. disease: quantitative vs. qualitative (discrete) (dichotomizing quantitative phenotypes leads to loss of information); simple/cheap measurement vs. uncertain/expensive diagnosis; not as clinically relevant, but with simpler etiology
Given trait, choose appropriate study design/ascertainment protocol:
– study population: genetic heterogeneity; environmental heterogeneity
– “random” ascertainment vs. ascertainment based on phenotype of interest: single or multiple probands; concordant or discordant probands; pedigrees with apparent “mendelian” inheritance?; inbred pedigrees?
– data structures: singletons, small pedigrees, large pedigrees
– account for/stratify by known genetic and environmental risk factors

Qualitative and quantitative traits
qualitative or discrete traits:
– disease (often dichotomous; assessed by diagnosis): Huntington’s disease, obesity, hypertension, …
– serological status (seropositive or seronegative)
– Drosophila melanogaster bristle number
quantitative or continuous traits:
– height, weight, body mass index, blood pressure, …
– assessed by measurement

[Figure: a discrete trait (e.g. hypertension), coded 0/1, contrasted with a continuous trait (e.g. blood pressure).]

Why use a quantitative trait? Why not?

Pros and cons of disease vs. quantitative trait
disease:
– for rare disease, limited variation in random sample; need for non-random ascertainment
– for late-onset diseases, it is difficult/impossible to find multigenerational pedigrees
– diagnosis: often difficult, subjective, arbitrary
– treatment may cure disease or weaken symptoms, but original disease status is generally still known
– of great clinical interest
– often more complex etiologically
continuous trait:
– sufficient variation in random sample; non-random ascertainment may not be necessary or advisable
– as no special ascertainment is necessary, any pedigree is suitable
– measurement: often straightforward, reliable
– medications and other covariates may influence phenotype
– often only of limited/indirect clinical interest
– often simpler etiologically

Dichotomizing quantitative phenotypes generally leads to loss of information
[Figure: distribution of a quantitative phenotype split by a threshold into “unaffected” and “affected”.]
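The loss of information can be made visible with a small simulation. The sketch below is illustrative only: the effect size (0.5), the 20% “affected” fraction, the sample size and the seed are all arbitrary assumptions. It compares how strongly a predictor correlates with a quantitative trait and with the same trait after dichotomization.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # arbitrary seed, for reproducibility only
predictor = [rng.gauss(0.0, 1.0) for _ in range(5000)]
trait = [0.5 * g + rng.gauss(0.0, 1.0) for g in predictor]

# Call the top 20% of the trait distribution "affected".
threshold = sorted(trait)[int(0.8 * len(trait))]
affected = [1.0 if t >= threshold else 0.0 for t in trait]

r_quantitative = pearson(predictor, trait)
r_dichotomized = pearson(predictor, affected)
print(f"quantitative: r = {r_quantitative:.2f}; dichotomized: r = {r_dichotomized:.2f}")
```

The correlation with the dichotomized trait comes out weaker, even though both variables were derived from the same measurements.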

Characterization of a quantitative trait
– center of distribution
– spread around center
– symmetry
– thickness of tails
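One common way to quantify these four properties is with the mean, variance, skewness and excess kurtosis. The sketch below assumes this choice of summary statistics (the slide does not name them), and the function name `describe` is illustrative.

```python
import statistics

def describe(values):
    """Summarize center, spread, symmetry and tail thickness of a sample."""
    n = len(values)
    mean = statistics.fmean(values)          # center of distribution
    sd = statistics.pstdev(values)           # spread around center
    z = [(v - mean) / sd for v in values]    # standardized values
    skewness = sum(t ** 3 for t in z) / n    # 0 for a symmetric sample
    excess_kurtosis = sum(t ** 4 for t in z) / n - 3.0  # 0 for a normal distribution
    return {
        "mean": mean,
        "variance": sd ** 2,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }

summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values
print(summary)
```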

How can a continuous trait result from discrete genetic variation?
Suppose 4 genes influence the trait, each with 2 equally frequent alleles. Assume that at each locus allele 1 decreases the phenotype of an individual by 1, and that allele 2 increases the phenotype by 1. Now, let us obtain a random sample from the population by coin tossing. Take 2 coins and toss them. 2 tails mean genotype 11, and a phenotype contribution of -2; 2 heads mean genotype 22, and a phenotype contribution of +2; 1 tail and 1 head is a heterozygote (genotype 12), with a phenotype contribution of 0. Repeat this experiment 4 times (once for each locus). Sum up the results to obtain the overall phenotype.
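The coin-tossing experiment is easy to simulate. The following is a minimal sketch of it; the sample size and the seed are arbitrary choices, and the function names are illustrative.

```python
import random
from collections import Counter

def locus_contribution(rng):
    """Toss two coins: tails = allele 1 (-1), heads = allele 2 (+1)."""
    return sum(rng.choice((-1, +1)) for _ in range(2))

def phenotype(rng, n_loci=4):
    """Sum the contributions of all loci to obtain the overall phenotype."""
    return sum(locus_contribution(rng) for _ in range(n_loci))

rng = random.Random(2005)  # arbitrary seed, for reproducibility only
sample = [phenotype(rng) for _ in range(10_000)]

# Crude text histogram: one '#' per 100 individuals.
counts = Counter(sample)
for value in sorted(counts):
    print(f"{value:+d} {'#' * (counts[value] // 100)}")
```

With only 4 loci the phenotype already takes 9 distinct values (-8 to +8 in steps of 2) with a roughly bell-shaped distribution; adding loci and environmental noise smooths it further.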

Variance decomposition
$\sigma_P^2 = \sigma_G^2 + \sigma_E^2$
– $\sigma_P^2$: phenotypic variance due to all causes
– $\sigma_G^2$: phenotypic variance due to genetic variation
– $\sigma_E^2$: phenotypic variance due to environmental variation

Decomposition of phenotypic variance attributable to genetic variation
$\sigma_G^2 = \sigma_a^2 + \sigma_d^2$
– $\sigma_G^2$: phenotypic variance due to genetic variation
– $\sigma_a^2$: phenotypic variance due to additive effects of genetic variation
– $\sigma_d^2$: phenotypic variance due to dominant effects of genetic variation

[Figure: phenotypic means of genotypes AA, AB and BB on a scale from -a through 0 to +a; the mean of the heterozygote AB lies at d.]

[Figure: phenotypic means of genotypes AA, AB and BB on a scale from -a to +a, with d = 0.]
If the phenotypic mean of the heterozygote is halfway between the two homozygotes, there is a “dose-response” effect, i.e. each dose of allele B increases the phenotype by the same amount. In this case, d = 0, and there is no dominance (interaction between alleles at the same polymorphism).

Decomposition of phenotypic variance attributable to environmental variation
$\sigma_E^2 = \sigma_c^2 + \sigma_e^2$
– $\sigma_E^2$: phenotypic variance due to environmental variation
– $\sigma_c^2$: phenotypic variance due to environmental variation common among individuals (e.g., culture, household)
– $\sigma_e^2$: phenotypic variance due to environmental variation unique to an individual

Definition of heritability
The proportion of the phenotypic variance in a trait that is attributable to the effects of genetic variation.
The absolute values of variance attributable to a specific factor are not important, as they depend on the scale of the phenotype. It is the relative values of variance that matter.

Broad sense and narrow-sense heritability
The proportion of the phenotypic variance in a trait that is attributable to:
– effects of genetic variation (broad sense)
– additive effects of genetic variation (narrow sense)
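As a worked example, the two heritabilities can be computed from variance components. This is a sketch only: the function name and the numerical values are made up for illustration.

```python
def heritabilities(var_additive, var_dominance, var_environment):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    var_genetic = var_additive + var_dominance
    var_total = var_genetic + var_environment
    broad = var_genetic / var_total     # all genetic effects
    narrow = var_additive / var_total   # additive genetic effects only
    return broad, narrow

# Hypothetical variance components on some phenotype scale.
broad, narrow = heritabilities(4.0, 1.0, 5.0)
print(broad, narrow)

# Changing the measurement scale multiplies every variance by the same
# factor, so the ratios are unchanged: only relative values matter.
broad_rescaled, narrow_rescaled = heritabilities(400.0, 100.0, 500.0)
```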

“Nature vs. nurture”
[Figure: a trait shaped by genes and environment, placed on a scale from 100% genetic / 0% environmental contribution to 0% genetic / 100% environmental contribution.]

Different degrees of relationship have different phenotypic covariance/correlation
[Table: phenotypic covariance and phenotypic correlation for the relative pairs parent-child, full sibs, half sibs and first cousins (assuming absence of effect of shared environment); the formulas did not survive in the transcript.]
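The entries of this table are not available here, but under the usual additive-plus-dominance variance components model without shared environment, the expected covariance between two relatives is the relationship coefficient times the additive variance plus the probability of sharing both alleles times the dominance variance. The sketch below assumes that model; the names and the variance values are illustrative.

```python
# (relationship coefficient, probability of sharing both alleles identical by descent)
RELATIVE_PAIRS = {
    "parent-child": (0.5, 0.0),
    "full sibs": (0.5, 0.25),
    "half sibs": (0.25, 0.0),
    "first cousins": (0.125, 0.0),
}

def expected_covariance(pair, var_additive, var_dominance=0.0):
    """Expected phenotypic covariance, assuming no shared environment."""
    relationship, both_alleles = RELATIVE_PAIRS[pair]
    return relationship * var_additive + both_alleles * var_dominance

# Hypothetical variance components: additive 4, dominance 1, environment 5.
total_variance = 4.0 + 1.0 + 5.0
for pair in RELATIVE_PAIRS:
    cov = expected_covariance(pair, 4.0, 1.0)
    print(f"{pair}: covariance = {cov}, correlation = {cov / total_variance}")
```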

MZ and DZ twins have different phenotypic covariance/correlation
[Table: phenotypic covariance and phenotypic correlation for identical (MZ) twins and fraternal (DZ) twins, annotated “2x difference” (assuming equal effect of shared environment); the formulas did not survive in the transcript.]
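A classical way to exploit this 2x difference is Falconer's formula, which is not stated on the slide and is brought in here as an assumption. It supposes purely additive genetic effects and an equal shared-environment effect in MZ and DZ pairs, so that r_MZ = h² + c² and r_DZ = ½h² + c². A minimal sketch with made-up correlations:

```python
def falconer(r_mz, r_dz):
    """Estimate variance proportions from twin correlations (additive model).

    Assumes r_mz = h2 + c2 and r_dz = h2 / 2 + c2, i.e. no dominance and
    an equal shared-environment effect for MZ and DZ twin pairs.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic proportion
    c2 = 2.0 * r_dz - r_mz     # shared environment proportion
    e2 = 1.0 - r_mz            # unique environment proportion
    return h2, c2, e2

h2, c2, e2 = falconer(r_mz=0.8, r_dz=0.5)  # hypothetical twin correlations
print(h2, c2, e2)
```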

Normal distribution
[Figure: density f(x) of the normal distribution plotted against x.]

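Variance components approach: multivariate normal distribution (MVN)
In variance components analysis, the phenotype is generally assumed to follow a multivariate normal distribution:
$f(\mathbf{y}) = (2\pi)^{-n/2} \, |\boldsymbol{\Omega}|^{-1/2} \exp\left[ -\tfrac{1}{2} (\mathbf{y} - \boldsymbol{\mu})^{\top} \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right]$
– $n$: no. of individuals (in a pedigree)
– $\boldsymbol{\Omega}$: $n \times n$ covariance matrix
– $\mathbf{y}$: phenotype vector
– $\boldsymbol{\mu}$: mean phenotype vector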
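For small pedigrees the multivariate normal log-likelihood can be evaluated directly. The sketch below uses only the standard library and a hand-written Cholesky factorization; the function names and the example numbers are illustrative assumptions, not part of any particular software package.

```python
import math

def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def mvn_loglik(y, mu, omega):
    """Log-density of phenotype vector y under MVN(mu, omega)."""
    n = len(y)
    low = cholesky(omega)
    # Forward substitution: solve low * z = (y - mu).
    z = []
    for i in range(n):
        s = (y[i] - mu[i]) - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(t * t for t in z)  # (y - mu)' omega^{-1} (y - mu)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

# Hypothetical pair of relatives: variance 2, covariance 1.
example = mvn_loglik([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
print(example)
```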

Variance-covariance matrix
The variance-covariance matrix describes the phenotypic covariance among pedigree members:
$\boldsymbol{\Omega} = \sum_i \mathbf{M}_i \sigma_i^2$
– $\mathbf{M}_i$: $n \times n$ structuring matrix
– $\sigma_i^2$: scalar variance component (random effect)

“Sporadic” model: no phenotypic resemblance between relatives
In the simplest model, the phenotypic covariance among pedigree members is only influenced by environmental exposure unique to each individual. Factors shared among relatives, whether genetic or environmental, do not influence the trait.
$\boldsymbol{\Omega} = \mathbf{I} \sigma_e^2$, where $\mathbf{I}$ is the identity matrix.

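Identity matrix
[Figure: nuclear family pedigree with father f, mother m and children 1, 2, 3.]
     f    m    1    2    3
f    1    0    0    0    0
m    0    1    0    0    0
1    0    0    1    0    0
2    0    0    0    1    0
3    0    0    0    0    1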

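Modeling phenotypic resemblance between relatives: “polygenic” model
$\boldsymbol{\Omega} = 2\boldsymbol{\Phi} \sigma_a^2 + \mathbf{I} \sigma_e^2$, where $\boldsymbol{\Phi}$ is the kinship matrix.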

Kinship and relationship matrix
kinship matrix ($\boldsymbol{\Phi}$): Each element in the kinship matrix contains the probability that the allele at a locus randomly drawn from the 2 chromosomal sets in a person is a copy of the same allele at the same locus randomly drawn from the 2 chromosomal sets in another person. For one individual, $\phi = 0.5$, assuming absence of inbreeding.
relationship matrix ($2\boldsymbol{\Phi}$): This provides the probability that a given locus is shared identical-by-descent among 2 individuals. This is equivalent to the expected proportion of the genome that 2 individuals share in common due to common ancestry. For one individual, $2\phi = 1$, assuming absence of inbreeding.
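Kinship coefficients can be computed from a pedigree with a short, well-known recursion over parents. The sketch below applies it to a hypothetical nuclear family (father f, mother m, children 1, 2, 3); the dictionary layout and function names are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); founders have no parents.
# Parents must be listed before their children.
PEDIGREE = {
    "f": (None, None),
    "m": (None, None),
    "1": ("f", "m"),
    "2": ("f", "m"),
    "3": ("f", "m"),
}
ORDER = {name: index for index, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient between individuals a and b."""
    if a is None or b is None:
        return 0.0  # unknown parents of founders are assumed unrelated
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through the parents of whichever individual comes later,
    # so that an ancestor is never expanded before its descendant.
    if ORDER[a] < ORDER[b]:
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))

# Relationship matrix = 2 x kinship matrix.
relationship = [[2.0 * kinship(a, b) for b in PEDIGREE] for a in PEDIGREE]
for name, row in zip(PEDIGREE, relationship):
    print(name, row)
```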

Relationship matrix and $\Delta_7$ matrix
relationship                 $2\phi$    $\Delta_7$
self                         1          1
MZ twin pair                 1          1
DZ twin pair                 1/2        1/4
full sibs                    1/2        1/4
half sibs                    0.25       0
grandparent - grandchild     0.25       0
avuncular                    0.25       0
first cousin                 1/8        0
second cousin                1/32       0

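Relationship matrix: nuclear family
[Figure: nuclear family pedigree with father f, mother m and children 1, 2, 3.]
     f    m    1    2    3
f    1    0    0.5  0.5  0.5
m    0    1    0.5  0.5  0.5
1    0.5  0.5  1    0.5  0.5
2    0.5  0.5  0.5  1    0.5
3    0.5  0.5  0.5  0.5  1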

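Relationship matrix: half-sibs
[Figure and matrix: pedigree with two fathers (f1, f2), one mother (m) and two half-sibs (1, 2), with the corresponding 5 × 5 relationship matrix; the matrix entries did not survive in the transcript.]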

Likelihood
The likelihood of a hypothesis (e.g. specific parameter value(s)) on a given dataset, L(hypothesis|data), is defined to be proportional to the probability of the data given the hypothesis, P(data|hypothesis):
L(hypothesis|data) = constant × P(data|hypothesis)
Because of the proportionality constant, a likelihood by itself has no interpretation. The likelihood ratio (LR) of 2 hypotheses is meaningful if the 2 hypotheses are nested (i.e., one hypothesis is contained within the other):
LR = L(H1|data) / L(H0|data) = P(data|H1) / P(data|H0)
Likelihood theory describes how to interpret a likelihood ratio. Under certain conditions, maximum likelihood estimates are asymptotically unbiased and asymptotically efficient.

Inference in heritability analysis
H0: (Additive) genetic variation does not contribute to phenotypic variation ($\sigma_a^2 = 0$)
H1: (Additive) genetic variation does contribute to phenotypic variation ($\sigma_a^2 > 0$)
heritability: $h^2 = \sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$
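A likelihood ratio test is the usual way to compare these two hypotheses. Because a variance cannot be negative, the null value lies on the boundary of the parameter space, and the test statistic is commonly referred to a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. The sketch below assumes that mixture; the function name and the log-likelihood values are hypothetical.

```python
import math

def lrt_variance_component(loglik_null, loglik_alt):
    """Likelihood ratio test for a single variance component.

    Assumes the statistic follows a 50:50 mixture of a point mass at 0
    and a chi-square distribution with 1 degree of freedom under H0.
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return stat, 1.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2_1df = math.erfc(math.sqrt(stat / 2.0))
    return stat, 0.5 * p_chi2_1df

# Hypothetical maximized log-likelihoods of the "sporadic" and "polygenic" models.
stat, p_value = lrt_variance_component(loglik_null=-100.0, loglik_alt=-98.08)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4f}")
```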

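Modeling phenotypic resemblance between relatives: “polygenic” model allowing for dominance
$\boldsymbol{\Omega} = 2\boldsymbol{\Phi} \sigma_a^2 + \boldsymbol{\Delta}_7 \sigma_d^2 + \mathbf{I} \sigma_e^2$, where $\boldsymbol{\Delta}_7$ is the matrix of probabilities that 2 individuals inherited the same alleles on both chromosomes from 2 common ancestors.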
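Assembling the covariance matrix from its structuring matrices is mechanical. The sketch below does so for a pair of full sibs, assuming an additive, a dominance and a unique environmental variance component; the function name and the variance values are illustrative.

```python
def covariance_matrix(relationship, delta7, var_additive, var_dominance, var_unique):
    """Phenotypic covariance matrix from structuring matrices and variance components."""
    n = len(relationship)
    return [
        [
            relationship[i][j] * var_additive
            + delta7[i][j] * var_dominance
            + (var_unique if i == j else 0.0)  # identity matrix term
            for j in range(n)
        ]
        for i in range(n)
    ]

# Pair of full sibs: relationship coefficient 1/2, probability 1/4 of
# sharing both alleles identical by descent.
relationship = [[1.0, 0.5], [0.5, 1.0]]
delta7 = [[1.0, 0.25], [0.25, 1.0]]

omega = covariance_matrix(relationship, delta7, 4.0, 1.0, 5.0)  # hypothetical variances
print(omega)
```

Setting the dominance variance to zero gives the purely additive “polygenic” model.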

Relationship matrix and $\Delta_7$ matrix
relationship                 $2\phi$    $\Delta_7$
self                         1          1
MZ twin pair                 1          1
DZ twin pair                 1/2        1/4
full sibs                    1/2        1/4
half sibs                    0.25       0
grandparent - grandchild     0.25       0
avuncular                    0.25       0
first cousin                 1/8        0
second cousin                1/32       0

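$\Delta_7$ matrix: nuclear family
[Figure: nuclear family pedigree with father f, mother m and children 1, 2, 3.]
     f     m     1     2     3
f    1     0     0     0     0
m    0     1     0     0     0
1    0     0     1     0.25  0.25
2    0     0     0.25  1     0.25
3    0     0     0.25  0.25  1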

Inference in heritability analysis
H0: (Additive) genetic variation does not contribute to phenotypic variation
H1: (Additive) genetic variation does contribute to phenotypic variation
2 degrees of freedom

Is it reasonable to assume that the only source for phenotypic resemblance among relatives is genetic? No. To overcome this problem, one can try to model shared environment, either in aggregate or broken into specific environmental factors. household matrix: accounts for aggregate of environmental factors shared among individuals living in the same household

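Household matrix
[Figure and matrix: nuclear family pedigree (father f, mother m, children 1, 2, 3) with its 5 × 5 household matrix. Only the first row survives in the transcript: f = 1 1 1 0 0, i.e. f shares a household with m and child 1 but not with children 2 and 3.]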

“Household” effect

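Nested models for heritability analysis
model                   unique environment   household   additive genetic
“sporadic”              +                    -           -
“household”             +                    +           -
“additive polygenic”    +                    -           +
“general”               +                    +           +
The “household” and “additive polygenic” models are non-nested hypotheses.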
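Whether two of these models are nested can be read off from their sets of variance components: one model is nested in another if its components are a proper subset of the other's. The sketch below encodes this; the component labels "e" (unique environment), "c" (household) and "g" (additive genetic) are assumed names, not taken from the slide.

```python
# Variance components included in each model (labels are illustrative).
MODELS = {
    "sporadic": {"e"},
    "household": {"e", "c"},
    "additive polygenic": {"e", "g"},
    "general": {"e", "c", "g"},
}

def is_nested(simple, full):
    """True if model `simple` is a special case of model `full`."""
    return MODELS[simple] < MODELS[full]  # proper subset

for simple in MODELS:
    for full in MODELS:
        if is_nested(simple, full):
            print(f"{simple} is nested in {full}")
```

A likelihood ratio test is appropriate only for the pairs this prints; “household” versus “additive polygenic” is not among them.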

Inclusion of covariates Measured covariates can easily be incorporated as “fixed effects” in the multivariate normal model of the phenotype, by making the expected phenotype different for different individuals as a function of the measured covariates.

Inclusion of covariates
If covariates are not of interest in and of themselves, one can “regress them out” before pedigree analysis. Then use residuals as phenotype of interest in pedigree analysis.
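“Regressing out” a covariate amounts to fitting a regression of the phenotype on the covariate and keeping the residuals. The sketch below uses ordinary least squares with a single covariate; the data are made up, and note that ordinary least squares treats all individuals as unrelated.

```python
import statistics

def residuals(phenotype, covariate):
    """Residuals of a simple least-squares regression of phenotype on covariate."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(phenotype)
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, phenotype)]

# Hypothetical data: a phenotype that increases with age.
age = [20.0, 30.0, 40.0, 50.0]
trait = [161.0, 164.0, 169.0, 176.0]

adjusted = residuals(trait, age)  # use these as the phenotype in pedigree analysis
print(adjusted)
```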

Inference regarding covariates in heritability analysis
H0: measured covariate Y does not influence phenotype.
H1: measured covariate Y does influence phenotype.

Inference regarding covariates in heritability analysis
H0: measured covariate Y does not influence phenotype.
H1: measured covariate Y does influence phenotype.
CAUTION: Related individuals in pedigrees are treated as unrelated. This can easily lead to false positive findings regarding the effect of the covariate!

Choice of covariates Covariates ought to be included in the likelihood model if they are known to influence the phenotype of interest and if their own genetic regulation does not overlap the genetic regulation of the target phenotype. Typical examples include sex and age. In the analysis of height, information on nutrition during childhood should probably be included during analysis. However, known growth hormone levels probably should not be.

Choice of covariates

Choice of covariates: special case of treatment/medication

Before treatment/medication of affected individuals
[Figure: phenotype distributions of “unaffected” and “affected” individuals.]

After (partially effective) treatment/medication of affected individuals
[Figure: phenotype distributions of “unaffected” and “affected” individuals, showing the apparent effect of the covariate.]

Choice of covariates: special case of treatment/medication If medication is ineffective/partially effective, including treatment as a covariate is worse than ignoring it in the analysis. If medication is very effective, such that the phenotypic mean of individuals after treatment is equal to the phenotypic mean of the population as a whole, then including medication as a covariate has no effect. If medication is extremely effective, such that the phenotypic mean of individuals after treatment is “better” than the phenotypic mean of the population as a whole, then including medication as a covariate is better than ignoring it, but still far from satisfying. Either censor individuals or, better, infer or integrate over their phenotypes before treatment, based on information on efficacy etc.

Be careful in interpretation of heritability estimates
While one can attempt to account for shared environmental factors individually or in aggregate, it is notoriously difficult to do so. In contrast to genetics, where “co-exposure” among relatives is predictable due to inheritance rules, this is not the case with environmental factors of interest in epidemiology. If environmental co-exposure is not adequately modeled, shared environmental effects tend to inflate the heritability estimate, because shared exposure is generally greater among relatives, thus mimicking the effects of genetic similarity among relatives. Heritability estimates thus are often overestimates.

Be careful in interpretation of heritability estimates Keep in mind that heritability estimates are applicable only to a specific population at a specific point in time.

Heritability of adult height (additive heritability, adjusted for sex and age)
[Table: columns study, sample size, heritability estimate; rows TOPS, FLS, GAIT, SAFHS, SAFDS, SHFS (AZ, DK, OK), Jiri, total. Only the total sample size (7449) survives in the transcript.]

Be careful in interpretation of heritability estimates Heritability is a population level parameter, summarizing the strength of genetic influences on variation in a trait among members of the population. It does not provide any information regarding the phenotype in a given individual, such as risk of disease.

Relative risk The risk of disease (or another phenotype) in a relative of an affected individual as compared to the risk of disease in a randomly chosen person from the population.

Relative risk as a function of heritability

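Heritability of adult height (additive heritability, adjusted for sex and age)
[Table: columns phenotype, p, and a “sib” column whose leading symbol is lost; rows autism, IDDM, schizophrenia, NIDDM, obesity. Only the obesity entries (0.4 and <2) survive in the transcript.]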

Be careful in interpretation of heritability estimates A heritability estimate is applicable only to a specific trait. If you alter the trait in any way, such as inclusion of additional/different covariates, this may alter the estimate and/or alter the interpretation of the finding. Example: left ventricular mass not adjusted for blood pressure left ventricular mass adjusted for blood pressure